It’s about an 8-minute read.
Dear Dr. Jay,
We want to assess the importance of fixing some of our customer touchpoints. What would you recommend as a modeling tool?
-Alicia
Hi Alicia,
There are a variety of tools we use to determine the relative importance of key variables in driving an outcome (the dependent variable). Here’s the first question we need to address: are we trying to predict the actual value of the dependent variable, or just assess the importance of any given independent variable in the equation? Most of the time, the goal is the latter.
Once we know the primary objective, there are three key criteria we need to address. The first is the amount of multicollinearity in our data; the more independent variables we have, the bigger the problem this presents (the sketch below shows one quick way to gauge it). The second is the stability of the model over time: in tracking studies, we want to believe that differences between waves reflect actual changes in the market, not artifacts of the algorithm used to compute the importance scores. Finally, we need to understand the impact of sample size on the models.
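To make that first criterion concrete, one quick way to gauge multicollinearity before you pick a modeling tool is to compute variance inflation factors (VIFs) for your drivers. Here’s a minimal Python sketch using statsmodels; the `drivers` data frame of touchpoint ratings is made up purely for illustration.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Hypothetical driver ratings: rows are respondents, columns are touchpoints.
rng = np.random.default_rng(42)
drivers = pd.DataFrame(
    rng.normal(size=(200, 4)),
    columns=["billing", "support", "onboarding", "website"],
)

# Add an intercept column so each VIF reflects a regression with a constant term.
X = np.column_stack([np.ones(len(drivers)), drivers.values])

# VIF for each driver: how much its variance is inflated by correlation with
# the other drivers (values creeping above ~5-10 are commonly read as trouble).
for i, name in enumerate(drivers.columns, start=1):
    print(f"{name}: VIF = {variance_inflation_factor(X, i):.2f}")
```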
How big a sample do you need? Typically, in consumer research, we see results stabilize with n=200. Some tools will do a better job with smaller samples than others. You should also consider the number of parameters you are trying to model. A grad school rule of thumb is that you need 4 observations for each parameter in the model, so if you have 25 independent variables, you’d need at least 100 respondents in your sample.
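If it helps to see that rule of thumb in code, here’s a trivial sketch; the function name is mine, not a standard formula.

```python
def rule_of_thumb_sample_size(n_parameters, obs_per_parameter=4):
    """Grad-school rule of thumb: at least 4 observations per model parameter."""
    return obs_per_parameter * n_parameters

# 25 independent variables -> at least 100 respondents (and remember that, in
# practice, we typically see driver models stabilize around n=200).
print(rule_of_thumb_sample_size(25))
```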
There are several tools to consider for estimating relative importance: Bivariate Correlations, OLS, Shapley Value Regression (or Kruskal’s Relative Importance), TreeNet, and Bayesian Networks are all options. All of these tools will let you understand the relative importance of the independent variables in predicting your key measure. One thing to note is that none of these tools specifically models causation; you would need some sort of experimental design to address that issue. Let’s break down the advantages and disadvantages of each.
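Before we do, here’s a quick illustrative sketch of how the two simplest options differ in practice: bivariate correlations look at one driver at a time, while OLS looks at all of them at once. The toy data and column names below are hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical survey data: touchpoint ratings plus overall satisfaction.
rng = np.random.default_rng(7)
df = pd.DataFrame(rng.normal(size=(200, 3)), columns=["billing", "support", "onboarding"])
df["overall_sat"] = 0.5 * df["billing"] + 0.3 * df["support"] + rng.normal(size=200)

# Importance via bivariate correlations: each driver vs. the outcome, one at a time.
correlations = df.drop(columns="overall_sat").corrwith(df["overall_sat"])
print(correlations)

# Importance via OLS: standardized (beta) coefficients, all drivers at once,
# so each coefficient is net of the other drivers.
z = (df - df.mean()) / df.std()
ols = sm.OLS(z["overall_sat"], sm.add_constant(z.drop(columns="overall_sat"))).fit()
print(ols.params.drop("const"))
```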
Shapley Value Regression and Kruskal’s Relative Importance are a couple of approaches that consider all possible combinations of explanatory variables. Unlike traditional regression tools, these techniques are not used for forecasting. In OLS, we predict the change in overall satisfaction for any given change in the independent variables; these tools, by contrast, are used to determine how much better the model is if we include any specific independent variable versus models that do not include that measure. The conclusions we draw from these models refer to the usefulness of including any measure in the model, not its specific impact on improving measures like overall satisfaction.
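To make that a little less abstract, here’s a minimal sketch of the Shapley logic: for each driver, average the gain in R² it contributes across every possible subset of the other drivers. This is a from-scratch illustration (the function names and toy data are mine, not code from any particular package), and it gets slow quickly as the number of drivers grows.

```python
from itertools import combinations
from math import factorial

import numpy as np
from sklearn.linear_model import LinearRegression


def r_squared(X, y, cols):
    """R-squared of an OLS fit using only the given columns (empty set -> 0)."""
    if not cols:
        return 0.0
    Xs = X[:, list(cols)]
    return LinearRegression().fit(Xs, y).score(Xs, y)


def shapley_importance(X, y):
    """For each driver, average its incremental R-squared over all subsets of the others."""
    k = X.shape[1]
    scores = np.zeros(k)
    for j in range(k):
        others = [i for i in range(k) if i != j]
        for size in range(k):
            for subset in combinations(others, size):
                # Shapley weight for a coalition of this size.
                weight = factorial(size) * factorial(k - size - 1) / factorial(k)
                gain = r_squared(X, y, subset + (j,)) - r_squared(X, y, subset)
                scores[j] += weight * gain
    return scores


# Toy example: three hypothetical drivers of overall satisfaction.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 0.6 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(size=200)
print(shapley_importance(X, y))  # the scores sum to the full model's R-squared
```

One nice property of this decomposition is that the importance scores sum to the full model’s R², so they are easy to read as shares of explained variance.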
Got a burning research question? You can send your questions to DearDrJay@cmbinfo.com or submit them anonymously here.
Dr. Jay Weiner is CMB’s senior methodologist and VP of Advanced Analytics. Jay earned his Ph.D. in Marketing/Research from the University of Texas at Arlington and regularly publishes and presents on topics including conjoint, choice, and pricing.