One of the most common needs of market research clients is to explain and forecast consumer behavior. In quantitative studies, correlation and regression analyses are an effective way to reach these kinds of findings, as long as you have access to the necessary inputs.
A few points must be kept in mind, because without them the method will not yield statistically valid results that can support applicable and profitable decisions. Here is a list of the most common things to remember when running a correlation analysis.
- The main input is a broad bivariate database. This dataset must be obtained by the most appropriate mechanisms and must represent all the elements of the market under study. Otherwise the results may differ from reality. It is essential that the databases used contain all the valid records of a universe or sample. A commonly used input for this type of study is the results of censuses or administrative statistics from public and private organizations.
- The scatter plot should show a clear trend. When plotted on a Cartesian plane, the data pairs described above should show a clear linear pattern, on either an arithmetic or a logarithmic scale. Otherwise we would be facing a set of data pairs with no apparent linear relationship to each other. However, care must be taken when reading this graph, because misleading correlations can appear, as in Simpson's paradox, where a trend seen in the pooled data weakens or reverses within subgroups.
- Multivariate correlations are more common than bivariate ones. Human behavior is almost never influenced by a single circumstance. When applying correlation analysis, it is common to find that more than one independent variable carries strong weight in explaining the dependent variable. It is also common for variables that were left out of the analysis to be underestimated or ignored.
- Predictions with regression equations are not deterministic. Unlike algebra exercises, where a change in the independent variables always changes the dependent variable by a fixed proportion, market studies treat regression analyses probabilistically: the equation gives an expected value, not a guaranteed one.
- It is valid to disregard extreme data. When reviewing a database that will be analyzed with the correlation method, it is valid to ignore extreme records as long as there are only a few of them. If they are left in, the findings of the analysis could be strongly biased. If the number of extreme records is greater than experience suggests it should be, it will be necessary to run imputation methods, check the original database again or, if possible, repeat the data collection.
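To make the points above more concrete, here is a minimal Python sketch of a bivariate analysis. It computes the Pearson correlation coefficient, fits a least-squares line, and reports the residual standard error as a rough measure of how far real observations tend to fall from the predicted value. The function names and the ad spend and sales figures are hypothetical, invented only for illustration; a real study would also include the scatter plot review and the outlier checks described in the list.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def fit_line(xs, ys):
    """Least-squares line y = intercept + slope * x.

    Returns (slope, intercept, residual_standard_error).
    Needs at least three points so the error has n - 2 > 0 degrees of freedom.
    """
    n = len(xs)
    if n < 3:
        raise ValueError("need at least three data pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    squared_residuals = sum(
        (y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)
    )
    residual_se = math.sqrt(squared_residuals / (n - 2))
    return slope, intercept, residual_se


# Hypothetical data: monthly ad spend (thousands) and units sold (thousands).
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
units_sold = [2.3, 3.8, 6.1, 8.2, 9.7, 12.4]

r = pearson_r(ad_spend, units_sold)
slope, intercept, residual_se = fit_line(ad_spend, units_sold)

# The prediction is an expected value; residual_se indicates typical scatter.
expected_units = intercept + slope * 7.0
print(f"r = {r:.3f}")
print(f"units = {intercept:.2f} + {slope:.2f} * ad_spend")
print(f"expected units at spend 7.0: {expected_units:.2f} (+/- about {residual_se:.2f})")
```

The residual standard error is only a rough guide to uncertainty here; a formal prediction interval would also account for sample size and for how far the new value lies from the observed range.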
At Acertiva, we have spent 18 years carrying out, among other activities, quantitative market studies in Latin America, supporting our clients with their design and execution. Get in touch with us to find out how our experience can help solve your data and information gathering challenges.