Showing posts with label sample size. Show all posts

Tuesday, April 9, 2013

Top 2 Box versus Top 3 Box – Why We Like Top 2 Better

Our company has done thousands of surveys for both domestic and international clients. Many of our closed-ended questions use 5-point scales, such as:

  • Satisfaction scales – Extremely Satisfied, Very Satisfied, Somewhat Satisfied, Not Very Satisfied, Not At All Satisfied
  • Agreement scales – Strongly Agree, Agree, Neither Agree Nor Disagree, Disagree, Strongly Disagree
  • Judgment scales – Excellent, Very Good, Good, Fair, Poor

and others along similar lines.

Many of our analyses include comparing our clients’ scores on these 5-point scales to relevant normative data measured using the same scales. While we always show the full percentage distribution for our client’s scores, we typically show just a single number for the comparative norm, and in our case it’s usually the Top 2 Box score. Using our satisfaction scale as an example, that score is the combined percentage of Extremely Satisfied and Very Satisfied responses, with the other scales following the same pattern.
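
As a rough illustration of the arithmetic, the short Python sketch below computes Top 2 and Top 3 Box scores from a full distribution. The percentages are hypothetical numbers chosen for the example, not norms or client data, and the `top_box` helper is simply a name made up here.

```python
# Hypothetical response percentages for a 5-point satisfaction scale,
# listed from most positive to least positive (illustrative numbers only).
distribution = {
    "Extremely Satisfied": 22,
    "Very Satisfied": 35,
    "Somewhat Satisfied": 28,
    "Not Very Satisfied": 10,
    "Not At All Satisfied": 5,
}


def top_box(dist, n_boxes):
    """Sum the percentages of the n_boxes most positive scale points.

    Relies on the dict being ordered from most to least positive.
    """
    return sum(list(dist.values())[:n_boxes])


top2 = top_box(distribution, 2)  # Extremely + Very Satisfied
top3 = top_box(distribution, 3)  # also counts Somewhat Satisfied
print(f"Top 2 Box: {top2}%  Top 3 Box: {top3}%")
```

With these made-up numbers, adding the middle scale point moves the summary score from 57% to 85%, which is why the choice between the two measures matters.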

In recent weeks a couple of our clients have asked why we use the Top 2 Box score rather than the Top 3. They reason that the mid-point on most of our scales is basically positive. For example, they see a scale point of Somewhat Satisfied as more positive than negative and feel that a Top 3 Box score would more fairly summarize how much of the population they are surveying feels positively toward the question being asked.

There are two key reasons why we generally use Top 2 instead of Top 3: one that has to do with our experience in data collection and another that is tied philosophically and practically to our desire to help our clients make meaningful use of their research findings.

From a research perspective, we know that many of the things we measure tend to have a positive bias. The vast majority of people get some satisfaction from their jobs and the products/services they use (otherwise they’d leave or buy something else!), more people agree than disagree with almost any statement or description (as long as that isn’t about a highly controversial/political topic) and most people have to have a pretty lousy experience with something before they are willing to label it fair or poor. Consequently, the distribution of opinions on most 5-point scales is skewed heavily toward the positive end of the scale. The mid-point of the scale is not the mid-point of the distribution of answers. In this sense, using the Top 2 score instead of Top 3 is similar to what a teacher does when grading on a curve – a mid-point score (3 on a five-point scale) is not the equivalent of a “C” grade, it’s more like a “D” or “D plus”. We don’t consider a “D” to be a good grade and don’t want to tell our clients they are doing great when their scores are mediocre (or worse). The Top 2 Box score generally comes closer to identifying where the “good” part of the distribution lies than the Top 3 Box score.

Even more important is the philosophical issue of what it means to have a “good” score on a survey item. Our clients use their data to make decisions in order to improve their workplaces, products and services. So it is our job to tell them, as accurately as possible, what they are doing well and where they need to improve. That means that when we categorize a score as good or set a benchmark number that a client should shoot for, it should represent a real achievement, not just scraping by. Products are not successfully launched if consumers only “kind of” like them. Workplaces are not happy and productive if they are filled with “somewhat satisfied” employees. And we’d be out of business if our clients thought we were merely “good” at what we do. We aim for the Top 2 Box and urge our clients to do so as well.

Tuesday, April 17, 2012

The Risks of Projecting Survey Results To A Larger Population

In my experience, most quantitative research results are analyzed on the basis of the survey results themselves – such as the percentage distributions on rating scales – without the need to project results onto the larger population that the sample represents. It is generally understood that, with reasonably rigorous sampling procedures, these distributions are reflective of the attitudes held by the population at large.

In some instances, though, it is important to project to the larger group, such as when creating estimates of product use based on concept results. In these cases, we face a special challenge – do we take consumers at their word and simply extrapolate their answers to the larger population or do we use some combination of common sense and experience to adjust the data?

Although there are many sophisticated models for translating interest in a new product or service into projections of first year use, most include “adjustments” to the survey data to account for typical consumer behavior, such as:

1. The typical 5-point purchase intent scale is weighted in order to more accurately predict what proportion of the population will actually try the product. For example, the proportion of those who would “definitely buy” might be given a weight of 80% to reflect a high, but not absolute, likelihood of buying whereas those who would “probably buy” might be given a weight of just 20%.

2. Secondly, these results assume 100% awareness of the new product or service so further adjustments are required to account for the anticipated build in awareness, usually as a result of advertising, and

3. Thirdly, some estimate of repeat purchase is required, often derived from consumer experience with the new product or service or from established market results.

We take these steps to mitigate the risk of simply applying the survey results to the total population, as this could wildly inflate potential use of a new product or service.
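
The first two adjustments can be sketched in a few lines of Python. This is only a minimal illustration: the 80% and 20% weights are the example weights mentioned above, while the survey percentages and the 40% awareness level are hypothetical inputs. The function name `projected_trial_rate` is invented for the example, and the repeat-purchase step is left out entirely.

```python
def projected_trial_rate(definitely_buy, probably_buy, awareness,
                         w_definitely=0.80, w_probably=0.20):
    """Estimate the share of the population expected to try a product.

    definitely_buy, probably_buy: survey proportions (0 to 1)
    awareness: assumed share of the population aware of the product
    w_definitely, w_probably: weights discounting stated intent
    """
    weighted_intent = w_definitely * definitely_buy + w_probably * probably_buy
    return weighted_intent * awareness


# Hypothetical concept test: 15% "definitely buy", 30% "probably buy"
raw_intent = 0.15 + 0.30
adjusted = projected_trial_rate(0.15, 0.30, awareness=0.40)

print(f"Taking respondents at their word: {raw_intent:.1%}")
print(f"After weighting and awareness:    {adjusted:.1%}")
```

In this made-up case the unadjusted figure is more than six times the adjusted one, which gives a sense of how far a naive projection can overshoot.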

This issue came to my mind this weekend when reading a New York Times article called “The Cybercrime Wave That Wasn’t” (http://www.nytimes.com/2012/04/15/opinion/sunday/the-cybercrime-wave-that-wasnt.html) in which Dinei Florêncio and Cormac Herley of Microsoft Research conclude that, although some cybercriminals may do well, “cybercrime is a relentless, low-profit struggle for the majority.”

Part of their analysis questions the highly-touted estimates of the value of cybercrime, including a recent claim of annual losses among consumers at $114 billion worldwide. This estimate makes the value of such crime comparable to estimates of the global drug trade. As it turns out, however, Florêncio and Herley conclude that “such widely circulated cybercrime estimates are generated using absurdly bad statistical methods, making them wholly unreliable.” This is a very practical example of how results from what appear to be reasonably large research samples can run into critical problems of statistical reliability, whether through poor sampling, naïve extrapolation or other sorts of statistical errors. In the case of the cybercrime estimate, it appears that the estimates of losses that come from just 1 or 2 people in the research sample are being extrapolated to the entire population, which means that

In this particular example, a more accurate approach would be to separate the “screening” sample – i.e., identifying those consumers who have been victims of cybercrime using an extremely large database – from the “outcome” sample. In other words, if the goal is to estimate the impact of cybercrime, the objective should be to find a reliable sample of victims and interview them on their experience, including the extent of their losses. This approach would provide a much more rigorous basis for estimating the total value of cybercrime. However, caution should still be exercised when projecting to the total population.
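
To see why a handful of respondents can dominate a projected total, consider the following Python sketch. Every number in it is invented for illustration (the sample, the reported losses and the population size); it is not a reconstruction of how any published cybercrime estimate was produced.

```python
# Hypothetical survey of 1,000 people: 998 report no loss and two
# report losses of $200 and $50,000. All figures are made up.
losses = [0] * 998 + [200, 50_000]
population = 200_000_000  # hypothetical population being projected to

# Naive projection: average reported loss times population size
mean_loss = sum(losses) / len(losses)
projected_total = mean_loss * population

# Share of the projection that rests on the single largest answer
largest_share = max(losses) / sum(losses)

# The same projection if that one respondent had not been sampled
remaining_total = sum(losses) - max(losses)
without_largest = remaining_total / (len(losses) - 1) * population

print(f"Projected total:             ${projected_total:,.0f}")
print(f"Share from one respondent:   {largest_share:.1%}")
print(f"Projection without them:     ${without_largest:,.0f}")
```

Here a single answer accounts for nearly the entire projected total, so one misremembered or exaggerated figure would shift the estimate by billions of dollars.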

The key learning is that anytime we have data we want to extrapolate, we need to think about how much we trust that data to be accurate. There are some things consumers can report with superb accuracy - where they ate lunch today, the size of their mortgage payment, how many pets are in their homes. Assuming a decent survey sample, data of this sort can be easily extrapolated to a larger population. But other kinds of data are less accurate, whether due to the limits of human recall or various other forms of bias. Studies have shown, for example, that survey respondents cannot accurately recall where they ate lunch a week or two ago (recall error), tend to under-report their alcohol consumption (social desirability bias) and over-estimate their future purchases of products we show them in concept tests.

So, if we wish to extrapolate from our survey data to a larger population, we have to be honest about how accurate the results are, what sorts of bias might inflate or deflate the numbers, and what sorts of adjustments, if any, we should make. And when we see stories in the media with giant estimates of the prevalence of some sort of crime, social problem or behavioral trend, we need to take a moment to ask how they came up with those numbers. Often, with a little digging, we see problems in how these estimates were created, leading to the same need for logic and common sense that we find when dealing with our own market projections.

Monday, April 9, 2012

How and When Should I Use Statistical Testing?

Statistical testing is a common deliverable provided by market research vendors. But in some cases the users of the research findings may be uncertain about what the statistical testing really means and whether or not it should influence the way they use the data. Below are six key questions to keep in mind when using statistical testing.
1. What kind of data am I dealing with? Statistical testing can only be applied to quantitative data, such as survey data. There are no statistical tests for qualitative data, such as focus groups and in-depth interviews.
2. What am I trying to learn? Most statistical testing is used primarily to help decide which of the differences we see in our data are real in terms of the population we are interested in. For example, if your findings show that 45% of men like a new product concept and 55% of women like the concept, you need to decide if that difference is real – that is, the difference seen in your survey accurately reflects a difference between men and women that exists in the larger population of target consumers.
3. How certain do I need to be? Confidence intervals are the most common way of deciding whether percentage differences of this sort are meaningful. The size of a confidence interval is determined by the level of certainty we demand – usually 90% or 95% in market research, 95% or 99% in medical research – and the size of our sample relative to the population it is drawn from. The higher the level of certainty we demand, the wider the confidence interval will be – with a very high standard of certainty, we need a wide interval to be sure we have captured the true population percentage. Conversely, the bigger the sample, the narrower the confidence interval - as the sample gets bigger it becomes more and more like the target population and we become more certain that the differences we see are valid.
4. How good is my sample? Most statistical tests rely on key assumptions about how you selected the sample of people from whom you collected your data. For tests like the confidence intervals described above, this key assumption is having some element of random selection built into your sample that makes it mathematically representative of the population you are studying. The further your sampling procedure strays from this assumption, the less valid your statistical testing will be. If you can make the case that your sample is not biased in any important ways relevant to your research questions, you can rely on your stats tests to identify meaningful differences. If you have doubts about your sample, use the tests with caution.
5. Does my data meet other key assumptions about the test? Some stats tests assume particular data distributions, such as the bell-shaped curve which is an underlying assumption for confidence intervals. If your data are distributed in some other way – lop-sided toward the high or low end of the scale or polarized – the stats test is worse than worthless, it will actually be misleading!
6. Does the stats testing seem to align with other things I know about the research topic? Stats tests should supplement your overall understanding of the data. They are not a substitute for common sense. Keep in mind that most data analysis software will produce stats tests automatically, whether or not the tests are appropriate for the particular data set you are using. Almost every experienced researcher has watched someone (or been someone) trying to explain a “finding” that was nothing more than a meaningless software output.
If you can provide honest, satisfactory answers to these six questions, stats testing can hugely improve your understanding of your data and help you identify its key themes. And likewise, these key questions can keep you from wasting your time analyzing differences that aren’t really there.
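
The 45% versus 55% example from question 2 can be worked through with a simple confidence interval for the difference between two percentages. The Python sketch below uses the standard normal-approximation formula; the sample sizes (200 and 50 per group) are hypothetical, since the example above does not specify them, and the function name is made up here.

```python
import math


def diff_confidence_interval(p1, n1, p2, n2, z=1.96):
    """Normal-approximation confidence interval for p2 - p1.

    z=1.96 corresponds to 95% confidence; use 1.645 for 90%.
    Assumes two independent random samples.
    """
    diff = p2 - p1
    std_error = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return diff - z * std_error, diff + z * std_error


# 45% of men vs 55% of women, with hypothetical group sizes
low, high = diff_confidence_interval(0.45, 200, 0.55, 200)
small_low, small_high = diff_confidence_interval(0.45, 50, 0.55, 50)

print(f"200 per group: {low:+.3f} to {high:+.3f}")
print(f" 50 per group: {small_low:+.3f} to {small_high:+.3f}")
```

If the interval includes zero, the survey cannot rule out that there is no difference in the population. Under these assumptions the same ten-point gap just clears that bar with 200 respondents per group and falls well short with 50 per group.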

Tuesday, January 24, 2012

How Many People Do I Need To Survey To Get Meaningful Answers?

A central decision for anyone considering a survey – or any other quantitative research – is figuring out how big the survey sample needs to be in order to produce meaningful answers to the research questions. Researchers focus on sample size because it ties together three core aspects of any research effort:

  • Cost – the bigger the sample, the more it will cost to collect, process and analyze the data
  • Speed – the bigger the sample, the longer it will take to collect it (big samples can sometimes be collected quickly, but usually only by further raising costs!)
  • Accuracy – the bigger the sample, the more certain we can be that we have correctly captured the perceptions/opinions/behavior/beliefs/feelings of the population we are interested in (the technical term for this is statistical reliability)

As we see from these three bullets, the decision about sample size essentially boils down to a trade-off between cost, speed and accuracy. So when we pick a sample size we are making a decision about how much accuracy we are going to purchase, within the framework of our budget and timing.

Fortunately for researchers, quantitative samples do not have to be enormous to provide findings that are accurate enough to answer most market research questions. Any unbiased sample (we’ll talk about sample bias in another blog entry) of 50 or more stands a halfway decent chance of giving you a reasonable view of the population it is drawn from and, as we increase the sample size, our confidence that we have the correct answer increases. We can show this effect by looking at the margin of error – the plus or minus number – for some common sample sizes. To keep it simple, all of these are calculated using the assumption that the sample is drawn from a large population (20,000 or more) and that we are using the 95% confidence level of statistical reliability (the most typical standard for statistical reliability used in market research). If we are looking at percentages:

  • A sample of 100 has a margin of error of ± 9.8%
  • A sample of 250 has a margin of error of ± 6.2%
  • A sample of 500 has a margin of error of ± 4.4%
  • A sample of 1,000 has a margin of error of ± 3.0%
  • A sample of 2,000 has a margin of error of ± 2.1%
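
For readers who want to try other sample sizes, here is a minimal Python sketch of the usual margin-of-error formula for a percentage. It assumes the most conservative case of a 50/50 split (p = 0.5) and a z-value of 1.96 for 95% confidence. The optional finite population correction and the function name are additions for illustration. Results should land within about a tenth of a percentage point of the list above; small differences can come from rounding or from whether a population correction is applied.

```python
import math


def margin_of_error(n, p=0.5, z=1.96, population=None):
    """Margin of error for a proportion, as a fraction (0.098 = 9.8%).

    n: sample size
    p: assumed proportion (0.5 gives the widest margin)
    z: 1.96 for 95% confidence
    population: if given, apply the finite population correction
    """
    moe = z * math.sqrt(p * (1 - p) / n)
    if population is not None:
        moe *= math.sqrt((population - n) / (population - 1))
    return moe


for n in (100, 250, 500, 1000, 2000):
    print(f"n = {n:>5}: +/- {100 * margin_of_error(n):.1f}%")
```

Because the margin shrinks with the square root of the sample size, cutting it in half requires roughly four times as many respondents.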

Looking at these numbers you can see why national surveys, such as the big public opinion polls shown on TV or in newspapers, often have samples around 1,000 or so. Samples in that size range have small margins of error, and doubling the sample size wouldn’t make the margin of error much smaller – there’s no reason to spend money making the sample bigger for such a small gain in accuracy.

These numbers also show why we often urge clients to spend a bit to make a small sample bigger, but not too big! The gains in accuracy are all at the beginning – moving from a sample of 100 to something larger is almost always a good idea, while adding anything over 1,000 usually is not. So the rule of thumb is: 100 is probably too small and 1,000 is probably too big.

Of course, in real life it can be more complicated. We may need to examine sub-groups (age or income groups, political parties, geographic regions, etc.) within the population we are looking at. If a sub-group is small, we may need a bigger overall sample to capture enough of each of the sub-groups in order to provide an accurate picture of their views. So we have a rule of thumb about sub-groups, as well – don’t make decisions about any sub-group smaller than 30. For example, if we do a survey of households in a large urban area and we want to compare households by income level, we need to make our sample big enough to have at least 30 households in each of the income categories we want to compare. Assuming this is a normal city, there will be fewer households at the high end of the income distribution than at the low end, so we need to think about how to get enough of the high-end households to be able to make that comparison. So, if we want to be able to look at households with income over $100K, and 15% of the population has an income of $100K or more, we need to have a sample of at least 200 households to expect 30 of the households to be in that category.
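
The sub-group arithmetic is easy to generalize. The small Python helper below (a hypothetical function, not part of any survey package) returns the smallest overall sample in which the expected number of sub-group members reaches a chosen minimum.

```python
import math


def required_total_sample(min_subgroup, incidence):
    """Smallest total sample whose expected sub-group count meets the minimum.

    min_subgroup: smallest sub-group you are willing to analyze (e.g. 30)
    incidence: share of the population in the sub-group (e.g. 0.15)
    """
    # round() guards against tiny floating-point overshoot before ceil()
    return math.ceil(round(min_subgroup / incidence, 9))


print(required_total_sample(30, 0.15))  # households with income of $100K or more
print(required_total_sample(30, 0.05))  # a rarer, hypothetical sub-group
```

Keep in mind that this is an expected count: a random sample of exactly that size can easily come up a few households short, so it is sensible to build in a cushion or set a quota for the smaller group.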

Using these rules of thumb, you can form an idea about how big your sample needs to be to answer your research questions - without spending more than you can afford.

Friday, January 13, 2012

The “Dirty Dozen” – The Most Common Things That Cause Market Research To Go Wrong

Clients who are new to market research often worry that they will spend time and money on a market research project only to later realize that errors in the design or execution of the study have rendered their investment much less valuable than they had hoped. And they are right to worry – there are plenty of cases of market research blunders and even examples where market research findings have led to product design or marketing decisions that were worse than if there had been no research at all!
There are lots of individual events or mistakes that can knock market research projects off track and a short blog entry could never list them all. But there are a few types of errors that account for most of the problems. We think of these as the “dirty dozen” – the common mistakes that we see over and over again and that rob market research studies of their potential value to decision makers. The tables below list these problems by the stage of the research project where they typically occur, along with the consequences the problem brings and – most importantly – ways to avoid them.
At the research design stage:
Problem | Consequences | Solution
Poorly formulated objectives or research questions | The data collected will not address the real issues, resulting in findings that are vague, inconclusive or even misleading. | Write out your objectives and research questions and think about what kind of data would count as an answer to each one. Don’t skip this step or assume that everybody on the project has a shared understanding of what the research is supposed to accomplish.
Poor choice of data collection method(s) | Lack of insight when needed depth or breadth (or sometimes both!) is missing from the collected data. | Make the method(s) suit the objectives and research questions. Don’t get locked into “standard” approaches or doing what is easy rather than what is best for the study.
Sample design issues | Asking questions of the wrong people can produce misleading answers in qualitative studies or statistically invalidate a quantitative project. | Be explicit about the sample parameters. Know who you are going to talk to and exactly what larger population the sample is supposed to represent.
Poorly designed/untested research instrument | Garbage in – garbage out. Failing to ask well-thought-out questions, whether in a survey, a focus group or an in-depth interview, will produce poor quality results. | Every question you ask should have a purpose that relates back to the research objectives. Don’t neglect review and testing of the questionnaire or interview guide.
At the data collection stage:
Problem | Consequences | Solution
Inadequately trained or prepared data collectors | Data that is inconsistently gathered can produce gaps, validity problems and lack of depth. | Use professionals who know their jobs and have proven track records. Even experienced survey data collectors, interviewers and focus group moderators need to practice with the research instrument.
Failure to meet the sample specifications | If the sample you get is not the sample you intended, you may have data that is not pertinent or that misrepresents the views of the target population. | Have good quality control on the sample. If adjustments have to be made, be very sure you are not giving up the validity of your sample in order to fill your groups or meet numerical quotas.
Quality control issues | Poor quality control can result in errors in the data files and data that is missing or miscategorized. | Have a plan to check incoming data as it is collected. Don’t wait until data collection is over to begin the process of checking for errors or problems.
Loss of data | You can’t analyze data that has disappeared. | Have back-ups (and more back-ups). Never let data reside longer than necessary in a single location or file. Have security and back-up procedures for all data storage media.
At the data analysis/interpretation stage:
Problem | Consequences | Solution
Lazy/incomplete review of the raw data | Important insights can be missed. | Have a data analysis plan that sets out how the raw data will be handled. Allow enough time for data review and processing. Don’t rely on human memory or quick skimming to capture all the meaning that the data holds.
Inappropriate data reduction techniques | Each time data is reduced, whether through coding of qualitative data or numerical consolidation of quantitative data, there is some potential loss of information or important details. | Make sure your chosen data reduction techniques capture the themes, ideas and categories that will answer the research questions. Don’t be afraid to recode or re-analyze if new issues emerge while data reduction is in progress. Remember recoding means you have learned something from your data – it’s a step forward not a step back!
Over-reliance on statistics | Interpretation that is guided only by statistical testing runs the risk of missing insights that didn’t quite pass the test criteria or finding “insights” that are really just an artifact of the statistical method. | Know the strengths and weaknesses of the statistics you use. Use them as guidelines and tools, not as the word of the research gods. Statistics are not a substitute for common sense or familiarity with your data and your research topic.
Canned answers | Having a bias toward a particular answer or type of interpretation can blind you to new themes and ideas that emerge from your data. | Keep an open mind. Let the data speak to you. Think about what would count as disproof of your preferred interpretation and make sure that there’s a way for that evidence to emerge.
As you can see, there are many potential pitfalls and room for error in market research projects of any type. Careful planning, design, oversight and analysis are absolutely key to getting the best value from the money you spend on market research!