Tuesday, April 9, 2013

Top 2 Box versus Top 3 Box – Why We Like Top 2 Better

Our company has done thousands of surveys, both domestically and for international clients. Many of our closed-end questions use 5-point scales, such as:

  • Satisfaction scales – Extremely Satisfied, Very Satisfied, Somewhat Satisfied, Not Very Satisfied, Not At All Satisfied
  • Agreement scales – Strongly Agree, Agree, Neither Agree Nor Disagree, Disagree, Strongly Disagree
  • Judgment scales – Excellent, Very Good, Good, Fair, Poor

and others along similar lines.

Many of our analyses include comparing our clients’ scores on these 5-point scales to relevant normative data measured using the same scales. While we always show the full percentage distribution for our client’s scores, we typically show just a single number for the comparative norm, and in our case it’s usually the Top 2 Box score. Using our satisfaction scale as an example, that score is the combined percentage of Extremely Satisfied and Very Satisfied responses, with the other scales following the same pattern.
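For readers who like to see the arithmetic, a Top 2 Box score is just the share of responses in the two most positive categories. Here is a minimal sketch in Python; the response counts are invented purely for illustration:

```python
# Hypothetical response counts on our 5-point satisfaction scale
# (the numbers are invented for illustration only).
responses = {
    "Extremely Satisfied": 180,
    "Very Satisfied": 240,
    "Somewhat Satisfied": 310,
    "Not Very Satisfied": 170,
    "Not At All Satisfied": 100,
}

total = sum(responses.values())  # 1,000 respondents in this example
top2 = responses["Extremely Satisfied"] + responses["Very Satisfied"]
top2_box = 100 * top2 / total  # percentage in the two most positive categories

print(f"Top 2 Box: {top2_box:.0f}%")  # Top 2 Box: 42%
```

A Top 3 Box score on the same data would also fold in Somewhat Satisfied – exactly the category we argue below is too lukewarm to count as “good.”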

In recent weeks a couple of our clients have asked why we use the Top 2 Box score rather than the Top 3. They reason that the mid-point on most of our scales is basically positive. For example, they see a scale point of Somewhat Satisfied as more positive than negative and feel that a Top 3 Box score would more fairly summarize how much of the population they are surveying feels positively toward the question being asked.

There are two key reasons why we generally use Top 2 instead of Top 3: one has to do with our experience in data collection, and the other is tied philosophically and practically to our desire to help our clients make meaningful use of their research findings.

From a research perspective, we know that many of the things we measure tend to have a positive bias. The vast majority of people get some satisfaction from their jobs and the products/services they use (otherwise they’d leave or buy something else!), more people agree than disagree with almost any statement or description (as long as it isn’t about a highly controversial/political topic), and most people have to have a pretty lousy experience with something before they are willing to label it fair or poor. Consequently, the distribution of opinions on most 5-point scales is skewed heavily toward the positive end of the scale. The mid-point of the scale is not the mid-point of the distribution of answers. In this sense, using the Top 2 score instead of the Top 3 is similar to what a teacher does when grading on a curve – a mid-point score (3 on a 5-point scale) is not the equivalent of a “C” grade; it’s more like a “D” or “D plus”. We don’t consider a “D” to be a good grade and don’t want to tell our clients they are doing great when their scores are mediocre (or worse). The Top 2 Box score generally comes closer to identifying where the “good” part of the distribution lies than the Top 3 Box score does.

Even more important is the philosophical issue of what it means to have a “good” score on a survey item. Our clients use their data to make decisions in order to improve their workplaces, products and services. So it is our job to tell them, as accurately as possible, what they are doing well and where they need to improve. That means that when we categorize a score as good or set a benchmark number that a client should shoot for, it should represent a real achievement, not just scraping by. Products are not successfully launched if consumers only “kind of” like them. Workplaces are not happy and productive if they are filled with “somewhat satisfied” employees. And we’d be out of business if our clients thought we were merely “good” at what we do. We aim for the Top 2 Box and urge our clients to do so as well.

Which Way to Compare: Part 2 – Why Percentage Distributions Are Also Better Than Indexing

Some of our clients are used to seeing comparative data presented in the form of an index – a standardized score set to 100. Their own score for the indexed item is then shown relative to the index. For example, if the national norm on a survey item is a score of 60, then 60 is indexed to 100. If the client’s score on that item is 72, that is 20% higher than 60, producing a relative score of 120.
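The indexing arithmetic is a one-liner. A quick sketch using the numbers from the example above:

```python
# Index arithmetic using the numbers from the example above.
norm = 60            # national norm on the survey item, indexed to 100
client_score = 72    # the client's score on the same item

index = 100 * client_score / norm  # 72 is 20% above 60

print(index)  # 120.0
```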

With an index, data users can see at a glance how they stand relative to the comparative norm. This way of showing a comparison is conceptually easy to grasp and is common in types of research where the focus is heavily on the relative strength of whatever is being measured in comparison to some standard. Certain types of advertising research, in particular, use indexing. When a proposed new ad is being tested, for example, the emphasis is not just on how well the ad does with its intended audience, but also on how well it does relative to other ads. A successful ad has to break through media clutter, and there is no benefit in producing an ad and making an expensive media buy only to have the ad fail to stand out against the noise of other ads.

The strength of using an index, however, is also its weakness. By its very nature, an index puts the analytic focus on the comparison, not on whether the actual score is high or low, good, bad or indifferent. An example can help make this point. Imagine that a restaurant designing a dessert menu surveys its patrons and asks, “Do you like chocolate flavored desserts?” We find that against the industry norm, the restaurant’s patrons index below the norm – only 90 against the index of 100. We might be tempted to limit the presence of chocolate on the dessert menu in favor of other flavors. An examination of the actual percentages, however, is more revealing. The norm shows that 80% of the population like chocolate desserts, while the restaurant patrons’ score was 72%. Despite the low index score, that is still a substantial majority of the folks dining at the restaurant, and we would certainly want to make sure there are some chocolate goodies on that dessert menu.
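The same arithmetic can be run in reverse to recover the absolute percentage hidden behind an index. A short sketch using the chocolate-dessert numbers from the paragraph above:

```python
# Converting an index back to an absolute percentage,
# using the chocolate-dessert numbers from the text.
norm_pct = 80     # 80% of the population like chocolate desserts
index = 90        # the restaurant's patrons index at 90 against 100

patron_pct = norm_pct * index / 100  # the patrons' actual percentage

print(patron_pct)  # 72.0 -- still a substantial majority
```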

As in the example above, using an index can be misleading. This is especially true when the index represents a very high percentage (like the example above) or a very low one. A client who “beats” an index based on a low score may feel proud even though there is little to brag about, and may conclude no changes or improvements are needed even when there is plenty of room for them. Similarly, failing to match an index based on a very high percentage can cause clients to become upset or to waste time and resources fixing a problem that isn’t really much of a problem at all.

So while we do feel that indexing has its place – wherever comparison is the whole point of the exercise – we think that it’s generally better to know a little bit more of the details. Most research is about making decisions and setting priorities that are about much more than whether or not you have outscored an index. A percentage distribution compared to a well-constructed norm lets data users really see where they stand, set goals that are achievable and meaningful, and take pride in legitimate successes.