Tuesday, April 9, 2013

Top 2 Box versus Top 3 Box – Why We Like Top 2 Better

Our company has done thousands of surveys, both domestically and for international clients. Many of our closed-end questions use 5-point scales, such as:

  • Satisfaction scales – Extremely Satisfied, Very Satisfied, Somewhat Satisfied, Not Very Satisfied, Not At All Satisfied
  • Agreement scales – Strongly Agree, Agree, Neither Agree Nor Disagree, Disagree, Strongly Disagree
  • Judgment scales – Excellent, Very Good, Good, Fair, Poor

and others along similar lines.

Many of our analyses include comparing our clients’ scores on these 5-point scales to relevant normative data measured using the same scales. While we always show the full percentage distribution for our client’s scores, we typically show just a single number for the comparative norm, and in our case it’s usually the Top 2 Box score. Using our satisfaction scale as an example, that score is the combined percentage of Extremely Satisfied and Very Satisfied responses; the other scales follow the same pattern.
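To make the arithmetic concrete, here is a minimal sketch in Python. The category labels follow the satisfaction scale above, but the percentages are invented purely for illustration and are not client or normative data.

```python
# Hypothetical sketch: computing a Top 2 Box score from a 5-point
# satisfaction distribution. The percentages are made-up example values.

distribution = {
    "Extremely Satisfied": 0.22,
    "Very Satisfied": 0.34,
    "Somewhat Satisfied": 0.28,
    "Not Very Satisfied": 0.11,
    "Not At All Satisfied": 0.05,
}

top_2_box = distribution["Extremely Satisfied"] + distribution["Very Satisfied"]
print(f"Top 2 Box score: {top_2_box:.0%}")  # -> Top 2 Box score: 56%
```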

In recent weeks a couple of our clients have asked why we use the Top 2 Box score rather than the Top 3. They reason that the mid-point on most of our scales is basically positive. For example, they see a scale point of Somewhat Satisfied as more positive than negative and feel that a Top 3 Box score would more fairly summarize how much of the population they are surveying feels positively toward the question being asked.

There are two key reasons why we generally use Top 2 instead of Top 3: one has to do with our experience in data collection, and the other is tied, both philosophically and practically, to our desire to help our clients make meaningful use of their research findings.

From a research perspective, we know that many of the things we measure tend to have a positive bias. The vast majority of people get some satisfaction from their jobs and the products/services they use (otherwise they’d leave or buy something else!), more people agree than disagree with almost any statement or description (as long as it isn’t about a highly controversial/political topic), and most people have to have a pretty lousy experience with something before they are willing to label it fair or poor. Consequently, the distribution of opinions on most 5-point scales is skewed heavily toward the positive end of the scale. The mid-point of the scale is not the mid-point of the distribution of answers. In this sense, using the Top 2 score instead of Top 3 is similar to what a teacher does when grading on a curve – a mid-point score (3 on a five-point scale) is not the equivalent of a “C” grade; it’s more like a “D” or “D plus.” We don’t consider a “D” to be a good grade and don’t want to tell our clients they are doing great when their scores are mediocre (or worse). The Top 2 Box score generally comes closer than the Top 3 Box score to identifying where the “good” part of the distribution lies.

Even more important is the philosophical issue of what it means to have a “good” score on a survey item. Our clients use their data to make decisions in order to improve their workplaces, products and services. So it is our job to tell them, as accurately as possible, what they are doing well and where they need to improve. That means that when we categorize a score as good or set a benchmark number that a client should shoot for, it should represent a real achievement, not just scraping by. Products are not successfully launched if consumers only “kind of” like them. Workplaces are not happy and productive if they are filled with “somewhat satisfied” employees. And we’d be out of business if our clients thought we were merely “good” at what we do. We aim for the Top 2 Box and urge our clients to do so as well.

Which Way to Compare? Part 2 – Why Percentage Distributions Are Also Better Than Indexing

Some of our clients are used to seeing comparative data presented in the form of an index – a standardized score set to 100. Their own score for the indexed item is then shown relative to the index. For example, if the national norm on a survey item is a score of 60, then 60 is indexed to 100. If the client’s score on that item is 72, that is 20% higher than 60, producing a relative score of 120.
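For readers who like to see the calculation spelled out, here is a minimal sketch of that indexing arithmetic; the function name is ours, chosen for illustration, and the numbers mirror the example in the text.

```python
# Sketch of the indexing arithmetic described above (norm = 60, client = 72).

def index_score(client_score: float, norm_score: float) -> float:
    """Express a client score relative to a norm that is set equal to 100."""
    return client_score / norm_score * 100

print(index_score(72, 60))  # -> 120.0
```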

With an index, data users can see at a glance how they stand relative to the comparative norm. This way of showing a comparison is conceptually easy to grasp and is common in types of research where the focus is heavily on the relative strength of whatever is being measured in comparison to some standard. Certain types of advertising research, in particular, use indexing. When a proposed new ad is being tested, for example, the emphasis is not just on how well the ad does with its intended audience, but also on how well it does relative to other ads. A successful ad has to break through media clutter, and there is no benefit in producing an ad and making an expensive media buy only to have the ad fail to stand out against the noise of other ads.

The strength of using an index, however, is also its weakness. By its very nature, an index puts the analytic focus on the comparison, not on whether the actual score is high or low, good, bad or indifferent. An example can help make this point. Imagine that we are designing a dessert menu for a restaurant, and we survey the restaurant’s patrons and ask “Do you like chocolate flavored desserts?” We find that the patrons index below the industry norm – only 90 against the index of 100. We might be tempted to limit the presence of chocolate in the dessert menu in favor of other flavors. An examination of the actual percentages, however, is more revealing. The norm shows that 80% of the population like chocolate desserts, while the restaurant patrons’ score was 72%. Despite the low index score, that is still a substantial majority of the folks dining at the restaurant, and we would certainly want to make sure there are some chocolate goodies on that dessert menu.
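Spelled out with the figures quoted above, the contrast between the index and the underlying percentages looks like this (a sketch using only the numbers from the example):

```python
# The indexing arithmetic applied to the chocolate-dessert example: a
# "below norm" index of 90 still corresponds to 72% of patrons saying yes.

norm_share = 0.80    # industry norm: 80% like chocolate desserts
patron_share = 0.72  # this restaurant's patrons: 72%

index = patron_share / norm_share * 100
print(f"Index: {index:.0f}  (actual share: {patron_share:.0%})")
# -> Index: 90  (actual share: 72%)
```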

An index can be misleading in just this way, especially when it represents a very high percentage (as in the example above) or a very low one. A client who “beats” an index based on a low score may feel proud, even though there is little to brag about, and may feel no changes or improvements are needed even when there is plenty of room for them. Similarly, failing to match an index based on a very high percentage can cause clients to get upset or to start wasting time and resources fixing a problem that isn’t really much of a problem at all.

So while we do feel that indexing has its place – wherever comparison is the whole point of the exercise – we think that it’s generally better to know a little bit more of the details. Most research is about making decisions and setting priorities that are about much more than whether or not you have outscored an index. A percentage distribution compared to a well-constructed norm lets data users really see where they stand, set goals that are achievable and meaningful, and take pride in legitimate successes.

Thursday, March 28, 2013

Which Way to Compare? Part 1 – Why Percentage Distributions Are Better Than Averages

Our work for our clients, especially our employee satisfaction and engagement studies, often includes comparisons to national or industry norms or across groups within their organizations. These comparisons enable clients to see where they stand and help them set reasonable goals for organizational improvement. In recent weeks we’ve had several conversations regarding the pros and cons of different ways of expressing these comparative figures – as percentage distributions, as averages and as indexes. We strongly feel that percentage distributions offer the best approach in most cases. Today we’ll show why we prefer percentage distributions over averages, and in the next blog we’ll show why we also prefer percentages over indexes.

Averages offer the benefit of simplicity for the end users of data. If a survey question has a 5-point scale that is converted to the numbers 1 through 5, taking the numerical average of the responses produces a score between 1 and 5. It’s then a simple matter to compare across groups. If we put the “5” at the positive end of the scale, then those groups – workgroups, locations, divisions – with higher scores are doing better than those whose scores are lower. It’s easy to glance at a set of these average scores and identify priorities for improvement.

The problem with using averages, however, lies in the nature of the average (technically known as the arithmetic mean) as a statistic. An average is a measure of central tendency and carries an underlying assumption that the answers are more-or-less normally distributed. This assumption is often incorrect. It is not uncommon to find survey responses that are skewed toward one end of the scale or even polarized. Using a central tendency measure when there is no central tendency can reduce the utility of the information or even be misleading. A simple example can show why this is true. Imagine three work groups all answering the question “How much do you like your job?” using a 5-point scale. Each group has 10 employees:

In group one, all 10 employees choose the middle of the scale

In group two, 5 employees choose one end of the scale, and 5 choose the other end

In group three, 2 employees choose each of the 5 points on the scale

The average score for all three groups is a “3.” None of these groups has a central tendency and taking an average obscures an important feature of the data – the way the opinions are distributed. If these three work groups all reported to you, which information would be most actionable – knowing that they all have the same average score or knowing something about how the scores are distributed? We think the answer is pretty obvious.
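Here is a minimal sketch of that example in code; the scores are the hypothetical ones from the three groups above, and only the standard library is used.

```python
# Three hypothetical work groups: identical averages, very different distributions.
from collections import Counter
from statistics import mean

groups = {
    "one":   [3] * 10,               # all 10 employees at the mid-point
    "two":   [1] * 5 + [5] * 5,      # polarized between the two ends
    "three": [1, 2, 3, 4, 5] * 2,    # 2 employees at each scale point
}

for name, scores in groups.items():
    counts = dict(sorted(Counter(scores).items()))
    print(f"Group {name}: mean = {mean(scores):.1f}, distribution = {counts}")
# All three means are 3.0, but the distributions tell three different stories.
```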

Whether in market research or national politics, the difference between winning and losing is often in the percentage distribution, not the average. In the 2012 election, more votes were cast for Democratic candidates for the House of Representatives than for Republican candidates, but we have a Republican majority in the House because of the way those votes were distributed across congressional districts. Nate Silver made his reputation as a predictor of elections by understanding the details of percentage distributions of voter behavior. We feel our clients need and deserve the same level of information about the issues that are important to them. So even though average scores are easy to calculate and present, we think that looking at percentage distributions is worth the extra effort.

Thursday, March 14, 2013

If It Can’t Be Wrong, We Can’t Know That It’s Right

Recently we conducted a series of focus groups for a leading white-tablecloth restaurant chain. Company management was considering a re-positioning of the restaurant to make it more “contemporary” in relation to its key competitors. So part of the focus group discussion included asking the restaurant’s current and potential customers what “contemporary” meant to them in the context of this type of high-end restaurant. The customers’ response was clear: the term “contemporary” was polarizing, conveying positive elements of innovation and modernity but also suggesting bright lighting, stark design, and a noisier, younger crowd – not the ideal for a restaurant that gets much of its business from couples celebrating romantic milestones and business customers dining with clients or discussing important deals over a lavish meal.

When we de-briefed these results with the client’s ad agency, one young agency staffer took issue with what the customers had said, suggesting “They don’t understand what contemporary means!” Of course, it isn’t unusual for clients (or their ad agencies) to take issue with research findings that don’t support their preferred course of action. But this incident illustrates two key points about research: first, that research must be designed to actually test the preferred course of action, and second, that we must be able to hear and understand the results of that test.

Research designs that test specific hypotheses, question the flow of presumed processes or evaluate a course of action are fundamental to the scientific method. Rooted in a research history that stretches from Aristotle’s discussion of logic, through Ben Franklin’s key and kite electricity experiments, to Karl Popper’s more formal explication of the scientific method, the possibility of disproof is central to having a meaningful version of proof (http://en.wikipedia.org/wiki/Scientific_method). In market research, however, we often see research designs that do not really offer the chance for the idea in question to be disproven. For example, we see quantitative questionnaires with built-in positive bias toward ideas, concepts or ads – sometimes with no chance at all for consumers to indicate that they don’t really like the idea or prefer an alternative. In qualitative work, some discussion guides and focus group moderators “lead the witness,” producing a favorable halo of commentary around an idea that wouldn’t stand up in the real world of marketplace competition. Regardless of the data collection method, it is our job to build the possibility of disproof into our research designs. Remember, if the ideas you are testing cannot be disproved, your research results, no matter how positive, haven’t really shown anything.

Once we build in the possibility of disproof, it’s equally important to be able to hear, interpret and respond to less-than-welcome research findings. As researchers, we have to resist the temptation to dismiss negative results or interpret them away. As partners with our clients, we have to help them hear and understand the bad news and focus their energy on improving their ideas, products and marketing efforts, rather than wasting their time claiming that their customers are “wrong.”

Wednesday, August 22, 2012

The Perception Dilemma, Or, What Can We Do About Self-Report Bias?

A recent article in the Sunday New York Times called “Why Waiting Is Torture” (http://www.nytimes.com/2012/08/19/opinion/sunday/why-waiting-in-line-is-torture.html?pagewanted=all) brought to mind one of the key dilemmas in survey design – the simple fact that people often “misremember” their experiences (which is what we call “self-report bias”). How reliable can survey results be if respondents cannot accurately recall what happened?

The article itself is about the psychology of waiting in lines and some of the points are very interesting (although perhaps not surprising to researchers!):

1. According to Richard Larson at M.I.T., occupied time (such as walking to a specific location) feels shorter than unoccupied time (such as standing around waiting),

2. There is a tendency to overestimate the amount of time spent waiting in line (the article quotes an average of 36%),

3. A sense of uncertainty, such as not knowing how long you will be in line, increases the stress of waiting, while information and feedback on wait times or reasons for delays improve perceptions,

4. When there are multiple lines, customers focus on the lines they are “losing to” and not on the lines they are beating, and

5. The frustrations of waiting can be mitigated in the final moments by beating expectations, such as having the line suddenly speed up.

What implications do these findings have for survey design and analysis? In my experience, if we are trying to get an accurate record of an event – such as the amount of time spent waiting in line – a straightforward recall question is not always the best choice. There are actions we can take during research design, in developing our data collection tools and in analysis to deal with the problem of poor or inaccurate self-reporting of behavior.

At the research design stage, we should ask whether a self-report on a survey question is the best way to collect the data. In some cases, we are better off using direct measures, such as observations of the behavior, instead of asking about it. At the questionnaire development stage, we can explore which ways of asking a question are more likely to limit bias; for example, asking people what hours they watched TV last night will produce a larger per-night (and more accurate) answer than asking them to estimate their total viewing hours per week. In the analysis stage we often know which direction the self-report bias will tend to lean – for example, people generally under-report their consumption of alcohol and over-report their church attendance. When we know these tendencies we can deal with them either by adjusting the answers up or down – if we know the appropriate adjustment to make – or by mentioning them when we report the findings or make recommendations.
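As one hedged illustration of the “adjusting the answers” idea, the sketch below applies the article’s quoted 36% average overestimate of wait times as a flat correction. This is a simplification for illustration only; any real adjustment would have to be validated against observed behavior.

```python
# Hypothetical known-direction bias adjustment. Assumes, purely for
# illustration, that the 36% average overestimate of wait times quoted in
# the article can be applied as a flat correction factor.

OVERESTIMATE_RATE = 0.36  # average overestimate quoted in the article

def adjusted_wait_minutes(reported_minutes: float) -> float:
    """Scale a self-reported wait time back toward the likely actual value."""
    return reported_minutes / (1 + OVERESTIMATE_RATE)

print(round(adjusted_wait_minutes(10), 1))  # a reported 10-minute wait -> ~7.4
```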

The key here is to take the possibility of self-report bias into consideration and to have a plan for dealing with it. The existence of self-report bias does not invalidate research efforts, it is merely one of the many factors that research vendors and clients must take into consideration as they approach their projects.

How Should You Choose A Focus Group Moderator?

An article by Naomi Henderson in the Summer 2012 edition of AMA’s Marketing Research magazine gives a worthwhile list of guidelines for choosing a moderator (you can read the article at http://www.marketingpower.com/ResourceLibrary/MarketingResearch/Pages/2012/Summer%202012/Qualitative-Reflections.aspx).

She points out that such a choice is not necessarily straightforward because “qualitative inquiry is a delicate balance of personality, experience and awareness of the nuances of group dynamics.” In other words, in choosing a moderator, you are not only selecting someone with a particular set of skills, you are also choosing a personality and all the risks that come along with such a choice.

Naomi’s article goes on to give some very practical advice on:

· What types of questions you should ask a prospective moderator

· What types of questions you should ask the references provided by that moderator,

· What types of work samples to request, and

· What to look for in a sample DVD from your prospective moderator.

This advice is worthwhile and useful, but one important point is missing: there are very different styles of moderating, and those styles can have a huge impact on the perceived “fit” between clients and moderators.

In my experience, the two biggest styles are what I call the “laid back” style vs. the “in your face” style of moderating. Both are effective forms of moderating but each can impact the “back room” in different ways.

Over the years, I’ve worked with a number of moderators of the “laid back” variety. They tend to be very calm, which helps the group relax, and are very deliberate in their approach, which means that the topics get thoroughly explored. One moderator in particular made very good use of silences in the group – instead of filling each moment with questions, he let respondents essentially talk through the issues and build on each other without doing a lot of active probing. I think this approach works but, at times, the silences can make certain back room clients uncomfortable because they are not “getting what they want.”

Personally, I’m more of the “in your face” type of moderator. These moderators take a very active role in the group, tend to run very high energy sessions and work very hard to avoid silences. Because there is almost always something happening in these groups, clients tend to get a sense that it is a “good” group. However, clients can also miss some of the nuance in these groups or feel that certain topics were not fully addressed.

My main point is that, in addition to Naomi’s practical suggestions for choosing a moderator, it is worth asking a prospective moderator, “How would you characterize your style of moderating?” In doing so, think about the team/internal clients you will have working on your project and what style of moderating might fit best with them.

Sunday, May 20, 2012

What Questions Help Improve the Effectiveness of Qualitative Research?

All effective qualitative market research projects must start with a clear understanding of the background and objectives for each project. To help define a project and determine the appropriate methodology, we generally ask our clients the following questions:

  • What are the research objectives? What are you hoping to learn? What background information can you share which led to the need for this research?
  • Are there any other ways you might describe what you’re trying to explore in this research? (This question can help provide more richness to the definition of study objectives.)
  • What team/internal clients is this research being conducted for? Does this team or these clients have specific preferences about how research results are summarized and/or presented?
  • Are all team members in agreement about what this research should explore – and, if not, what are the differing perspectives?
  • What have you/your team already done to explore these issues? (This can include previous qualitative research, quantitative research, internal data, secondary research, etc.)
  • Have you ever done similar research in the past and, if so, are there any issues that were not addressed then, that you now wish you had explored? Do you have any frustrations concerning the last time you did similar research?
  • What other initiatives/internal issues might affect this research? AND what other initiatives/internal issues might be affected by this research?
  • What decisions will be impacted by the learning from this research? How might you act differently based on what you learn? Also, what are areas that can’t be changed, regardless of what the research might learn?
  • Are there any hypotheses about the answers, among your team or your internal clients? If you were to imagine that the project is complete, what’s your ideal outcome?
  • What do you expect to be the biggest challenges we encounter as we conduct this research?
  • What specific constraints do we need to keep in mind?
  • What stimulus material – if any – do you want people to react to, and what format will it be in?
  • What issues related to target audience might be relevant to know as we design the research? And are there any customer segments we should keep separate or perhaps combine for any reason?

Clear and thoughtful answers to these questions are essential to meeting both the stated and unstated objectives of any qualitative research project. These questions help to decide (1) if qualitative research is the right methodology for your objectives and, if so, (2) which approach would best meet your needs (such as deciding between focus groups and individual depth interviews), (3) what the necessary recruiting specifications are so that an accurate and effective screening questionnaire can be written, (4) what issues need to be covered in the moderator guide and, finally, (5) how your research analyst should prepare the project deliverables.

It’s impossible to overstate the importance of setting clear objectives BEFORE you undertake any research project if you want to have a successful outcome. In fact, this initial discussion can avoid those disastrous “Oh, by the ways” that have destroyed many research efforts!