This is Chapter 1 from the new book, “Applied MaxDiff” by Keith Chrzan and Bryan Orme of Sawtooth Software. Within the U.S., “Applied MaxDiff” may be purchased from Amazon. The E-book (PDF) version may be purchased directly from Sawtooth Software.

What Is MaxDiff?

MaxDiff (short for maximum difference scaling, the name marketers have given to a method more commonly known in academia as Best-Worst Scaling, or BWS) has become the measurement equivalent of the Swiss Army knife.  A tremendously useful method, it has an expanding base of users who continue to find new and innovative applications for it.

Conceived as a multiple-choice version of Thurstone’s (1927) method of paired comparisons, the basic case of MaxDiff quantifies the relative value of each item on a list.  For a study of 20 ice cream flavors, for example, we might ask 12 questions that look like this:

Exhibit 1.1 - Sample MaxDiff Question

In this ice cream study, each question contains just five of the 20 flavors, and a respondent needs only to identify the one flavor she likes most and the one flavor she likes least of the five.  The other 11 questions would show different subsets of flavors, and the particular subset shown in each question would come from an experimental design.  The design ensures that when we model the responses, we can isolate the relative value (or “utility”) of each ice cream flavor.  In this case a sample of 250 respondents answered our ice cream survey, and here are the utilities we estimated for them:

 

Flavor                 Utility
Chocolate              2.14
Vanilla                2.09
Strawberry             1.79
Mint Chocolate Chip    1.41
Cookie Dough           1.40
Cherry                 1.17
Pistachio              1.17
Mango                  1.16
Raspberry              1.09
Moose Tracks           1.07
Neapolitan             0.85
Rocky Road             0.83
Peach                  0.72
Coconut                0.57
Lavender               0.54
Peppermint             0.47
Lemon Custard          0.47
Lychee                 0.35
Key Lime               0.31
Green Tea              0.00

 

Exhibit 1.2 - Utilities for 20 Ice Cream Flavors

A higher utility implies more liking than a lower utility, so we can conclude that our 250 respondents like Chocolate (2.14) more than they do Raspberry (1.09) and Moose Tracks (1.07) more than they do Key Lime (0.31).
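To give a feel for the size of these differences, the short sketch below converts a utility difference into the share of respondents we would expect to prefer one flavor over another in a head-to-head pairing.  It assumes the utilities sit on a logit scale, which this excerpt does not state explicitly, and the function name pairwise_share is purely illustrative.

```python
import math

# Utilities copied from Exhibit 1.2 (a subset, for illustration).
UTILITIES = {
    "Chocolate": 2.14,
    "Raspberry": 1.09,
    "Moose Tracks": 1.07,
    "Key Lime": 0.31,
}


def pairwise_share(u_a, u_b):
    """Logit probability that item A is preferred to item B.

    ASSUMPTION: utilities are logit-scaled, so only their difference matters.
    """
    return 1.0 / (1.0 + math.exp(-(u_a - u_b)))


choc_vs_rasp = pairwise_share(UTILITIES["Chocolate"], UTILITIES["Raspberry"])
moose_vs_lime = pairwise_share(UTILITIES["Moose Tracks"], UTILITIES["Key Lime"])

print(f"Chocolate over Raspberry:   {choc_vs_rasp:.2f}")
print(f"Moose Tracks over Key Lime: {moose_vs_lime:.2f}")
```

Under that assumption the two shares come out at roughly 0.74 and 0.68.  Because only the difference between two utilities enters the calculation, adding a constant to every utility (for example, anchoring Green Tea at 0.00) leaves these shares unchanged.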

MaxDiff has a number of advantages over ordinary rating scales.  Because it constrains the way respondents can answer the question, MaxDiff removes the scale use biases commonly seen with rating scales, particularly in cross-cultural research (respondents use rating scales in different ways: some use the top end of the scale and others the lower end, while some use all the scale points and others pile up their answers in a narrow portion of the scale).  Also, MaxDiff forces respondents to make tradeoffs among the items, so it tends to discriminate among items more powerfully than ordinary rating scales do, and it more readily identifies between-group differences on the items, as Cohen (2003) found in a well-regarded paper comparing MaxDiff to rating scales.  Comparing MaxDiff to rating scales and four other methods for measuring attribute importance, Chrzan and Golovashkina (2006) confirmed these findings and added that MaxDiff had greater predictive validity than any of the other methods they tested.

Researchers can use MaxDiff to generate utility information for pretty much any list of items; this gives it great flexibility in the hands of survey researchers.  Often, commercial researchers use MaxDiff to measure the relative appeal of a list of items like new product concepts, flavors, or menu items so that marketers can prioritize their development efforts.  Applied to advertising, MaxDiff can help researchers evaluate a list of advertisements or advertising claims, or even prioritize a list of message elements that can then be combined into advertising executions.  Healthcare researchers can employ MaxDiff to understand how patients value different health states or different aspects of treatment regimens.  In quality improvement research, MaxDiff can inform decisions about which aspects of a product or service customers most want to see improved.

Beyond appeal, researchers frequently use MaxDiff to measure attribute importance, a common need in marketing research staples like brand image and customer satisfaction research.  Moreover, MaxDiff serves as a general measurement technique, a method to consider any time one has a list of items that need to be compared or evaluated.  In their initial publication on the method, Finn and Louviere (1992) used MaxDiff in a public policy study of food safety concerns.  Lee, Soutar and Louviere (2007) found that MaxDiff provided a more valid measure of the List of Values (LOV) than the original rating scale formulation, while Chrzan (2014) used MaxDiff to replace rating scales in measuring the Five Factor Model of personality.  In another example, which brings MaxDiff back to one of Thurstone’s original case studies for the method of paired comparisons, Sá Lucas (2004) used MaxDiff to measure the relative severity of each of 25 crimes, with items ranging from “disturbing the peace” to “bombing a public building.”

Some History

The full account of the history of MaxDiff can be found in Best-Worst Scaling:  Theory, Methods and Applications (Louviere et al. 2015), pretty much the Bible for academic applications of MaxDiff scaling.  An abbreviated history is as follows.

Thurstone’s (1927) work on what he called “the law of comparative judgment” introduced the method of paired comparisons (MPC).  It also kicked off the development of the random utility model (RUM) that we use today to analyze MaxDiff, conjoint and all manner of other kinds of choice experiments. 

Thurstone conceived of the paired comparison as a simple way to develop a scale that discriminates among objects in a set by asking about them two at a time.  To continue our ice cream example, we might ask:

“Which of these flavors do you like better?

[ ] Chocolate or

[ ] Cherry”

And then…

“Which of these flavors do you like better?

[ ] Mango or

[ ] Mint chocolate chip”

And so on.  We humans find paired comparison questions easy to answer well (there’s a reason your eye doctor asks questions like “Which is clearer, this one, or this one?” as he has you compare lenses).

David (1969) reviews a number of strategies for designing paired comparison experiments and for quantifying the underlying scale.  If we were willing to abuse our respondents, we could ask each of them enough pairs to estimate that respondent’s unique utility scores for a large list of items, though analysts historically used MPC experiments more often to get sample-level utilities.

Finn and Louviere (1992) named their extension of the method of paired comparisons Best-Worst Scaling (BWS), the name still used in the academic literature.  Noting that BWS/MaxDiff is a multiple-choice extension of the classic MPC model, Finn and Louviere (1992) sought to get respondent-level preference information from it.  They could do so because MaxDiff surveys wring a great deal of information out of just two mouse clicks per question.  For example, in the following question respondent Jones indicates that he likes Vanilla the most and Raspberry the least.

Exhibit 1.3 - MaxDiff Responses

From these two mouse clicks we learned that Jones likes

  • Vanilla more than Raspberry
  • Vanilla more than Cookie dough
  • Vanilla more than Coconut
  • Vanilla more than Peach
  • Cookie dough more than Raspberry
  • Coconut more than Raspberry
  • Peach more than Raspberry

Thus, the MaxDiff question identifies Jones’ preference in seven of the 10 possible pairs one can form from the five flavors.  With a similar amount of information coming from 11 more such questions, we have a wealth of information about Jones’ relative preferences among ice cream flavors.  We learn Jones’ preferences with less effort than it would take to have him answer a large number of paired comparison questions or rank order the full list of 20 flavors.  Importantly, we get enough preference information from Jones and each of the other respondents that we can estimate the utility values not only for the sample of respondents, but also for Jones and each other individual respondent.
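The count of seven known pairs out of 10 can be checked with a few lines of code.  The sketch below is only an illustration: the function name implied_pairs is hypothetical, and the order in which the five flavors are listed is assumed, since the exhibit itself is not reproduced here.

```python
import math
from itertools import combinations


def implied_pairs(shown, best, worst):
    """Return (preferred, less_preferred) pairs implied by one best/worst answer."""
    # The best item beats every other item in the question.
    pairs = [(best, item) for item in shown if item != best]
    # Every remaining item (neither best nor worst) beats the worst item.
    pairs += [(item, worst) for item in shown if item not in (best, worst)]
    return pairs


# Flavors from Jones' question (display order is an assumption).
shown = ["Vanilla", "Cookie dough", "Coconut", "Peach", "Raspberry"]

known = implied_pairs(shown, best="Vanilla", worst="Raspberry")
possible = list(combinations(shown, 2))

for preferred, other in known:
    print(f"{preferred} more than {other}")
print(f"{len(known)} of {len(possible)} pairs resolved")

# For comparison: the number of distinct pairs among all 20 flavors.
all_pairs_20 = math.comb(20, 2)
print(f"Pairs among 20 flavors: {all_pairs_20}")
```

The three pairs left unresolved are those among the middle items (Cookie dough, Coconut and Peach), about which the two clicks say nothing.  A full paired comparison study of the 20 flavors would involve 190 distinct pairs, which is the burden the 12 MaxDiff questions avoid.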

References

Chrzan, K. 2014. MaxDiff as a General Psychographic Scaling Method: The case of the five factor personality model.  The SKIM/Sawtooth Software Conference, Amsterdam, Netherlands.

Chrzan, K. and N. Golovashkina. 2006. An empirical test of six stated importance measures. International Journal of Market Research, 48:717-740.

Cohen, S. 2003. Maximum difference scaling: Improved measures of importance for preference and segmentation.  In Sawtooth Software Conference Proceedings, pp. 61-74.

David, H.A. 1969. The Method of Paired Comparisons. New York: Hafner.

Finn, A. and J. J. Louviere. 1992. Determining the appropriate response to evidence of public concern: the case of food safety. Journal of Public Policy and Marketing, 11(1):12–25.

Lee, J.A., G.N. Soutar and J. Louviere.  2007. Measuring values using best‐worst scaling: The LOV example. Psychology and Marketing, 24:1043-1058.

Louviere, J.J., T.N. Flynn and A.A.J. Marley.  2015. Best-worst scaling: Theory, methods and applications. Cambridge:  Cambridge University Press.

Sá Lucas, L. 2004. Scale development with MaxDiffs:  A case study. In Sawtooth Software Conference Proceedings, pp. 69-82.

Thurstone, L. L. 1927. A law of comparative judgment. Psychological Review, 34:273–286.