A report from the Big Data Debate at the Corporate Researchers Conference (CRC)

I am biased. Having taught the predictive analytics class in the pre-CRC conference workshops, I have some ideas about the role of market researchers, data scientists and the messy intersections between them. I mean us. Wait… OK, well, that’s why it is messy.

Here are a few points made by each speaker at the MRA’s Corporate Researchers Conference, followed by my takeaway.

Annie Pettit, of Peanut Labs

  • Impediments to full data access mean the promise of “big data” is largely unrealized: missing data (she mentioned one data set with 53 million missing data points), IT departments that may not allow access to some data sets, and outside organizations that won’t collaborate in some cases.
  • Big data tells you what, but not why.
  • Cites two concerns: Do you want information about your social network connections affecting whether you get accepted to college? Or whether you get approved for a mortgage?
  • Don’t get hung up on the size of the data set—as market researchers, you know SAS, SPSS and R—you are already equipped to work with it.

Marc Alley, of First Global

  • Claims “big data” is defined by data volume (if stored on CDs, he said, it would make five stacks reaching to the moon). He also characterizes it as unstructured data, including video and text. Later, he offered this definition: “Big data is too big to analyze with traditional means.” (As an aside, I disagree with these definitions.)
  • Acknowledges that some applications may be uncomfortable/feel like privacy invasion.
  • Mentions that many companies can buy access to big data: as an example, mentions “Wealth Engine”, a database you can buy that aggregates data about wealthy individuals.

My takeaway

The semantics are creating artificial barriers between “market research” and “big data.” The reality is that companies are doing a better job of collecting data (web site traffic, purchase data, service data and more), and they can append in-house data with purchased data (think of Experian, as an example). We researchers need to figure out how to show that finding intersections between our data (often collected through panel databases, survey data sets, etc.) and other data (CRM, web site data, etc.) can be used for advantage. For example, a question came from a gentleman in the audience asking how his organization could combine various but disparate in-house data sets to uncover education needs amongst his customer base. My thoughts for him:

  • Can you append existing customer records with survey data to find out if people with specific past behavior are more or less likely to self-report the desire for new education topics?
  • Or can you run a test: to find out if new topic X is going to be a popular class, see which types of people who express interest in topics via surveys actually complete them when offered?
  • Or might it be helpful to simply get access to a few variables of data from his CRM friends about past education choices, to see if their choices imply themes of interest? For example, maybe people who buy class A and class B also tend to buy class C?
  • Afterwards, I spoke with the amazing Megan Peitz from Sawtooth (booth 55) and she had a great idea: conduct a MaxDiff exercise that tests both past and possible future topics, to see if there is a correlation between past topics and new topics. This could be used to identify the attributes of people who are likely to value those new topic ideas (thanks, Megan!!).
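The co-purchase idea above (people who buy class A and class B also tending to buy class C) is easy to prototype once you have even a small CRM extract. Here is a minimal sketch in Python with pandas — the class names, customer IDs and enrollment data are all made up for illustration, not drawn from any real data set:

```python
import pandas as pd

# Toy CRM extract: one row per (customer, class) enrollment.
# All values here are hypothetical.
enrollments = pd.DataFrame({
    "customer": [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 5],
    "class":    ["A", "B", "C", "A", "B", "C", "A", "B", "C", "A", "B"],
})

# One row per customer, one boolean column per class.
basket = pd.crosstab(enrollments["customer"], enrollments["class"]).astype(bool)

# Among customers who took both A and B, what share also took C,
# compared with the overall rate of taking C?
base_rate = basket["C"].mean()                     # P(C) overall
ab = basket[basket["A"] & basket["B"]]
ab_rate = ab["C"].mean()                           # P(C | A and B)

print(f"P(C) overall = {base_rate:.2f}, P(C | A and B) = {ab_rate:.2f}")
```

If the conditional rate is clearly higher than the base rate, that is a first hint that past class choices imply themes of interest — worth confirming with the survey-append or MaxDiff approaches above before acting on it.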

But I don’t think big data itself can identify new topic areas. Big data uses data about actual (past) behavior to predict likely future behavior; if the future behavior of interest is something new, there is no historical data to work with. It could, however, tell him whether customer groups exist that tend to buy new classes/new topics in general. I’m at booth 56 if you want to chat about this great topic.

Originally published as "Big Data at CRC" at Research Rockstar.