Society is currently coming to terms with the exponential increase in data now available, often tagged "big data", and the effect its use is having on our lives. Historically, we viewed datafication in the context of our physical environment: mapping continents or charting the stars, say, to enable boats to navigate from one place to another.

Datafied sentiment

Now, though, the business of transforming phenomena into data has moved on. And, since our lives are often played out in a digital environment, the possibilities for datafication are endless. Indeed, it has even turned to what was once an intimate and private world, human relationships, with Facebook's social graph firmly claiming the data territory. Twitter has datafied sentiment to the extent that it can be used to predict the success of films or even stock market performance.

So does this herald the end of market research's data collection role, in the form of surveys and focus groups? Big data is increasingly providing us with answers that are far more granular and accurate than anything we could have dreamt of collecting by asking questions of respondents.

We might argue that research gives us the "what" but not the "why". But, as Viktor Mayer-Schönberger and Kenneth Cukier argue in their recent book[i], do we actually need to understand "why" certain actions are being undertaken in a big data world? We don't have to know why people who purchase curly fries also typically purchase muesli, just that they do. Knowing that is enough to make the appropriate marketing decision: simply target curly fry buyers with muesli offers. In this way big data is dramatically transforming the way we conduct business and, as such, it will inevitably change the way in which we undertake research. So what, if any, is the role for qualitative research in this datafied world?

The answer to this, perhaps, comes from statistics. It has long been recognised by those working with data that, given a large enough sample size, most variables will show statistically significant correlations, because at some level everything is related to everything else. The psychologist Paul Meehl famously called this the "crud factor"[ii], a term describing the way in which these faint traces can lead us to believe there are real relationships in the data where, in fact, the linkage is trivial.
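
The effect is easy to demonstrate. Below is a minimal Python sketch (the numbers are invented purely for illustration): two variables share nothing but a faint common influence, yet at big data scale that trivial linkage registers as overwhelmingly significant.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)

    # A "crud factor" world: two variables share a faint common
    # influence, so their true correlation is tiny (~0.02) but nonzero.
    def crud_sample(n):
        common = rng.normal(size=n)
        x = 0.14 * common + rng.normal(size=n)
        y = 0.14 * common + rng.normal(size=n)
        return x, y

    for n in (100, 10_000, 1_000_000):
        x, y = crud_sample(n)
        r, p = stats.pearsonr(x, y)
        print(f"n={n:>9,}  r={r:+.3f}  p={p:.1e}")

    # Typical output: at n=100 the link is undetectable; at n=1,000,000
    # the same trivial correlation is significant at any conventional
    # threshold: statistically real, practically meaningless.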

Nate Silver, big data's pin-up boy since his success in predicting the outcome of the US presidential election, made the same point when he warned that the number of "meaningful relationships" is not increasing in step with the meteoric growth in the amount of data available. We simply generate a larger number of false positives, an issue so endemic in data analytics that it led John Ioannidis to suggest that two-thirds of the findings in medical journals were not robust[iii]. Silver's view[iv] is that, to address this, we always need to understand the data's context. His solution comes in the form of Bayesian statistics, which call upon the researcher to establish the "prior probability" of the hypothesis, i.e. to provide a measure of the baseline condition. He points out that a lack of focus on the broader context can mean we generate false positives: studies with statistically significant findings that are manifestly wrong, such as the claim that toads can predict earthquakes.
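
Silver's argument can be made concrete with a few lines of arithmetic. The sketch below applies Bayes' rule with illustrative figures (the 0.8 statistical power, 0.05 significance level and the two priors are assumptions for the example, not numbers taken from Silver or Ioannidis): when the prior probability of a hypothesis is low, even a statistically significant result is most likely a false positive.

    # P(hypothesis true | significant result), by Bayes' rule.
    def prob_true_given_significant(prior, power=0.8, alpha=0.05):
        true_pos = power * prior          # real effects correctly detected
        false_pos = alpha * (1 - prior)   # null effects flagged by chance
        return true_pos / (true_pos + false_pos)

    # A long-shot hypothesis (toads predict earthquakes): 1-in-1,000 prior.
    print(f"{prob_true_given_significant(0.001):.1%}")  # ~1.6%: almost surely a false positive

    # A well-grounded hypothesis: 1-in-2 prior.
    print(f"{prob_true_given_significant(0.5):.1%}")    # ~94%: the finding probably stands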

So we might start to see an exciting role emerging for qualitative researchers: one where they increasingly become guardians of the consumer context. For, as market researchers, we often hold a large amount of implicit understanding of consumers, yet don't always recognise that this is so until we engage with the computer scientists, mathematicians and statisticians who manage big data processes within companies. These are intelligent and highly skilled people, but they often lack a rounded understanding of the consumer.

And it is this contextual understanding that drives the myriad decisions made when undertaking analysis. As Lisa Gitelman points out in the title of her book, Raw Data Is an Oxymoron[v], analytics necessarily involves making decisions: about which data to look at, what composite variables to generate, what constitutes an outlier, and so on. These decisions involve human judgement, often well intentioned but guided by assumptions about what is important and why. The point is that the data do not speak for themselves; as Silver says, "We speak for them. We imbue them with meaning."
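
To see how much hangs on one such judgement, consider a toy example (again with invented numbers) of a single decision: what counts as an outlier.

    import numpy as np

    rng = np.random.default_rng(1)

    # Hypothetical basket spends: mostly modest shops plus a few big-ticket ones.
    spend = np.concatenate([rng.gamma(2.0, 15.0, size=980),
                            rng.uniform(400, 900, size=20)])

    # Analyst A keeps everything; analyst B drops values beyond 3 standard deviations.
    mean_all = spend.mean()
    trimmed = spend[np.abs(spend - spend.mean()) < 3 * spend.std()]

    print(f"average spend, all shops:    {mean_all:6.2f}")
    print(f"average spend, outliers cut: {trimmed.mean():6.2f}")
    # Same raw data, two defensible rules, two different "average customers".

Neither rule is wrong; each simply encodes an assumption about which customers matter.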

At the moment there is a large, up-to-date and vibrant body of work that provides brands with the consumer context. From this we generate hypotheses, and a framework for exploring the data and interpreting the outcomes. Without this baseline understanding of the consumer we would struggle to make sense of big data.

Jumping to conclusions

So, for example, if someone is purchasing "value" items from a supermarket we may very quickly form an idea of who that shopper is: perhaps someone on a low wage, with a poor educational background and socially conventional leisure activities. But we know that there are many reasons why consumers may shop in this way; indeed, many relatively well-off shoppers have money to spend on big-ticket items but are simply thrifty. Changes in economic conditions, meanwhile, create different cultural norms of what is acceptable, thereby changing the motivations and profile of value shoppers.

It's easy to fall back on comfortable stereotypes, but good qualitative researchers question our assumptions and provide more intelligent and complex hypotheses to be explored in big data. To deliver, they will more than ever need to ensure that they are sensitised to the delicate nature of nascent consumer behaviour, coupling this with an ability to challenge and influence established ways of seeing the world. Far from reducing the role of the quallie, big data has the potential to dramatically extend and enhance its opportunities.