Big Data

The Crash Course channel on YouTube is designed primarily to give students a general overview of relevant topics, while glossing over specifics and numerical examples. It’s great if you know you’re heading into a lecture and want a little head start so the concepts are easier to wrap your head around.

This clip introduces the idea of Big Data, which many people hear about and are intimidated by but don’t understand.

One of the first things you try to drill into a student when introducing statistical concepts is the famous phrase “correlation doesn’t imply causation”. That is, if two things seem to be related and tend to appear together reliably (like, say, people being wet and it raining), you are not automatically entitled to conclude that one causes the other. Just because it is raining outside doesn’t mean that someone will be wet. (You know, because a proportion of people own things called umbrellas.)

However, that is only part of the story, because in actual practice correlation is the only thing we get to measure most of the time. Taken naively, that would imply that no one should ever be allowed to say that something causes something else. What it really means is that correlation is a necessary, but not a sufficient, condition for causation. Only some kinds of correlation really matter. So if a friend says they took an herbal remedy and it cured their cold, that’s correlation but probably not causation. But if you study a hundred people in a double-blind, placebo-controlled trial of that herbal remedy and still find correlation, that starts to look like causation.
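
To make the difference between the anecdote and the trial a bit more concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the remedy is assumed to do nothing at all, people with mild colds are assumed to be more likely to reach for it, and the probabilities (0.8, 0.2, 0.9, 0.3) are arbitrary choices, not real data. The point is only that self-selection can produce a solid correlation between "took the remedy" and "recovered" that disappears once a coin flip decides who gets it.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate(n, randomized, seed=0):
    """Correlation between taking a (useless) remedy and recovering.

    Hypothetical model: cold severity drives recovery, and, unless the
    study is randomized, it also drives who chooses to take the remedy.
    The remedy itself never affects the outcome.
    """
    rng = random.Random(seed)
    took, recovered = [], []
    for _ in range(n):
        mild = rng.random() < 0.5  # half of the colds are mild
        if randomized:
            p_take = 0.5  # trial: a coin flip assigns the remedy
        else:
            p_take = 0.8 if mild else 0.2  # anecdote: mild cases self-select
        p_recover = 0.9 if mild else 0.3  # depends on severity only
        took.append(1.0 if rng.random() < p_take else 0.0)
        recovered.append(1.0 if rng.random() < p_recover else 0.0)
    return pearson(took, recovered)


if __name__ == "__main__":
    print("self-selected:", round(simulate(20000, randomized=False), 3))
    print("randomized:   ", round(simulate(20000, randomized=True), 3))
```

Under these assumptions the self-selected group shows a clearly positive correlation, while the randomized group sits close to zero. So if a real trial still finds a correlation after randomizing, the remedy itself becomes the most plausible explanation.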

What Big Data means is that if you are really smart (or are a computer with a lot of processing power) and you are making statistical judgements (like whether it might be worth spending $0.38 to put this ad for sneakers in front of a given Facebook user), then you can actually get staggering amounts of reliable information from correlation. (For example, whether or not you like Hello Kitty might be used to infer your political leanings.)

So we are entering a new paradigm where, if you limit your understanding of statistics to the basic “correlation doesn’t imply causation”, you can quickly be left behind by others who are willing to use more sophisticated techniques to squeeze more information out of “mere correlation”.
