Numbers are vital to science and scientists. No matter how obvious a hypothesis may seem, just thinking that something is true doesn’t make it so. But equally important is ensuring that we analyze data appropriately, carefully and rigorously. Scientific pitfalls can lurk in even the seemingly simplest of analyses, and a good scientist is always her own harshest critic.
This point was driven home to me a few months ago, when I came across an article entitled “Unicorns and MBAs” by Jeff Bussgang in the Huffington Post, evaluating the role of MBAs in business Unicorns. (These are those rare start-up companies, established in the last 10 years, which are now worth more than $1 billion.) The author was challenging the conventional wisdom that Unicorns (such as Facebook, Tumblr, etc.) are most often founded by twentysomethings. Instead, he said, the data support a very different view of the business landscape: namely, that having an MBA onboard is important for start-up success.
- 33% of the Unicorns had at least one founding member who had an MBA. Examples include Kayak (Steve Hafner/Kellogg), Workday (Aneel Bhusri/Stanford and Dave Duffield/Cornell), Yelp (Jeremy Stoppelman/HBS) and Zynga (Mark Pincus/HBS).
- 82% of [Unicorns] had at least one founding member or current executive team member with an MBA. Examples of unicorns where MBAs were hired to help build the company include Evernote (COO Ken Gullicksen/Stanford), Facebook (COO Sheryl Sandberg/HBS), Twitter (COO Ali Rowghani/Stanford).
These numbers certainly seem to suggest that the conventional wisdom might not be entirely true, but should we be so quick to change our tune?
Well, let’s think about this problem a different way. Imagine that you have a giant, opaque jar full of Skittles. You’re hungry, and so you grab a handful of deliciousness: amazingly, of the 30 candies that you took, all of them are red! Since a Skittles mix normally has five different colors, to only pick out red ones is highly unlikely,* and naturally your mind starts to run away with you—
But then you look inside the jar.
And now you see that all of those Skittles are red too. So of course, you only picked out red Skittles! There were only red Skittles to pick from.
This is the crux of the whole problem: you need to know what you’re choosing from. Before you can be shocked at only picking red Skittles, you need to first establish how full the jar itself is of red Skittles. In science parlance, you need to know your “background distribution.”
So, if we now return to the MBAs and Unicorns hypothesis, the issue is that Bussgang never established the background distribution: we don’t know the fraction of non-Unicorn companies with MBAs. What if only 10% of non-Unicorn companies have someone with an MBA? Then, yes, there are many more MBAs in Unicorn companies than we would expect; that is, MBAs are “enriched” in Unicorn companies. But what if 95% of companies have someone with an MBA? Then MBAs are “depleted” in Unicorn companies. And if 82% of non-Unicorn companies have an MBA, then Unicorn companies look a lot like every other company.
How we interpret the article’s numbers depends entirely upon this landscape, this background distribution. Without knowing that, we have no framework to place Bussgang’s numbers into, however suggestive they may be.**
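To make the comparison concrete, here is a minimal Python sketch of the three scenarios. The 82% figure comes from the article; the background shares (10%, 82%, 95%) are the hypothetical values from the thought experiment, not real data. The function names and the 5% tolerance for calling two shares "about the same" are arbitrary choices made for illustration.

```python
# Observed share of Unicorns with an MBA founder or executive (from the article).
UNICORN_MBA_SHARE = 0.82


def enrichment(observed: float, background: float) -> float:
    """Ratio of the observed share to the background share."""
    return observed / background


def describe(observed: float, background: float, tol: float = 0.05) -> str:
    """Label the observed share relative to the background.

    tol is an arbitrary cutoff chosen for this sketch, not a statistical test.
    """
    ratio = enrichment(observed, background)
    if ratio > 1 + tol:
        return "enriched"
    if ratio < 1 - tol:
        return "depleted"
    return "about the same"


# Hypothetical background shares from the thought experiment, not real data.
for background in (0.10, 0.82, 0.95):
    ratio = enrichment(UNICORN_MBA_SHARE, background)
    label = describe(UNICORN_MBA_SHARE, background)
    print(f"background {background:.0%}: ratio {ratio:.2f} -> {label}")
```

The same 82% reads as enriched, unremarkable, or depleted depending only on the background value passed in. A real analysis would also need the sample sizes to judge whether any difference is larger than chance alone would produce.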
To put it another way: if all horses were unicorns, they’d be a lot less remarkable.
*We can even quantify how (un)likely this is: if 1 in 5 Skittles are red and the number of Skittles in the jar is large, then the probability of picking 30 red ones by chance is essentially (1/5)^30, or roughly 1 in 10^21.
**Of course, even with the most thorough of analyses, these data still can’t speak to causality: i.e., whether having an MBA onboard causes companies to be more (or less) successful.
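The arithmetic in the first footnote can be checked with a short Python sketch. It assumes, as the footnote does, that 1 in 5 Skittles is red. The jar of 1,000 candies is a made-up size used only to show what the "large jar" approximation leaves out, and the function names are illustrative.

```python
from fractions import Fraction
from math import comb


def p_all_red_large_jar(n_picked: int, red_share: Fraction) -> Fraction:
    """Probability that every pick is red when the jar is effectively infinite."""
    return red_share ** n_picked


def p_all_red_finite_jar(n_picked: int, n_red: int, n_total: int) -> Fraction:
    """Exact probability of drawing only reds without replacement."""
    return Fraction(comb(n_red, n_picked), comb(n_total, n_picked))


large = p_all_red_large_jar(30, Fraction(1, 5))
print(float(large))  # on the order of 1e-21

# Hypothetical jar: 1,000 candies, 200 of them red.
finite = p_all_red_finite_jar(30, 200, 1000)
print(float(finite))  # smaller still: each red drawn leaves fewer reds behind

# A jar containing only red candies makes the "surprising" handful a certainty.
print(p_all_red_finite_jar(30, 1000, 1000))  # 1
```

The last line is the point of the Skittles story in numerical form: the same handful goes from nearly impossible to guaranteed once the contents of the jar are known.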