Stats

The Sloan Conference is a big, well-known conference focused on the use of advanced statistics and analytics in sports. In a lot of ways, Sloan is a good representation of the modern, post-internet world’s love affair with data — even games we theoretically play for fun are being sliced and diced to find efficiencies and disprove reliable old clichés.

I was reading about this year’s conference in this Deadspin piece, and there’s a great anecdote about Steve Kerr, a former NBA player and General Manager, and one of his less-productive experiences with data. 

“I’m in Phoenix,” he starts. “Amar’e Stoudemire was going to be a free agent, we had to think about trading him, so we’re watching tape of all these different guys, and we’re looking at J.J. Hickson. One of our stats guys says, ‘J.J. Hickson shoots a better percentage from 0 to 5 feet than Amar’e Stoudemire.’ I said, ‘You’re kidding me, right?’ He says: ‘No, it’s right here. It’s 65 percent to 62 percent.’ I think, that’s impossible, Amar’e’s one of the great finishers in the history of this league. Especially with all the tough ones, all the little four-footers. So we watch J.J. Hickson, and after about 20 minutes I realized that every one of his shots from one to five feet is actually from one foot. All he can do is dunk.”

As far as I can tell, there are really only two potential problems with using data to make decisions. The first one is that you may have bad data. In industries and fields with a lot of money and established, vetted technology and data-gathering methods (like sports), this isn’t usually a problem. In Chinese factories, it’s a serious one.

But the second kind of problem has nothing to do with the data itself, and everything to do with how you use it — in Kerr’s example, the Suns used shooting-percentage data to figure out who was the more effective inside scorer, when what the numbers actually measured was who had the easiest opportunities to score. Oops. Fortunately, as in many things, sports tend to adapt to these kinds of mistakes pretty well, because they’re like little controlled experiments, with a fixed (even if large) set of variables, and measurable, straightforward goals. (“Score more runs than the Cubs.”)
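To see how that happens, here’s a minimal sketch in Python with completely invented players and shot distances (not the Suns’ actual numbers): the same 65-percent-to-62-percent gap from the anecdote can show up even when the 65 percent player is worse from every spot inside the 0-to-5-foot bucket, simply because almost all of his attempts are dunks.

```python
# Hypothetical shot logs (distance in feet, whether the shot went in) for two
# invented players; not real Suns data. "Player A" mostly dunks, while
# "Player B" takes a normal mix of rim finishes and short floaters.
shots = {
    "Player A": [(1, True)] * 62 + [(1, False)] * 18 + [(4, True)] * 3 + [(4, False)] * 17,
    "Player B": [(1, True)] * 30 + [(1, False)] * 5 + [(4, True)] * 32 + [(4, False)] * 33,
}

for player, attempts in shots.items():
    # The number the front office saw: one percentage for the whole 0-5 ft bucket.
    bucket = [made for dist, made in attempts if dist <= 5]
    overall = sum(bucket) / len(bucket)

    # What the tape showed: where inside that bucket the shots actually came from.
    at_rim = [made for dist, made in attempts if dist <= 1]
    floaters = [made for dist, made in attempts if 1 < dist <= 5]
    print(
        f"{player}: 0-5 ft {overall:.0%} | "
        f"at the rim {sum(at_rim) / len(at_rim):.0%} ({len(at_rim)} shots) | "
        f"2-5 ft {sum(floaters) / len(floaters):.0%} ({len(floaters)} shots)"
    )

# Player A "wins" the bucketed stat, 65% to 62%, while shooting worse than
# Player B from every distance inside the bucket; he just never attempts
# anything but dunks.
```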

Business isn’t usually like that. You often end up with a whole bunch of different goals, some very familiar and similar to those of other companies, and some wildly unique or difficult to measure. Generally, people in charge of businesses LOVE data, because data makes things predictable and allows for planning, and business people LOVE being able to plan for things. And to be honest, this really is tremendously valuable. Being able to measure and predict things like customer acquisition costs and required support time can be enormously helpful, provided that you can do it well.

That last part really gets taken for granted, especially today, when everyone is constantly talking about data, and entire industries of predictive analytics spend a lot of time promoting the idea that all data, from bounce rates to Twitter followers, is instructive when sufficiently analyzed or aggregated.

Good statisticians and analysts don’t need to be told this is a problem. They already know it, which is why they are good statisticians and analysts. But most decision-makers aren’t statisticians or analysts, and like anyone trying to solve a problem they don’t really understand by getting someone else to do it, they often have unrealistic expectations of what can and can’t be accomplished through certain channels.

Analytic dependency

“Do you know Google Analytics?” I hear this all the time, and… sure I do! The problem is, that question isn’t like “can you read Python code” or “can you make me things in Illustrator”, because those skills are fixed, and it’s understood that it’s up to the business person hiring those skills to apply them to something useful. Instead, “Do you know Google Analytics” means “can you tell me truths about the universe from looking at Google Analytics”, and whether anyone can do that is limited less by the person using the tool than by the tool itself, your business, or the universe.

In other words, people think Analytics (Google or otherwise) is a business intelligence tool, when really, it’s a data collection tool with the ability to aggregate data into things that sometimes look like intelligence, but usually aren’t (“10% of our organic search visitors come from Argentina!”). So really, you need two things — someone to pry data out of Analytics, and someone to look at that data and tell you “our ad campaign is/isn’t working”, or “that idea you had was great/terrible”.
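As a toy illustration (the rows below are invented, not pulled from any real Analytics account), the “pry the data out” half is mostly just filtering and counting; the Argentina-style number it produces is perfectly accurate and still isn’t, on its own, intelligence.

```python
# Toy visit rows, invented for illustration (nothing exported from a real
# Analytics account). The "pry data out" half is just filtering and counting.
visits = [
    {"source": "organic", "country": "Argentina"},
    {"source": "organic", "country": "US"},
    {"source": "organic", "country": "US"},
    {"source": "paid",    "country": "Argentina"},
    {"source": "organic", "country": "Canada"},
    # ...imagine a few hundred thousand more rows
]

organic = [v for v in visits if v["source"] == "organic"]
argentina_share = sum(v["country"] == "Argentina" for v in organic) / len(organic)
print(f"{argentina_share:.0%} of our organic search visitors come from Argentina!")

# The printed number is perfectly accurate. Whether it should change anything
# you do is the part no amount of aggregation will answer for you.
```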

In fact, you actually want more than that. You want someone to tell you those things, and be correct. And that’s where you can really get in trouble. Because now, we have all this data, and it’s very, very easy to make arguments like “J.J. Hickson was a better inside scorer than Amar’e Stoudemire in 2010”, and support those arguments with real, verifiable, improperly used data. This is not really a flaw in analytics — it’s a flaw in relying on immature versions of analytics, without an understanding of the universe they measure to support them.

I see this more and more as people get excited about data, and it’s scary. I’ve had smart bosses say “show me the data, and you’ll win this argument”, which is absolutely terrifying, because like most people with Analytics accounts, I can get data to say just about anything. General understanding of important subject matter CANNOT be outsourced to statistics, because you need that understanding to get any reliable value FROM those statistics. Someday, maybe we’ll live in a world where a stat like “average time on site” (or something better and shinier) means something concrete in every scenario, and we can just look at the number and make a conclusion and be right every time, even though we don’t know who our visitors are or what they’re visiting. But that day is not today.
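Here’s a small, hedged sketch of why: with invented session lengths (no real sites or visitors involved), two completely different audiences produce exactly the same “average time on site.”

```python
# Two invented sets of session lengths, in seconds (no real sites or visitors),
# that produce exactly the same "average time on site."
site_a = [120] * 10            # ten visitors who each read for about two minutes
site_b = [5] * 9 + [1155]      # nine instant bounces and one forgotten open tab

for name, sessions in (("Site A", site_a), ("Site B", site_b)):
    average = sum(sessions) / len(sessions)
    print(f"{name}: average time on site = {average:.0f}s over {len(sessions)} visits")

# Both lines report 120 seconds. Without knowing who the visitors are or what
# they're doing, the averaged number can't tell the two stories apart.
```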

Being wrong, feeling right

I don’t mean this as a shot at quantitative people — I really don’t — but I don’t think it’s a coincidence that the loudest supporters of blind statistical obedience are often people with no taste, who struggle to connect with other human beings through ideas, concepts, or traditional, qualitative communication. (To be fair, a lot of anti-stat people are bullshit artists who are terrified of being discovered.) At some point, I’m sure some high school student is going to make a statistical argument for why the popular kid should go to the prom with them, and they’re going to do it because the idea of a world where everything is not only math, but math that we can easily connect to real life situations, is so compelling.

(“How can you say no??? My humor/athleticism composite score is in the 4th quintile for our grade; at this point, you’ve only got a 31% chance of doing better. Plus, I already rented a tux.”)

But companies like Dell and Samsung have spent years beating their heads against the wall trying to figure out why people buy iMacs and iPhones despite some stupid proprietary metric or Magic Quadrant saying they shouldn’t, and that’s going to continue.

Stats are powerful. Stats can reveal truth where biases and ignorance breed blindness, or confirm good intuition with evidence. That’s one of the reasons it’s so exciting to watch new, better, more evolved measurements develop over time. But the scary truth is that we’re not very good at analyzing most of this data just yet, and when that’s the case, analytics are entirely capable of doing vastly more harm than good, especially when goals are more complicated than “beat the Lakers”.

So be careful out there. Especially if you think you “know Google Analytics.”