Thoughts on the state of cricket analytics
Cricket is a sport that has always held itself in high regard with respect to the quality of its writing and statistics. In some ways this is deserved. It has a record base dating back to the middle of the eighteenth century, and a steadily growing set of data on what happened ball by ball. The writing has followed the records, idolising landmarks and achievements, and placing them in the context of almost 150 years of internationals.
In other ways cricket is slipping well behind the quality of analysis, both written and statistical, found in other professional sports. The recent Sloan Conference gives a taste of the kinds of questions being asked in basketball, baseball and football (soccer), and the complexity of analysis that their data affords.
The analytics research is backing a revolution in journalism, with writers like Zach Lowe or Ben Falk digging into specific plays (with video) and their variations, and using the analytics to analyse tactics, performance and tendencies. That makes us (as consumers) better informed about what players are trying to achieve, and where they might succeed (or not).
Cricket writers aren't matching their peers because not only is the analysis not available, the data to perform it is lagging even further behind. There is a pyramid on which those insights have to sit. At the base are the raw statistics of runs scored, balls faced, actions and decisions. Above them, where cricket has excelled, are the records: compiled averages, accumulations and events.
On records sit ratings, the judgement of which player is better, based on analysis of the statistical record, and corrections for advantages in conditions or opposition. Cricket has a healthy ratings base, but (and this may be rich from someone who has compiled ratings for over a decade) they are low-rent statistical analysis. Most rating systems will give the same answer for who is best and worst - naturally - and rarely do they provide some insight into those in between, or the game being played in front of us.
Predictions based on ratings provide a more useful function, as they allow comparison between what is happening and what ought to. At some level cricket offers these insights, though they rarely go into detail on the calculation or the variation, which leaves them prone to mocking laughter when they are wrong, and banal observations (such as that a side is 90% likely to win) when they are right.
To really dig into a match we need analytics: an understanding not just of the outcomes of a match, but the outcomes of individual balls and the wealth of data that underpins the actions in a match. Only by understanding the likely consequence of a tactical decision can you offer insight into why a decision was right or wrong.
The state of cricket analytics is dire.
There are several reasons for these problems, each compounding on others.
Firstly, cricket data of any value is not only not public but proprietary. In the NBA the league office provides an obscene amount of data on play outcomes by player, match-up comparisons, and means to download and conduct analysis. Their partners in turn want the insights of analysts, and provide options to subscribe to data sources, APIs and promotion of good ideas (via Sloan and other means). The ecosystem of analysts and data providers is rich and ever-growing, even as those analysts are picked up by teams who (at least for a while) hide their private advantages.
In cricket the ICC provides no cricket data, so it is left to media partners, ball-tracking companies and others to provide a mish-mash of difficult-to-access and near-impossible-to-analyse data. They in turn look only to broadcasters for the capital to develop analysis, which tends to limit both its scope and its output.
Secondly, really important things aren't tracked at all. Peter Della Penna has written at length on the need for improved fielding statistics: for dropped catches and fielded balls, in order to understand defence. But this barely scratches the surface of the types of analysis available to baseball writers. Consider the types of questions we can't answer about cricket that are only a few clicks away for baseball writers:
- How many slips were in place for each bowler, throughout the day? When did they change? Does an aggressive field induce different shots?
- How many edges were taken? Where on the bat did a batter hit the ball? Or a bowler? How does the cricket stat that is collected but hidden (about control) correlate with ball position?
- What was the ball speed off the bat? The carry? The fielded position? The location of the fielder in relation to the ball fielded or missed? Which fielders have the best range and how many runs do they cut off?
- Which stroke did a batter play? What is their average (and control) on those strokes? Do they choose the right ball to hit?
This last question is key to a vast array of questions, which shift us from analytics to insight. Because thirdly, cricket has no baseline of performance in specific situations to determine if something is good or expected. Wickets, in this situation, are not an ideal measure, much as goals aren't for football, as the sample size is minuscule. But measures of control, and the probability of poor control leading to a dismissal, need to be calibrated and brought into the discussion around decision making.
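As a rough illustration of what such a baseline might look like, here is a minimal Python sketch. It assumes ball-by-ball records carrying a hypothetical `in_control` flag (no such public field exists, which is rather the point), and the numbers are invented toy data, not real cricket statistics. It estimates how often a ball ends in a dismissal with and without control, then uses those rates to say how many dismissals a batter's innings "should" have produced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ball:
    in_control: bool  # hypothetical flag: was the batter in control of the ball?
    dismissed: bool   # did this ball take a wicket?


def dismissal_rate(balls, in_control):
    """Share of balls with the given control flag that ended in a dismissal."""
    subset = [b for b in balls if b.in_control == in_control]
    if not subset:
        return None  # no sample, so no estimate
    return sum(b.dismissed for b in subset) / len(subset)


def expected_dismissals(n_controlled, n_false, rate_controlled, rate_false):
    """Baseline number of dismissals implied by a mix of controlled and false shots."""
    return n_controlled * rate_controlled + n_false * rate_false


# Invented toy sample: 80 controlled balls (1 wicket), 20 false shots (2 wickets).
balls = (
    [Ball(True, False)] * 79 + [Ball(True, True)]
    + [Ball(False, False)] * 18 + [Ball(False, True)] * 2
)

rate_controlled = dismissal_rate(balls, True)
rate_false = dismissal_rate(balls, False)

# A batter who faced 200 balls with 30 false shots, judged against that baseline.
baseline = expected_dismissals(170, 30, rate_controlled, rate_false)
```

With a baseline like this, a batter dismissed once across those 200 balls can be described as fortunate rather than flawless, which is a statement about decisions and not only outcomes.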
In basketball the difference between high-efficiency and low-efficiency shots is vast, and a massive tactical shift has occurred, driving players to choose high-efficiency shots even at the expense of more turnovers. That leads to gorgeous charts (courtesy of Kirk Goldsberry) of, for example, James Harden's shooting frequency and relative efficiency.
CricViz has some level of this information, but their pitch maps and beehives lack context without some understanding of not only the shot played, but whether that is an efficient shot (in general, or for that specific batter). It should be very easy to call up data that shows whether or not David Warner is an above-average player of the drive (based on different passing points) and by extension, whether the zone in which he plays at the ball is larger (and perhaps less circumspect) than other players'. But for writers that is pure speculation, based on what we think we are seeing, and our preconception of how Warner plays.
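To make the idea concrete, here is a small Python sketch of the comparison, assuming a data set that records the stroke played on every ball. The record layout, the batter labels "A" and "B" and the run values are all hypothetical, made up purely for illustration. It compares one batter's runs per ball on a given stroke against the figure for every batter in the sample.

```python
def runs_per_ball(records, stroke, batter=None):
    """Average runs per ball on a stroke, for one batter or (if None) everyone."""
    runs = [
        r for (b, s, r) in records
        if s == stroke and (batter is None or b == batter)
    ]
    if not runs:
        return None  # stroke never played, so nothing to compare
    return sum(runs) / len(runs)


def relative_efficiency(records, batter, stroke):
    """Batter's runs per ball on a stroke minus the all-batter figure.

    The baseline here includes the batter's own balls, a simplification
    that matters when the sample is small.
    """
    own = runs_per_ball(records, stroke, batter)
    overall = runs_per_ball(records, stroke)
    if own is None or overall is None:
        return None
    return own - overall


# Invented toy records: (batter, stroke, runs scored off that ball).
records = [
    ("A", "drive", 4), ("A", "drive", 0), ("A", "drive", 2),
    ("B", "drive", 1), ("B", "drive", 0), ("B", "drive", 0), ("B", "drive", 1),
    ("A", "cut", 0),
]

drive_edge = relative_efficiency(records, "A", "drive")
```

A positive value suggests the batter scores faster than the field on that stroke. A fuller version would also weigh control and dismissal risk, and split by where the ball passed the batter, but none of that is possible without the underlying data.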
Fourthly, in the absence of ball-by-ball highlight footage it is very difficult to craft a good story around the tactical decisions of bowlers and batters that would enhance the game. Between League Pass and the highlights it is possible to examine a mountain of NBA footage, enough to track down play-by-play to see how a defence set up, how they reacted, and learnt from subsequent events, and how the offence in turn shifted to take advantage of the change.
For T20 cricket, where matches can shift from ball to ball, and the field is set in precise ways for the bowler to bowl to and for the batter to evaluate the risk against, this insight is essential to shift the conversation from it as a sport of sloggers (which it surely is not) to a sport of nuanced risk assessments and occasional blunders.
That requires the analytical information on field placements and how bowlers choose to bowl to them, how they shift from ground to ground, how batters react to them, and how they feed into the strengths and weaknesses of the players. From there, the video footage can highlight what was tried, and how it succeeded (or didn't).
This shift to decent cricket analytics will be a long process: other sports have two decades of work to draw on, while cricket remains mired in light-weight rubbish. The gap between the best tactical writing on cricket and other sports is vast, growing, and depressing if you want to watch and gain insights into the game.
And it matters, beyond my personal nerddom, because poor analytics feeds poor commentary and weak insights. Cricket writers spend too much time on contrived controversy, quotes and milestones, when they ought to be talking about how an actual game was played out in front of them. Occasionally a retired player shows glimpses of the game they played, but as the game continually shifts away from them, the insights become less acute, and the breadth of events on the field can elude even a keen observer (being generous).
Cricket analytics ought to be a huge and interesting field of research, and I don't know of anyone who would think it is any of those things, because of the problems outlined above.
11th March, 2018 14:29:59
This is a wonderful article. Great stuff, brings together so many threads that have been tangled in my head for many years now. Couldn’t have said it better. This will be my go-to article on this subject now. Almost makes me want to fire up the blog again :).
Devanshu Mehta 12th March, 2018 12:49:47
Thanks Devanshu. Am somewhat surprised by the response, which is a sign that I should probably blog more myself.
Russ 14th March, 2018 21:42:10