Thoughts on the state of cricket analytics
Cricket is a sport that has always held itself in high regard with respect to the quality of its writing and statistics. In some ways this is deserved. It has a record base dating back to the middle of the eighteenth century, and a steadily growing set of data on what happened ball by ball. The writing has followed the records, idolising landmarks and achievements, and placing them in the context of almost 150 years of internationals.
In other ways cricket is slipping well behind the quality of analysis, both written and statistical, found in other professional sports. The recent Sloan Conference gives a taste of the kinds of questions being asked in basketball, baseball and football (soccer), and the complexity of analysis that affords.
The analytics research is backing a revolution in journalism, with writers like Zach Lowe or Ben Falk digging into specific plays (with video) and their variations, using the analytics to analyse tactics, performance and tendencies in a way that makes us (as consumers) better informed about what players are trying to achieve, and where they might succeed (or not).
Cricket writers aren't matching their peers because not only is the analysis not available, the data to perform it is lagging even further behind. There is a pyramid on which those insights have to sit. At the base are the raw statistics of runs scored, balls faced, actions and decisions. Above it, in which cricket has excelled, are the records, compiled averages, accumulations and events.
On records sit ratings, the judgement of which player is better, based on analysis of the statistical record, and corrections for advantages in conditions or opposition. Cricket has a healthy ratings base, but (and this may be rich from someone who has compiled ratings for over a decade) they are low-rent statistical analysis. Most rating systems will give the same answer for who is best and worst - naturally - and rarely do they provide some insight into those in between, or the game being played in front of us.
Predictions based on ratings provide a more useful function, as they allow comparison between what is happening and what ought to be happening. At some level cricket offers these insights, though they rarely go into detail on the calculation or the variation, which leaves them prone to mocking laughter when they are wrong, and banal observations (such as that a side is 90% likely to win) when they are right.
To really dig into a match we need analytics: an understanding not just of the outcomes of a match, but the outcomes of individual balls and the wealth of data that underpins the actions in a match. Only by understanding the likely consequence of a tactical decision can you offer insight into why a decision was right or wrong.
The state of cricket analytics is dire.
There are several reasons for these problems, each compounding the others.
Firstly, cricket data of any value is not only not public but proprietary. In the NBA the league office provides an obscene amount of data on play outcomes by player, match-up comparisons, and means to download and conduct analysis. Their partners in turn want the insights of analysts, and provide options to subscribe to data sources, APIs and promotion of good ideas (via Sloan and other means). The ecosystem of analysts and data providers is rich and ever-growing, even as those analysts are picked up by teams who (at least for a while) hide their private advantages.
In cricket the ICC provides no cricket data, so it is left to media partners, ball tracking companies and others to provide a mish-mash of difficult-to-access and near-impossible-to-analyse data. They in turn look only to broadcasters for the capital to develop analysis, which tends to limit both its scope and its output.
Secondly, really important things aren't tracked at all. Peter Della Penna has written at length on the need for improved fielding statistics: for dropped catches and fielded balls, in order to understand defence. But this barely scratches the surface of the types of analysis available to baseball writers. Consider the kinds of questions we can't answer about cricket, whose equivalents are only a few clicks away for baseball writers:
- How many slips were in place for each bowler, throughout the day? When did they change? Does an aggressive field induce different shots?
- How many edges were taken? Where on the bat did a batter hit the ball? Or a bowler? How does the cricket stat that is collected but hidden (about control) correlate with ball position?
- What was the ball speed off the bat? The carry? The fielded position? The location of the fielder in relation to the ball fielded or missed? Which fielders have the best range and how many runs do they cut off?
- Which stroke did a batter play? What is their average (and control) on those strokes? Do they choose the right ball to hit?
This last question is key to a vast array of questions, which shift us from analytics to insight. Because thirdly, cricket has no baseline of performance in specific situations to determine if something is good or expected. Wickets, in this situation, are not an ideal measure, much as goals aren't for football, as the sample size is minuscule. But measures of control, and the probability of poor control leading to a dismissal, need to be calibrated and brought into the discussion around decision making.
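To make the calibration point concrete, here is a minimal sketch of how the chance of a dismissal given a false shot might be estimated, with an interval that shows how little a handful of wickets can tell us. Everything in it is an assumption for illustration: the record layout (`in_control`, `dismissed`), the function names, and the numbers, which are made up rather than drawn from any real match.

```python
from math import sqrt


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (roughly 95% at z=1.96)."""
    if trials == 0:
        return (0.0, 1.0)
    p = successes / trials
    denom = 1 + z ** 2 / trials
    centre = (p + z ** 2 / (2 * trials)) / denom
    half = z * sqrt(p * (1 - p) / trials + z ** 2 / (4 * trials ** 2)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))


def dismissal_given_poor_control(balls):
    """Count dismissals among balls where the batter was not in control.

    `balls` is an iterable of (in_control, dismissed) pairs; this layout is
    hypothetical, not the format of any real data provider.
    """
    false_shots = [dismissed for in_control, dismissed in balls if not in_control]
    trials = len(false_shots)
    wickets = sum(false_shots)
    return wickets, trials, wilson_interval(wickets, trials)


# Made-up innings: 170 controlled balls, 30 false shots, 3 of them dismissals.
BALLS = [(True, False)] * 170 + [(False, False)] * 27 + [(False, True)] * 3

wickets, trials, (low, high) = dismissal_given_poor_control(BALLS)
print(f"{wickets}/{trials} false shots dismissed, interval {low:.3f} to {high:.3f}")
```

With only three wickets the interval is wide, which is the sample size problem in miniature: the false shots are far more numerous than the dismissals, so a calibrated baseline has to be built on them.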
In basketball the difference between high efficiency and low efficiency shots is vast, and a massive tactical shift has occurred driving players to choose high efficiency shots even at the expense of more turnovers. That leads to gorgeous charts (courtesy of Kirk Goldsberry) of, for example, James Harden's shooting frequency and relative efficiency.
Cricviz has some level of this information, but their pitch maps and beehives lack context without some understanding of not only the shot played, but whether that is an efficient shot (in general, or for that specific batter). It should be very easy to call up data that shows whether or not David Warner is an above-average player of the drive (based on different passing points) and by extension, whether the zone in which he plays at the ball is larger (and perhaps less circumspect) than that of other players. But for writers that is pure speculation, based on what we think we are seeing, and our preconception of how Warner plays.
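As a rough sketch of what "above average on the drive" could mean in practice, the snippet below compares each batter's runs per ball on a stroke with the average for that stroke across all batters. The record layout, the function names and the data are all hypothetical, and the batters are placeholders rather than real players. A real version would also need to split by passing point and account for sample size.

```python
from collections import defaultdict

# Made-up records: (batter, stroke, runs, in_control).
BALLS = [
    ("batter_a", "drive", 4, True),
    ("batter_a", "drive", 0, False),
    ("batter_a", "drive", 2, True),
    ("batter_a", "drive", 4, True),
    ("batter_b", "drive", 0, True),
    ("batter_b", "drive", 1, True),
    ("batter_b", "drive", 0, False),
    ("batter_b", "drive", 1, True),
]


def summarise(balls, key):
    """Group balls by `key` and return balls faced, runs per ball and control rate."""
    totals = defaultdict(lambda: [0, 0, 0])  # [balls, runs, controlled balls]
    for ball in balls:
        entry = totals[key(ball)]
        entry[0] += 1
        entry[1] += ball[2]
        entry[2] += int(ball[3])
    return {
        group: {"balls": n, "runs_per_ball": runs / n, "control": controlled / n}
        for group, (n, runs, controlled) in totals.items()
    }


def relative_efficiency(balls):
    """Runs per ball for each (batter, stroke) minus the all-batter figure for that stroke."""
    by_stroke = summarise(balls, key=lambda b: b[1])
    by_batter = summarise(balls, key=lambda b: (b[0], b[1]))
    return {
        (batter, stroke): stats["runs_per_ball"] - by_stroke[stroke]["runs_per_ball"]
        for (batter, stroke), stats in by_batter.items()
    }


for (batter, stroke), diff in sorted(relative_efficiency(BALLS).items()):
    print(f"{batter} {stroke}: {diff:+.2f} runs per ball against the baseline")
```

The same grouping could be run on the control rate instead of runs, which is closer to the efficiency charts basketball writers take for granted.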
Fourthly, in the absence of ball-by-ball highlight footage it is very difficult to craft a good story around the tactical decisions of bowlers and batters that would enhance the game. Between League Pass and the highlights it is possible to examine a mountain of NBA footage, enough to track down play-by-play to see how a defence set up, how they reacted, and learnt from subsequent events, and how the offence in turn shifted to take advantage of the change.
For T20 cricket, where matches can shift from ball to ball, and the field is set in precise ways to bowl to, and for the batter to evaluate the risk against, this insight is essential to shift the conversation from it as a sport of sloggers (which it surely is not) into a sport of nuanced risk assessments and occasional blunders.
That requires the analytical information on field placements and how bowlers choose to bowl to them, how they shift from ground to ground, how batters react to them, and how they feed into the strengths and weaknesses of the players. From there, the video footage can highlight what was tried, and how it succeeded (or didn't).
This shift to decent cricket analytics will be a long process: other sports have two decades of work to draw on, while cricket remains mired in light-weight rubbish. The gap between the best tactical writing on cricket and other sports is vast, growing, and depressing if you want to watch and gain insights into the game.
And it matters, beyond my personal nerddom, because poor analytics feeds poor commentary and weak insights. Cricket writers spend too much time on contrived controversy, quotes and milestones, when they ought to be talking about how an actual game was played out in front of them. Occasionally a retired player shows glimpses of the game they played, but as the game continually shifts away from them, the insights become less acute, and the breadth of events on the field can elude even a keen observer (being generous).
Cricket analytics ought to be a huge and interesting field of research, and I don't know of anyone who would think it is either of those things, because of the problems outlined above.
11th March, 2018 14:29:59
WCL2 Review, WCQ Preview; Associate Cricket Podcast
Associate cricket is in the middle of a key period and Andrew Nixon (@andrewnixon79) joins Russell Degnan (@idlesummers) to cover it. World Cricket League Division Two saw Nepal and the UAE move into the qualifiers (0:20). Those qualifiers will be played in Zimbabwe and we've previewed the tournament and the contenders, as well as the coverage (or lack of) (14:00). Tournaments for the smaller nations are gearing up too. The ACC eastern sub-regional T20 was won by Bhutan (28:30) and the first of the World T20 qualifiers will start in the Americas - Southern region (40:30). And finally, there is news from the ICC, Canada and Malaysia (30:50).
Direct Download Running Time 47min. Music from Martin Solveig, "Big in Japan"
The associate and affiliate cricket podcast is an attempt to expand coverage of associate tournaments by obtaining local knowledge of the relevant nations. If you have been or intend to go to a tournament at associate level - men's, women's, ICC, unaffiliated - then please get in touch in the comments or by email.
1st March, 2018 20:53:39