Monday Melbourne: CCXVIII, October 2010
Russell Degnan

Light to study by. Taken October 2010

Melbourne Town 25th October, 2010 22:34:26   [#] [0 comments] 

SQL Test Scores Database
Russell Degnan

Apropos to David Barry generously offering his collection of statistical data, and having finally got to updating my parser for the changes to cricinfo's scorecards, I'll do the same.

Download the zip file available here: testscores.zip.

Instructions for use:

To begin

  1. Unzip testscores.sql and import into your database. I use mysql, running on XAMPP (which includes PHPMyAdmin).

That is sufficient to look at the data. It includes all test matches (bar the ICC XI travesty) up until 24th October 2010. If anyone wanted to rewrite the parse script for ODI/T20 games feel free to do so and I'll add it to the file.

The format is relatively straightforward, but you'll need to work out what is what for yourself (and know some basic SQL; I'll leave comments open on this post for questions).

  • team - numbered 1 to 10, in the order they first played test cricket.
  • game - a test match, use g_id as index reference, g_cricinfoid to reference the cricinfo scorecard.
  • innings - a team innings, referenced by i_id, references game with i_testid, team by i_teamid and bowling team by i_oppid
  • player - a player, uses the name first encountered, p_id for reference, p_cricinfoid for the cricinfo player id, none of the details are filled (TODO).
  • bat - a batsman's innings, references i_id via b_inningsid and p_id via b_bat_pid.
  • bowl - a bowler's innings, reference i_id via w_inningsid and p_id via w_bowl_pid.
  • extras - extras in an innings, references i_id via e_inningsid.
  • fow - the fall of wickets in an innings, including injured partnerships (marked as unbroken, with two partnerships having the same wicket number). FOW needs careful coding; naive queries will be slightly off because of not outs and retirements. References i_id via f_inningsid, b_id via f_open_bid (batsman in middle, or 1st in order for openers), f_dis_bid (batsman dismissed if any), f_no_bid (batsman not out).
  • close - score at close of play and batsmen/overs (if available, scorecards are incomplete). References i_id through c_bat_iid and batsmen b_id through c_bat1_bid and c_bat2_bid. Needs parsing of available game notes (g_notes) for luncheon/drinks intervals (TODO).
  • series - collated series of games. References game numbers via s_testid_list.
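As a rough sketch of how the references above join together (not part of the zip file): the example below builds a toy in-memory copy of three tables with Python's sqlite3 module and totals a batsman's runs against each bowling team. The key columns (p_id, i_id, i_testid, i_teamid, i_oppid, b_inningsid, b_bat_pid) follow the list above; p_name and b_runs are assumed column names, and the rows are made up, so check them against testscores.sql before reusing the query in mysql.

```python
import sqlite3

# Toy in-memory copy of three tables. Key columns follow the list above;
# p_name and b_runs are ASSUMED names, and every row here is invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE player  (p_id INTEGER PRIMARY KEY, p_name TEXT);
    CREATE TABLE innings (i_id INTEGER PRIMARY KEY, i_testid INTEGER,
                          i_teamid INTEGER, i_oppid INTEGER);
    CREATE TABLE bat     (b_id INTEGER PRIMARY KEY, b_inningsid INTEGER,
                          b_bat_pid INTEGER, b_runs INTEGER);
    INSERT INTO player  VALUES (1, 'Batsman A'), (2, 'Batsman B');
    INSERT INTO innings VALUES (10, 100, 1, 2), (11, 100, 1, 2), (12, 101, 1, 3);
    INSERT INTO bat     VALUES (1, 10, 1, 50), (2, 11, 1, 25),
                               (3, 12, 1, 10), (4, 10, 2, 7);
""")


def runs_by_opponent(connection):
    """Total runs per player against each bowling team (i_oppid)."""
    query = """
        SELECT p_name, i_oppid, SUM(b_runs) AS runs
        FROM bat
        JOIN innings ON b_inningsid = i_id
        JOIN player  ON b_bat_pid = p_id
        GROUP BY p_id, p_name, i_oppid
        ORDER BY p_name, i_oppid
    """
    return connection.execute(query).fetchall()


rows = runs_by_opponent(conn)
```

The same SELECT should carry over to mysql largely unchanged, provided the assumed column names are swapped for the real ones.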

To update

  1. Download the scorecard from cricinfo and save somewhere. I have a batch downloader for people running their own webserver (again, XAMPP) called score.php that gets the 20 most recent scorecards; add ?page=X to the URL to get older cards. Copy them to a download directory.
  2. You need perl, I use Strawberry perl for windows, as apparently does Larry Wall, but suit yourself. You also need to download (via CPAN) the following packages:
    Text::CSV_XS;
    HTML::TreeBuilder;
    DBI;
  3. Strip the crud around the scorecard with the command:
    perl strip.pl < download/XCRICINFOID.html > clean/XCRICINFOID.html

  4. Run the parser to add to the database. It will delete all existing records for that gameid, but can leave unwanted records if it errors (which it might if cricinfo changes their format or for other reasons).
    perl players.pl clean/XCRICINFOID.html
  5. Update series data (and do any other post-processing):
    INSERT INTO Series ( s_season, s_home_tid, s_away_tid, s_length, s_date, s_home_win, s_away_win, s_drawn, s_tied ) SELECT g_season, g_home_tid, g_away_tid, max( g_seriesnum ), min( g_date ), sum( g_home_tid = g_result_tid ), sum( g_away_tid = g_result_tid ), sum( g_result_type like 'drawn%' ), sum( g_result_type like 'tied%' ) from Game where g_seriesid is null group by g_season, g_home_tid, g_away_tid order by g_testnum;
    UPDATE game,series set g_seriesid = s_id where s_home_tid = g_home_tid and s_away_tid = g_away_tid and s_season = g_season;
    UPDATE series set s_testid_list = (select group_concat(g_id) from game where g_seriesid = s_id group by g_seriesid);
    UPDATE game, innings set i_oppid = g_away_tid where g_id = i_testid and g_home_tid = i_teamid;
    UPDATE game, innings set i_oppid = g_home_tid where g_id = i_testid and g_away_tid = i_teamid;
    UPDATE bat,innings set b_testid = i_testid where b_inningsid = i_id;
    UPDATE bowl,innings set w_testid = i_testid where w_inningsid = i_id;
    UPDATE extras,innings set e_testid = i_testid where e_inningsid = i_id;
    UPDATE fow,bat set f_testid = b_testid, f_inningsid = b_inningsid where f_open_bid = b_id;
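
Steps 3 and 4 above can be batched when several scorecards have been downloaded. The sketch below only builds the two command lines for each cricinfo id, using the script names and directories given above; the function name and the example id are hypothetical, and running the commands is left to whatever shell you use.

```python
def update_commands(cricinfo_ids, download_dir="download", clean_dir="clean"):
    """Return the strip and parse command lines for each scorecard id."""
    commands = []
    for cricinfo_id in cricinfo_ids:
        raw = f"{download_dir}/{cricinfo_id}.html"
        clean = f"{clean_dir}/{cricinfo_id}.html"
        # Step 3: strip the crud around the scorecard.
        commands.append(f"perl strip.pl < {raw} > {clean}")
        # Step 4: parse the cleaned scorecard into the database.
        commands.append(f"perl players.pl {clean}")
    return commands


# "123456" is a made-up id, used only to show the output shape.
example = update_commands(["123456"])
```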

Any problems or suggestions drop a comment here.

Known Problems

  • Players who change names get the first instance of their name (notably MoYo), best manually edited.
  • Grounds are stored by name, not id (TODO). This gets around the above problem however.

Idle Summers 24th October, 2010 12:23:18   [#] [7 comments] 

Stakeholders to stakeholders: yay for context
Russell Degnan

How pleasant, this morning, to wake up to an email from Cricket Australia telling me that Cricket Australia CEO, James "Stakeholders" Sutherland, himself was going to speak about ICC reforms to test and one-day cricket. "Finally", I thought, "someone privy to ICC decision making is going to stand up in front of a house-trained, internal CA journalist and bunt back some softball questions on why these decisions were made".

Naturally, we learned very little about the thought process behind the decisions, though neither can it be ruled out that we learned everything about that thought process.

What we did learn was illuminating however. Sutherland had "no doubt" that a test league would bring context and interest to every game, because they all "count for something". Yet, challenged to show appropriate contextual concern for Australia's current position on the rankings (which by the by, are fundamentally not a league table) he said he wouldn't be worried until 2012/13. The anticipation of that period when every test will be brimming full of context, unless a team is already knocked out, or has already qualified, is killing me already.

We also learnt that every match in the one-day league championship will be equally contextual, with games affecting league position and "perhaps qualifying for the world cup". That must have been a slip, because it sounded like an admission that only teams playing one-day cricket, and involved in an as-yet unspecified league will get the opportunity to qualify for the world cup. And that there will be no "world cup qualifiers"; the sort that might accidentally involve an important nation missing the world cup.

A real interview might have asked about that, but the closest we came on that front was the blanched face and stumbling attempt to respond to having the ten team world cup described as a "disgrace". Apparently that decision shouldn't be isolated from other, better decisions [a 16 team T20 cup], but "balanced in a broad sense" [whatever that means], and that there is merit in having more or fewer teams [though no attempt was made to explain what particular merits led to adopting a smaller tournament].

But the overall message was clear enough: T20 is for developing nations; leave the real cricket to real teams, with real players. I mean, why would an associate cricketer want to play test cricket anyway? Don't they realise the format is fundamentally doomed?

Idle Summers 22nd October, 2010 12:58:10   [#] [1 comment] 

Monday Melbourne: CCXVII, October 2010
Russell Degnan

The sun shines over the black clouds hanging over the Domain. Taken October 2010

Melbourne Town 18th October, 2010 20:29:33   [#] [0 comments] 

Ratings - 16th October 2010
Russell Degnan

Recently completed matches

2nd Test          India      v  Australia
Pre-rating        1203.85       1221.21
Form              +2.70         +1.13
Expected Margin   India by 41 runs
Actual Margin     India by 7 wickets
Post-rating       1209.83       1216.52
Series rating     1303.46       1121.64

For such a short series, India-Australia has generated a remarkable number of articles on the ascent of India and the demise of Australian cricket. Remarkable too, because the latter has been clear since the last Australian tour of India, and the subsequent loss to South Africa, while the former seems to be confusing a mathematical quirk of the flawed ICC ratings with prolonged dominance.

India won easily, but did so without playing particularly well. They don't help themselves, with some woeful captaincy and poor fielding, but even the core of their game indicated some significant weaknesses. There were three major collapses in the series: 6/51, 8/124 and 8/149 (which concluded with 5/9); and their bowling, occasionally decent but often wretched, conceded 400+ in both first innings. Take out Tendulkar (and eventually they'll have to) and the result could easily have been reversed; though the performances of Pujara, Vijay and Raina were indicative of a certain strength in depth. Good sides win even when they play poorly, and India have shown that quality a number of times recently; but good teams also win by consistently outplaying their opposition, and India aren't doing that.

For Australia this may well be the worst possible result. The loss was no more than expected, and it was a largely creditable one, led by a dogged Ponting. But there is a regularity to their weaknesses that needs to be rectified: the collapsing (in both second innings), the failure to keep the scoring rate down (particularly Hauritz and Johnson), and the number of batsmen getting starts and not going on. Unfortunately, the clamour for change, so prevalent after the first test, has quietened, as the players under pressure probably did enough to save their spots even as they (and the team) failed to perform at key moments.

For the moment, Australia retain their place at the top of the ratings, but India will almost certainly pass them during their series against a struggling New Zealand, or when Australia turn out against a surging England. It is entirely possible Australia could slip to fifth by the end of the summer, but the results and ratings over the past 2 years suggest something else: Australia are still as good as anyone, and, at home, should always go in as favourites.


Rankings at 16th October 2010
 1. Australia      1216.52
 2. India          1209.83
 3. South Africa   1193.34
 4. England        1158.25
 5. Sri Lanka      1109.33
 6. West Indies     919.14
 7. New Zealand     917.91
 8. Bangladesh      638.24
 9. Zimbabwe        556.79

10. Ireland         556.46
11. Scotland        461.60
12. Afghanistan     445.10
13. Namibia         388.49
14. Kenya           338.92
15. U.S.A.          296.99
16. Uganda          268.44
17. Nepal           196.51
18. Netherlands     195.69
19. U.A.E.          182.53
20. Canada          177.51
21. Hong Kong       148.65
22. Cayman Is       134.24
23. Malaysia        123.90
24. Bermuda         105.40

Shaded teams have played fewer than 2 games per season. Non-test team ratings are not comparable to test ratings as they don't play each other.

Idle Summers 16th October, 2010 21:17:06   [#] [8 comments] 

Monday Melbourne: CCXVI, October 2010
Russell Degnan

Flemington. Taken October 2010

Melbourne Town 12th October, 2010 21:35:35   [#] [0 comments] 

The Ins and Outs of Potential World Cup Formats
Russell Degnan

Having put forward a case for why a smaller world cup is both unfair and unnecessary, it is worth going through some of the potential options for a larger tournament. I've covered some principles of a good tournament before, and won't repeat them here. Instead I will focus on the specific mechanics of stages.

There ought to be four basic aims in designing a format:

  • You want to minimize luck, such that the best team wins
  • You want to maximize the number of games that are decisive, such that the teams that progress are never clear.
  • You want to minimize the number of mismatches between two teams of different standards.
  • You want to maximize efficiency, so that the tournament is not too long

As can be seen from the diagram of possible formats below, the number of games in any stage increases rapidly with the number of teams in a group (in the sequence 1,3,6,10,15,21,28 etc.). More importantly, because the eventual denouement normally splits two teams, the larger a group is, the greater the gap across the line that separates progress from failure. No matter the group size, it is not unusual to see the top team beat everyone below them, the second those below them, and so on down. This has the perverse result of making only one game decisive: that between the worst team that goes through, and the best team that doesn't. Smaller groups make that problem less obvious, but in any group there is a decisive line between the teams that go through, and the teams that don't, and it is the inequality across that line that ultimately matters.
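
The sequence quoted above is just the number of pairings in a single round-robin, n(n-1)/2 for a group of n teams. A minimal sketch (the function name is mine):

```python
def round_robin_games(teams):
    """Games needed for every team in a group to play each other once."""
    return teams * (teams - 1) // 2


# Group sizes from 2 to 8 teams reproduce the sequence in the text.
games_per_group = [round_robin_games(n) for n in range(2, 9)]
```

Doubling a group from 4 teams to 8 more than quadruples the games (6 to 28), while the number of games that straddle the qualification line stays the same.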

For this reason, I prefer world cup groups of 3 or 4 teams, with generally two progressing. However, 5 is possible in certain circumstances, to be discussed. In a 3 team group, the order of group games needs to be adjusted to keep interest in the third game: if two teams progress, the winner of the first game should play in the second game, the loser the third; if one team is to progress, the loser of the first game should play in the second game, the winner the third. That said, three team groups in the opening stage should generally be avoided, for two reasons: firstly, the effort required to qualify is made a mockery of if a team is limited to just two games, and secondly, it produces decisive games early in a tournament when the fan generally expects a bit of leniency for poor play.
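
The ordering rule for a 3 team group can be written down directly. This is only a sketch of the rule as stated above, with hypothetical function and team names:

```python
def three_team_schedule(first_game, third_team, winner, qualifiers):
    """Order the fixtures of a 3 team group once the first game is decided.

    first_game: the two teams that played the opening game
    third_team: the team that sat out the opening game
    winner:     the winner of the opening game
    qualifiers: how many teams progress from the group (1 or 2)
    """
    team_a, team_b = first_game
    if winner not in first_game:
        raise ValueError("winner must have played in the first game")
    if qualifiers not in (1, 2):
        raise ValueError("a 3 team group sends 1 or 2 teams through")
    loser = team_b if winner == team_a else team_a
    if qualifiers == 2:
        # Two progress: the winner plays next, the loser plays last.
        second, third = winner, loser
    else:
        # One progresses: the loser plays next, the winner plays last.
        second, third = loser, winner
    return [first_game, (second, third_team), (third, third_team)]


two_through = three_team_schedule(("A", "B"), "C", "A", qualifiers=2)
one_through = three_team_schedule(("A", "B"), "C", "A", qualifiers=1)
```

Either way, the team whose fate is least settled after the first game is held back for the third.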


Four possible proposals are given below, with my personal preference given to the latter formats. The number of days listed is the bare minimum amount, given at least two days break between games for every team; the number in brackets is the number of days in a slightly more relaxed tournament.

16 Team World Cup

Groups     Teams   Games   Days
4          4       24      9 (12)
2          4       12      9 (12)
Semis              2       2
Final              1       1
Rest                       3
Total              39      24 (30)

This is the personal preference of many, and has great appeal, combining a succinct number of games with a slightly longer second round to maximize the tv potential of the test teams. Its weakness lies in the first round, where the line of qualification splits between the strong eight test sides and the weaker test sides/associate teams. While 2007 proved that this doesn't preclude them progressing, it also proved that it can make for some boring games. A 16 team world cup is also too short for broadcasters, rolling in at only 39 games.


24 Team World Cup with Quarters

Groups     Teams   Games   Days
6          4       36      9 (18)
4          3       12      9 (12)
Quarters           4       3 (4)
Semis              2       2
Final              1       1
Rest                       3
Total              55      28 (40)

24 Team World Cup without Quarters

Groups     Teams   Games   Days
6          4       36      9 (18)
4          3       12      9 (12)
Semis              2       2
Final              1       1
Rest                       3
Total              51      24 (36)

Slightly messy, as 24 team world cups generally are (the problem is removing the odd prime multiplier), and with a relatively high number of first round mismatches. The 24 team world cup has the advantage, however, of splitting between teams ranked 7-12 and teams ranked 13-18, which are generally competitive games. The second round, consisting of three games, can split into either quarters (with 4 extra games but again splitting 5-8 vs. 9-12) or semis, where teams would need to win every game to progress.


20 Team World Cup

Groups     Teams   Games   Days
4          5       40      15 (20)
Round 2    8       4       3 (4)
Quarters           4       3 (4)
Semis              2       2
Final              1       1
Rest                       3
Total              51      27 (34)

A 20 team world cup lies between the 16 and 24 team editions for quality, with a number of mismatches in a longer first round (it effectively adds a poor team to each group of a 16 team world cup). At first glance, that is a bad idea, but a twist makes it substantially more interesting. Instead of moving to quarter finals, incentive can be given for both topping the group, and coming third, giving decisive lines between 1-4 vs 5-8 and 9-12 vs 13-16 (with 17-20 being quite competitive in those games as well). First place is given a bye to the quarter finals, while 2nd and 3rd placed teams play-off in a second round.

This is my preferred format for several reasons: the minnows have a clear target in making the second round, with the added incentive that upsets in the first round could get them into second place and a potentially easier second round game; the major test teams can afford an upset in the first round, as they'll almost all come in the top three; and the game between the top 2 in the group has real spice, as no team would want to play an extra game, even if they are expected to win easily. Finally, its length is reasonable, being only a few days longer than the 16 team edition (though with more games over-lapping), substantially shorter than recent cricket world cups, but still passing the 48 game broadcasting requirement.
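
The game totals for the four formats can be checked with a few lines of arithmetic. In this sketch the stage sizes and the 48 game figure are copied from the post; the dictionary layout and names are illustrative only.

```python
def round_robin_games(teams):
    """Games needed for every team in a group to play each other once."""
    return teams * (teams - 1) // 2


# Each format: a list of (groups, teams per group) stages, followed by the
# games in each knockout round. Figures are taken from the tables above.
FORMATS = {
    "16 teams": ([(4, 4), (2, 4)], [2, 1]),
    "24 teams, quarters": ([(6, 4), (4, 3)], [4, 2, 1]),
    "24 teams, no quarters": ([(6, 4), (4, 3)], [2, 1]),
    "20 teams": ([(4, 5)], [4, 4, 2, 1]),
}

BROADCAST_MINIMUM = 48  # the 48 game requirement mentioned in the text


def total_games(group_stages, knockout_games):
    """Sum the round-robin games in every group stage plus the knockouts."""
    group_total = sum(groups * round_robin_games(teams)
                      for groups, teams in group_stages)
    return group_total + sum(knockout_games)


totals = {name: total_games(*spec) for name, spec in FORMATS.items()}
long_enough = {name: games >= BROADCAST_MINIMUM
               for name, games in totals.items()}
```

Only the 16 team format, at 39 games, falls short of the 48 game mark; the 20 team format reaches 51 with a single group stage.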

Idle Summers 11th October, 2010 08:11:00   [#] [5 comments] 

Ratings - 9th October 2010
Russell Degnan

Recently completed matches

2 Tests           India      v  Australia
Pre-rating        1204.58       1220.57
Form              +5.53         -0.20
Expected Margin   India by 42 runs
Actual Margin     India by 1 wicket
Post-rating       1203.85       1221.21

A classic, only lacking a more fitting context than a two-test money-spinner at the start of the season. India, by and large, looked the better side, but can count themselves fortunate to have escaped with a win, having collapsed poorly in their chase of a moderate target. Australia were both gritty, as you'd expect, and fragile, as has been seen too often. Watson and Paine's efforts in the first innings were exemplary, and Ponting continues to work hard at the start of series, even if his best days are clearly behind him. Zaheer Khan was a deserved man of the match, carrying an often listless attack (and suffering from a distinct lack of effort in the field) to keep India in front, at least until Johnson's late hitting got Australia to a decent first innings score.

Sehwag was his enigmatic self, but India will be disappointed they didn't score more runs in reply. North, operating instead of a woefully ineffective Hauritz, prompted a mini-collapse after picking up Tendulkar. This gave Australia a chance of winning a game that was tending towards either a draw or an Indian win up to that point, as well as marking the end of what had been excellent umpiring for the first three days.

The Australian collapse, losing 10/105 in 42 overs, arrived as scheduled, as regular and frequent as a Japanese train. Poor shot-making, a couple of woeful (albeit balanced) decisions and whatever the hell Clarke was doing set up an intriguing chase, but it should never have been enough runs.

That it was, almost, was due to some poor Indian shot-selection, some canny bowling from Hilfenhaus and Bollinger, and some bizarre decisions from all involved. What Raina was doing out there running is beyond me; a tense chase is not the place for a player in his second series, even if he is fit and fast. Why Ponting persists with defensive fields to superior batsmen is also unknown. Not only does it gift easy singles to the partnership (the life-blood of a tail-ender who is easily bogged down), it essentially allowed Laxman to play aggressively as there was little chance of being caught. The glut of runs proved Australia's undoing, as the runs required frittered away quickly, only slowing as the finish-line neared, and the intensity rose. Laxman was serene throughout, proving once again that he thrives when most keenly challenged, and that ability to perform in the clutch was ultimately all that separated the two sides.

The ratings remain stagnant, as you'd expect in such a close contest. Australia really struggled to match India in this match however, and I expect the home side to run away with the next game.


I-Cup Match       Kenya      v  Afghanistan
Pre-rating        351.46        404.44
Form              +3.55         +100.81
Expected Margin   Kenya by 24 runs
Actual Margin     Afghanistan by 167 runs
Post-rating       338.92        445.10

A comprehensive victory for Afghanistan, who continue their fine record in the longer form of the game on the strength of their bowling. Hamid Hassan was the dominant figure again, taking 11 wickets, albeit a little expensively. An entertaining game though, where over 1150 runs were scored and 36 wickets fell in the first three days. Nawroz Mangal anchored Afghanistan's first innings with 168, while only Seren Waters, fresh from a county stint, showed any life in Kenya's disappointing reply of 160. Otieno completed a fine game for him, taking 4 wickets in each innings to provide a target of 512 for Kenya, but while several players got starts, no one went on and they fell well short; the tail collapsing to Hassan's burst early on the last day. Afghanistan continue their climb up the rankings, with the opportunity to surpass Scotland when they play in the final in late November.


Rankings at 9th October 2010
 1. Australia      1221.21
 2. India          1203.85
 3. South Africa   1193.34
 4. England        1158.25
 5. Sri Lanka      1109.33
 6. West Indies     919.14
 7. New Zealand     917.91
 8. Bangladesh      638.24
 9. Zimbabwe        556.79

10. Ireland         556.46
11. Scotland        461.60
12. Afghanistan     445.10
13. Namibia         388.49
14. Kenya           338.92
15. U.S.A.          296.99
16. Uganda          268.44
17. Nepal           196.51
18. Netherlands     195.69
19. U.A.E.          182.53
20. Canada          177.51
21. Hong Kong       148.65
22. Cayman Is       134.24
23. Malaysia        123.90
24. Bermuda         105.40

Shaded teams have played fewer than 2 games per season. Non-test team ratings are not comparable to test ratings as they don't play each other.

Idle Summers 9th October, 2010 10:44:54   [#] [0 comments] 

Umpiring in the 21st Century
Russell Degnan

Much of the debate over UDRS doesn't object to the technology, so much as the means taken to adjudicate on it. Having players question the umpire's decision is an unedifying spectacle, and slows the game down. This is doubly ironic when you consider that a different process might achieve the same results without a referral at all.

Globally available light-weight smart-phones with sufficient processing and communications power exist to convey information to the centre instantly. The on-field technology of the UDRS - essentially a walkie-talkie and a request for an off-field assessment - is an anachronism in a world of instant communication.

We present here, therefore, several proposals for improving umpiring, based on readily available technology, that would improve decision making, and speed up the game.

Instant Umpire Decision System

Availability: could be implemented tomorrow

The not-so-humble smart phone is the key to improving umpiring. Fundamentally, it is no more than a small, light-weight computer and screen with wireless connectivity. The information assessed post-decision by the third umpire is quite straight-forward: did the ball pitch outside leg? did the ball hit the batsman in line? would it have gone on to hit the stumps? More importantly, that information is stored electronically and available relatively quickly. There is no reason, therefore why it could not be conveyed via a wireless antenna in the broadcasting box to an application on the umpire's smart phone prior to them making the original decision.

With a quick glance to confirm (or over-turn) their original impression, the umpire could make their decision with the same level of accuracy as the existing UDRS process, but all on the field.
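
To make the idea concrete, here is a sketch of what the application on the umpire's phone might do with the three questions listed above. It assumes the tracking data arrives as three yes/no answers, which is my simplification: the real lbw law has further conditions (whether a shot was offered, whether the ball hit the bat first, and so on) that are ignored here, and every name in the code is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BallTracking:
    """The three answers the text says the third umpire currently assesses."""
    pitched_outside_leg: bool
    hit_in_line: bool
    hitting_stumps: bool


def suggested_decision(ball):
    """Return a suggestion for the umpire to confirm or over-turn.

    Simplified on purpose: only the three checks named in the post are applied.
    """
    if ball.pitched_outside_leg:
        return "not out: pitched outside leg"
    if not ball.hit_in_line:
        return "not out: impact outside the line"
    if not ball.hitting_stumps:
        return "not out: missing the stumps"
    return "out"


plumb = suggested_decision(BallTracking(False, True, True))
sliding_down = suggested_decision(BallTracking(False, True, False))
```

The point is less the logic, which is trivial, than the timing: the answer is on the screen before the finger goes up, not after a referral.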

But that isn't the only modern technology that could be applied, with a little work.

Edge Detection

Availability: technology is available

HotSpot has been a mixed experience: good for tv viewers, but unreliable because the bat is not always clearly visible, or the mark sufficiently noted. The technical solution is the application of touch sensitive strips to bats, measuring as little as 0.5mm. These would easily sense the ball, and the size and width of contact. However, they also need to be connected to a wireless chip (with a close proximity receiver), a battery and chip. The weight (perhaps 30g) and size (25x5mm) would be no problem, and the chipset could be taped in below the bat handle. Add a light-weight accelerometer and other tv-centric information like bat-speed could be sent via the broadcaster.

No-ball Detection

Availability: needs research

This is significantly more complex than it seems. The law only requires that some part of the foot land behind the line, not be grounded, making it difficult to distinguish between a foot passing over the line, and one that has landed. More difficulties arise with the technology. Curvature of the ground would prevent the sort of fault-line technology used in tennis, while anything laid on the ground would quickly be destroyed by bowlers' spikes. The most likely option would seem to be visual recognition technology similar to that used by hawkeye, to detect where the foot landed.


Instant referral is a must, however. In this day and age, there is no reason why an umpire must rely on a man in a box to convey the same information they could have had sent to them on demand.

Thanks to Achettup and Kartikeya for inspiring this piece.

Idle Summers 8th October, 2010 12:25:09   [#] [3 comments] 

Monday Melbourne: CCXV, October 2010
Russell Degnan

Heat brings storms. Taken October 2010

Melbourne Town 6th October, 2010 22:34:04   [#] [0 comments] 
