This is a bit of a weird diversion from normal programming, but I thought I’d give it a go. On the weekend it was the grand final of the Australian Football League, the AFL, and, rather remarkably for the game, it ended in a draw. Those of you who know anything about scoring in AFL will guess that this is pretty unlikely, especially in a grand final. So a blog I regularly read, Larvatus Prodeo, has a post up pondering the possibility that we live in an era of close grand finals. I offered to attack this problem with my (apparently somewhat rusty) time series analysis “skills”, and this post is an attempt to present the results.
Winning and losing scores since 1950 were provided by the author of the post at Larvatus Prodeo (I rather suspect he’s been writing them by hand on the day of the final since 1950, but let’s assume they’re accurate because no AFL fan would get something as important as this wrong). Visual inspection was used to investigate the possibility of a narrowing of winning and losing scores. The margin of victory was divided by the winning score to give a measure of the proportionate margin (referred to as such from now on), and ARIMA time series analysis was used to estimate the seasonal patterns (if any) in the proportionate margin. A linear model was then fitted to the proportionate margin by year, and serial dependence in the residuals was estimated using another ARIMA model. Where serial dependence was evident, a second linear model was to be fitted using generalized least squares adjusted for the identified serial dependence, and the resulting estimates of the straight line fit presented.
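For anyone who wants to follow along at home, here is a minimal sketch of the first two steps (the proportionate margin, then a straight-line fit by year) in plain Python. It is not the R code I actually ran. The scores in `finals` are invented placeholders rather than real grand final results, and the function names are my own.

```python
# Illustrative only: these scores are invented placeholders,
# NOT real grand final results.
finals = [
    # (year, winning score, losing score)
    (2001, 100, 80),
    (2002, 90, 81),
    (2003, 120, 60),
    (2004, 110, 99),
    (2005, 80, 76),
]


def proportionate_margin(winner, loser):
    """Margin of victory as a fraction of the winning score."""
    return (winner - loser) / winner


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


years = [year for year, _, _ in finals]
margins = [proportionate_margin(w, l) for _, w, l in finals]

# Straight-line trend in the proportionate margin by year.
intercept, slope = fit_line(years, margins)

# Residuals are what gets checked for serial dependence afterwards.
residuals = [m - (intercept + slope * y) for y, m in zip(years, margins)]
```

A draw gives a proportionate margin of zero, and dividing by the winning score means a 20 point win in a low-scoring game counts for more than a 20 point win in a shoot-out.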
It was my original intention to fit these models using Stata 11 but my version of Stata seems to be broken so I used R. Unfortunately, R’s time series analysis software is … dubious. And I’m all at sea doing time series analysis outside of SAS, so you should take this with a grain of salt.
Figure 1 shows the winning and losing scores in the grand final for the last 60 years. There appears to be a narrowing of scores before the 1980s and another narrowing over the last 10 years, though the results shown in this chart are hardly conclusive. There’s also a hint of a jump in scores at about 1980, which actually shows up very clearly in the differenced time series plot of losers’ scores but not of winners’.
A plot of winner’s scores against loser’s scores suggests a strong relationship between the two, but this is not unexpected given that both teams are trying to win, so is not presented here.
Figure 2 shows the proportionate margin by year. There appears to be a periodic relationship in this margin, but in fact analysis of this period does not suggest it is very significant.
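For what it’s worth, the usual way to set up a periodic term is sketched below, assuming a single cycle of fixed length: the year offset goes on the top line, the period on the bottom, and the whole thing is scaled by 2π. The function name, the start year and the period are placeholders of mine, not the values from my actual model.

```python
import math


def harmonic_terms(year, start_year, period_years):
    """Sine and cosine regressors for one cycle of length period_years.

    The angle runs from 0 to 2*pi radians over each full period.
    """
    angle = 2 * math.pi * (year - start_year) / period_years
    return math.sin(angle), math.cos(angle)


# Hypothetical example: a 40-year cycle starting in 1950.
sin_term, cos_term = harmonic_terms(1960, 1950, 40)
```

Including both the sine and the cosine term in the regression lets the fit choose its own phase, so you don’t need to guess where the cycle starts.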
Analysis of autocorrelation and partial autocorrelation functions (not shown) suggests that the serial dependence in this model is AR(2): that is, the margin of victory in any given year is related negatively to the margins of victory in the previous two years, and very strongly to the previous year. This is to be expected – teams learn from a very large margin of victory in the prior year, and adapt to fit new tactics, significantly reducing the margin the following year. Fitting a linear model, with year as the only predictive term, we find that there is no serial dependence in the residuals of the model, but that year is not statistically significant (p=0.151). This means we don’t need to fit a generalized least squares model, though doing so with an AR(2) term makes no difference. On visual inspection the residuals do seem to be uncorrelated, suggesting a model with no serial dependence. The linear model only explains 0.1% of variance, suggesting that simple models of AFL scores have very little predictive power.
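The autocorrelation check itself is simple enough to sketch by hand. The snippet below is a bare-bones sample autocorrelation in plain Python, not the R routine I used, and the series in it is a made-up alternating sequence chosen only because its answer is easy to verify. The band of roughly 2/sqrt(n) is the usual rule of thumb for spotting lags worth worrying about, not a formal test.

```python
import math


def autocorrelation(series, lag):
    """Sample autocorrelation of a series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    numer = sum(
        (series[t] - mean) * (series[t - lag] - mean)
        for t in range(lag, n)
    )
    return numer / denom


def rough_band(n):
    """Rule-of-thumb bound: lags outside +/- 2/sqrt(n) deserve a look."""
    return 2 / math.sqrt(n)


# Made-up series that flips sign every step, so lag 1 is strongly negative.
toy = [1, -1, 1, -1, 1, -1]
lag1 = autocorrelation(toy, 1)
lag2 = autocorrelation(toy, 2)
```

A negative value at lag 1 is the “big margin one year, smaller margin the next” pattern described above. Applied to the residuals of the linear model, values inside the band at every lag are what “no serial dependence” looks like.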
There is no evidence of a strong pattern, either periodic or linear, in margins of victory in AFL grand finals. There is some evidence that the margin of victory in a given year depends on what happened in the previous two years, and tends to regress towards the mean, suggesting that teams learn from previous blow-outs. There is otherwise no evidence of any time dependence in this data.
Note that there are two important additional caveats: one, that estimates of significance in ARIMA models in R are thought to be wrong; and two, that there is an additional source of serial dependence in this data, in that some of the grand final scores are for the same team in multiple years. Obviously the blow-outs in Figure 1 correspond to the years when Port Adelaide won the grand final, or when the Crows lost. A more sophisticated analysis of AFL scores would use all scores for the year, adjust for serial dependence by team and game, and use a version of Stata that works rather than R.
Finally, I think that the AFL maintains a draft system and a salary cap. This would be consistent with the findings presented here.
fn1: I think it would be great if all research papers presented their caveats in this style. But I’ve had 3 beers, and that is not the usual condition under which research papers are written.
fn2: He assumes, bravely
fn3: possibly because I’m pretty crap at fitting periodic models, and after 3 beers really not even very good at working out how many radians to divide 40 years into, or what should be on the top or bottom line of the sine function. But I think I got it right.
fn4: I read this in a journal article when I was actually doing something important, but I can’t be bothered finding it now
fn5: May I use this term loosely?