Sunday, 3 February 2019

Back to Bayes-ics

As explained in the last post, our football analytics needs some input parameters describing how "good" the two teams facing each other are likely to be on the day. There are two alternative approaches to getting them. One is based on classical statistics: you look back over a load of matches and work out each team's average scoring rate and average rate of conceding. You can also estimate how much better, on average, a team performs at home. This approach has some weaknesses though. A team can get better or worse over the season, so a long-run average is no good at telling you how good the team will be today. It also requires quite a lot of data (a lot of matches) and assumes every team's form is stable over time. In other words, it makes some assumptions that are not true. Which is never good.
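To make the classical approach concrete, here is a minimal sketch in Python. The team names and results are made up for illustration, and the function name classical_rates and the choice to express home advantage as a simple ratio of home goals to away goals are assumptions, not something taken from a real dataset or library.

```python
from collections import defaultdict

# Hypothetical results: (home team, away team, home goals, away goals).
results = [
    ("Reds", "Blues", 2, 1),
    ("Blues", "Reds", 0, 0),
    ("Reds", "Greens", 3, 0),
    ("Greens", "Blues", 1, 2),
]


def classical_rates(results):
    """Return ({team: (avg scored, avg conceded)}, home advantage ratio)."""
    scored = defaultdict(int)
    conceded = defaultdict(int)
    played = defaultdict(int)
    home_goals = 0
    away_goals = 0

    for home, away, hg, ag in results:
        # Tally goals for and against each side.
        scored[home] += hg
        conceded[home] += ag
        scored[away] += ag
        conceded[away] += hg
        played[home] += 1
        played[away] += 1
        # League-wide totals, used for the home advantage estimate.
        home_goals += hg
        away_goals += ag

    rates = {
        team: (scored[team] / played[team], conceded[team] / played[team])
        for team in played
    }
    # Assumed definition: goals scored at home per goal scored away.
    home_advantage = home_goals / away_goals if away_goals else float("inf")
    return rates, home_advantage


rates, home_advantage = classical_rates(results)
for team, (avg_for, avg_against) in sorted(rates.items()):
    print(f"{team}: scores {avg_for:.2f}, concedes {avg_against:.2f} per match")
print(f"home advantage ratio: {home_advantage:.2f}")
```

Note that every match counts equally here, however long ago it was played, which is exactly the weakness described above.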
A much better approach is to use Bayesian statistics. Thomas Bayes was an 18th-century minister and mathematician whose best-known essay tackled a problem in the "doctrine of chances". Hence his work is very relevant to all sorts of gambling! The formulas he gave us are all about inferring the underlying truth from a series of observations. Each observation modifies our belief in a given hypothesis. To cut a long story short, Bayesian inference crops up everywhere in modern analytics.
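For reference, the rule for updating belief can be written in one line. The letters H (hypothesis) and D (observed data) are labels chosen here for illustration:

```latex
P(H \mid D) = \frac{P(D \mid H)\, P(H)}{P(D)}
```

P(H) is the belief before the observation, P(D | H) is how likely the observation is if the hypothesis is true, and P(H | D) is the updated belief afterwards.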
The particular method I am deploying for football match analysis is the particle filter, a modern development based entirely on Bayesian inference. You can find a pretty good intro to particle filters in this slide deck. Note the reference to football results analysis on slide 24... Using a PF for football analysis is a nice party trick that often crops up in tutorial material, although I do it in a slightly more sophisticated way than the standard approach.
Applying a particle filter to the English Premier League works like this:

  • Each team is represented by a large number of "particles", each of which is a guess at the "model", i.e. the qualities of the team (its attacking strength, defensive strength, etc.).
  • Between fixtures, we "advance" these models, saying in effect: last week the team was like this, so how might it have moved on by this week?
  • After a fixture, we "filter" the particles, preferentially keeping those that best explain the result. Incidentally, this is where Bayes comes in. His theorem says that instead of asking the hard question, "how good is my particle (model), given the result?", we can ask "how likely is my result, given my model?" and weight each particle by the answer. This turns out to be an easier question and one we can answer. Importantly, we consider not just the result but the capabilities of both teams involved. Hence all the analysis is interconnected.
  • Now, when two teams face off, we have a set of guesses about the teams' current capabilities that is based on all previous results, especially the most recent one. We can model the game across the full range of guesses and get the best possible odds prediction, given the evidence.
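The three steps above can be sketched in a few lines of Python with NumPy. This is a generic textbook-style particle filter, not my actual (more sophisticated) implementation, and it rests on several assumptions that are not spelled out in this post: goals follow a Poisson distribution, each particle holds just a log-scale attack and defence strength, form drifts as a Gaussian random walk, home advantage is a fixed constant, and resampling is plain multinomial. All the names and numbers (N, HOME_ADV, drift and so on) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 2000          # particles per team (assumed)
HOME_ADV = 0.25   # assumed fixed home advantage on the log scale


def init_team(n=N):
    """Initial guesses: columns are log attack and log defence strength."""
    return rng.normal(0.0, 0.3, size=(n, 2))


def advance(particles, drift=0.05):
    """Between fixtures: let every guess wander a little (random walk)."""
    return particles + rng.normal(0.0, drift, size=particles.shape)


def scoring_rates(home, away):
    """Expected goals for each side, pairing home particle i with away particle i."""
    lam_home = np.exp(HOME_ADV + home[:, 0] - away[:, 1])
    lam_away = np.exp(away[:, 0] - home[:, 1])
    return lam_home, lam_away


def log_poisson(k, lam):
    """log P(k goals | rate lam), dropping -log(k!) which is equal for all particles."""
    return k * np.log(lam) - lam


def filter_on_result(home, away, home_goals, away_goals):
    """After a fixture: keep particle pairs in proportion to how well they explain it."""
    away = away[rng.permutation(len(away))]  # fresh random pairing each match
    lam_home, lam_away = scoring_rates(home, away)
    logw = log_poisson(home_goals, lam_home) + log_poisson(away_goals, lam_away)
    w = np.exp(logw - logw.max())            # subtract max for numerical stability
    w /= w.sum()
    idx = rng.choice(len(w), size=len(w), p=w)  # multinomial resampling
    return home[idx], away[idx]


def predict(home, away):
    """Simulate one scoreline per particle pair and count the outcomes."""
    lam_home, lam_away = scoring_rates(home, away)
    hg = rng.poisson(lam_home)
    ag = rng.poisson(lam_away)
    return {
        "home": float(np.mean(hg > ag)),
        "draw": float(np.mean(hg == ag)),
        "away": float(np.mean(hg < ag)),
    }


# Usage: two hypothetical teams, one observed result, then a prediction.
reds, blues = init_team(), init_team()
reds, blues = advance(reds), advance(blues)
reds, blues = filter_on_result(reds, blues, home_goals=3, away_goals=0)
print(predict(reds, blues))
```

Because both teams' particles are resampled together, a result updates our view of both sides at once, which is the interconnection mentioned in the list.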
In a nutshell, that's it. My plan now is to publish some predictions before the weekend fixtures and see whether we can beat the bookies! That's my goal. Bookies are there to be beaten after all.
