Tuesday, 20 February 2018

Monte Carlo or bust?


Looking around a bit shows I am not unique in using Monte Carlo simulation to estimate the true odds in horse racing. For example, see this Monte Carlo example. The author has a nice idea of using standard horse ratings, namely the Official Rating (OR) and the Racing Post Rating (RPR), as the input to the model. My method is to use the past performance of each horse as the input - which I believe has some neat benefits - but the basic approach is the same.

Basically, the first thing we have to do is define a probability density function (PDF) for the speed we think each horse might run in the race. It might look like this:

It represents the probability that the horse will run at any given speed. The peak of the PDF represents the most likely speed for the horse. It may run faster or slower, but each is less likely. The PDF tails off at the edges to show this, and the extreme edges are getting pretty unlikely. The shape of the PDF is important. The typical thing to do is to use a Normal (in other words, Gaussian) form for the PDF. This is not a bad choice: Gaussian PDFs crop up all over the place in nature, so the likely running speed of a horse probably follows one too.
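
To make this concrete, here's a quick Python sketch of such a bell-shaped PDF. The mean speed and standard deviation are made-up numbers, purely for illustration:

    # Sketch of a speed PDF for one horse. The numbers are invented for illustration.
    import numpy as np

    mean_speed = 16.0   # hypothetical "most likely" speed (metres per second)
    std_dev = 0.5       # hypothetical spread around that speed

    speeds = np.linspace(14.0, 18.0, 200)
    pdf = np.exp(-0.5 * ((speeds - mean_speed) / std_dev) ** 2) / (std_dev * np.sqrt(2 * np.pi))

    # pdf peaks at mean_speed and tails off symmetrically towards the edges -
    # exactly the bell shape described above.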

When we execute the Monte Carlo race model we run a lot of imaginary races (maybe 1,000 or more) and simply count the times each horse wins. To simulate each race, we draw, at random, example speeds for each horse from the PDF - such that the most likely race speed for each horse is in the middle of its distribution, with the frequency falling off towards the edges of the distribution. We then rank the horses based on speed and work out the winner. Here the choice of the Gaussian PDF is handy, because computer languages often have a ready-made function for generating random samples from a normal distribution. I'm using Python, and it does the job nicely.
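
To give a flavour of the procedure, here's a minimal Python sketch of such a race loop. The horse names, mean speeds and standard deviation are made up for illustration - they are not outputs of my quality-score method:

    # Minimal Monte Carlo race sketch. Mean speeds and the common standard
    # deviation are invented for illustration only.
    import random

    mean_speeds = {"Horse A": 16.2, "Horse B": 16.0, "Horse C": 15.8}
    std_dev = 0.5          # common width of every horse's speed PDF
    n_races = 10_000       # number of imaginary races to run

    wins = {name: 0 for name in mean_speeds}
    for _ in range(n_races):
        # Draw one random speed per horse from its Gaussian PDF...
        speeds = {name: random.gauss(mu, std_dev) for name, mu in mean_speeds.items()}
        # ...and the fastest horse wins this imaginary race.
        winner = max(speeds, key=speeds.get)
        wins[winner] += 1

    print(wins)   # Horse A should win most often, Horse C least often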

So, we have the results of a thousand or so simulated races, and now we can calculate the "true" odds for the horses easily enough (there's a sketch of that step after the list below). We made a few assumptions along the way, but if these hold, our odds should be good. Just to state those assumptions again:

  • We assume the past performance of the horse provides a good measure of its quality
  • We assume the horse's speed PDF is a Gaussian distribution positioned in proportion to the horse's quality score. 
  • We have to "invent" a width for this Gaussian (called the standard deviation). This is a bit of a weakness, in that we have to tune it to make the odds look right. Still, it should probably be fairly constant across races.
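
Here's the odds-calculation step sketched in Python, picking up from the win counts produced by a simulation like the one above (the counts shown are invented for illustration):

    # Turn win counts from the simulation into estimated "true" odds.
    wins = {"Horse A": 5200, "Horse B": 3100, "Horse C": 1700}   # example counts
    n_races = 10_000

    for name, count in wins.items():
        p = count / n_races                 # estimated win probability
        odds_against = (1 - p) / p          # losses per win, i.e. the "true" odds
        print(f"{name}: p = {p:.2f}, roughly {odds_against:.1f}/1 against")
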
So, that's the basis of the approach. I'm working on some improvements that are quite subtle, which I'll introduce in future posts. Now let's see how well it works!

Thursday, 15 February 2018

Horse Race Analytics Explained

As discussed in the previous post, the aim of my horse race analytics is to estimate the "true odds" for each horse in a race, so that we can compare them with what bookmakers are offering and identify "good value". I put the term "true odds" in quotes (I did it again!) for a reason. It is quite hard to say what the true odds of anything are - let alone of a one-off thing like a race. To measure the true odds accurately we would need to run the same race, with the same horses, in the same health, in the same weather and track conditions, a very large number of times and count the outcomes. But the weather is itself an uncertain factor, so maybe we'd need to use a few different seasonally appropriate weather conditions. Anyway, clearly this is completely infeasible, and the true odds are therefore really quite an abstract concept. It's doubtful, in fact, that a precise numerical value for the true odds can even be defined. One thing we can do is work out our way of calculating true odds and test it over a large number of real races. Our three to one (3/1) horses should on average win once for every three losses, our 2/1 horses once for every two losses, and so on.

Incidentally, if you are not familiar already, it turns out odds are a little different to probabilities. They are quoted as the number of losses vs. the number of wins. The win probability, on the other hand, is a number between 0 and 1 defining the likelihood of a win in each race.
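
In case it helps, here's that conversion written out in Python (nothing more than the arithmetic described above):

    # Converting between win probability and (fractional) odds.
    def probability_to_odds(p):
        """Losses per win: a 0.25 chance becomes 3.0, i.e. 3/1."""
        return (1 - p) / p

    def odds_to_probability(odds):
        """A 3/1 shot has a win probability of 1 / (3 + 1) = 0.25."""
        return 1 / (odds + 1)

    print(probability_to_odds(0.25))   # 3.0 -> quoted as 3/1
    print(odds_to_probability(2))      # 0.333... -> a 2/1 horse wins one race in three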

My approach to calculating the true odds has two stages. First, we estimate a "quality score" for each horse. The better the horse, the higher the quality score. We need to decide a standard way of doing this - for example, we could look at the six previous races and assign a score of 5 for a win, 4 for second, and so on. I have a more sophisticated approach that I'll describe in a future post, but for now, you get the general idea.
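
Just to pin down that simple example (to be clear, this is the toy scheme, not my actual method), here's what it might look like in Python:

    # Toy quality score: 5 points for a win, 4 for second, down to 0,
    # summed over the last six finishing positions.
    def quality_score(last_six_positions):
        return sum(max(6 - pos, 0) for pos in last_six_positions)

    print(quality_score([1, 3, 2, 1, 5, 4]))   # 5 + 3 + 4 + 5 + 1 + 2 = 20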

Having calculated a quality score for each horse, we then use this to estimate the odds. In simple, made-up cases, this might be easy. If three horses race, for example, and they are all exactly equal in terms of quality score, then the true odds are 2/1 for each horse, because, on average, in every 3 races, each horse will win once and lose twice. For any more complicated example, it gets a lot harder. In fact, it rapidly becomes impossible to calculate analytically (i.e. using a formula). The way to do it is the kind of thing merchant banks do a lot when analysing trades - Monte Carlo modelling. More about how this works in the next post.

Monday, 12 February 2018

Idle Pursuits

I decided to talk about my latest hobby - horse race analytics. It's something I started thinking about twenty years ago and, after several false starts, I believe I have finally worked out the maths of what I wanted to do and coded up an approach that works. I'm not sure why it took so long!

The basics. Horse racing is an uncertain business. Generally speaking it is never possible to accurately predict the result of a horse race, unless you are a) veeeery lucky or b) own all the horses. The best prediction is usually that the favourite will win. But typically the odds the bookmakers will give you on that happening will not be very good, so if you do it every time, you will, in the long run, lose money. If you back the horse where the bookies are giving the best odds, i.e. they pay you the most for a win, you will also lose money overall, because these horses will win less often. Not never, just less often.

There is, surprisingly, one reliable strategy for making money from horse racing, and that is to pick "value winners", which means that you pick horses where the bookmakers are offering "good value". In other words, they are offering better odds than the quality of the horse would suggest. In yet more other words, the horse is more likely to win than they think it is. The "true odds" of the horse are "shorter" than the bookies' odds. So, all we have to do is work out the true odds and back horses where the bookies are offering longer odds.

Therein, of course, lies the problem. How to calculate the "true odds". How to calculate odds better than the bookies, whose job it is to do this. They have teams devoted to it: observing races, going to stables, timing horses, and so on. This is what I'm trying to do. It won't be easy - but I think I at least have some maths that can help. I'm planning to use the kind of stuff economists and stock traders know about (at least some of them). To drop in a name, Bayes is the key to this. Bayes is the key to a lot of things.

What I plan to do is develop and refine the method over the coming weeks, publishing what I calculate as the true odds, comparing these to the bookmakers' odds and highlighting my betting recommendations. Over the weeks, we'll see if it's working and hopefully refine as we go!

Thursday, 8 February 2018

Bitcoin analytics - reprise

I saw this twitter post from Justin Seitz @jms_dot_py the other day:

https://twitter.com/jms_dot_py/status/958741474572750848

It's good to see this push to uncover the underside of the cryptocurrency ecosystem going forward. It again makes the point that the blockchain is a public ledger, which lists all accounts and all transactions that have ever occurred - forever. That gives quite a bit of time for us to pick over it. De-anonymizing accounts is the only barrier to getting full context on those transactions. Of course, there is only so far it can be taken right now, but in future, with better tools, who knows what we will uncover.

I first came across Justin Seitz through his Black Hat Python book - which is packed full of ingenuity. He also does a lot in the OSINT space and developed the very good Hunchly tool for OSINT investigations. Definitely not a black-hat, but he does like to look under stones. I follow his Hunchly daily dark web report - which is an example of that.