About betting addiction

I want to have a talk with you about gambling addiction. I promised myself two things when starting BookieBacker: don’t lie and don’t get people addicted. Don’t lie means: all my tips are always online, for free, and my history of results is always accurate, even if I have a bad month. Don’t get people addicted means: always encourage people to see gambling as a fun activity and in no way a substitute for a steady income. Supposedly there are a few people who can survive on betting alone, but those are well beyond my realm of influence.

Story time……

I’ve been there, not addicted but way, WAY too emotionally invested. During the late summer and the beginning of fall, when all the European competitions started and I was still actively providing tips for Japan, China, South Korea, USA, Brazil and Argentina, there was always, and I do mean ALWAYS, a game running that I had a bet on. Because of the different time zones I was invested in games during the morning, afternoon, evening and night.

Let me tell you what happens. Seeing your team at 0-0 makes you nervous about whether they will ever score that opening goal. The longer it takes for the goal to be scored, the harder it gets. Seeing your team 1-0 ahead makes you nervous about whether they can hang on. Seeing them 1-0 behind annoys you and makes you keep checking whether they have finally scored the equalizer. Does seeing your team with a two or three goal lead give you inner peace? Hardly. I’ve lost tons of 2-0 leads and even one 3-0 lead in the 77th minute. I had a bet on AC Milan in that ridiculous final against Liverpool. Until the final whistle sounds, there is stress. Imagine this, every weekend, all the time.

There are weekends where it seems that all the football gods are up against you. You lose 5 games in a row and read the match reports about missed penalties, disallowed goals without a proper offside ruling, hitting the woodwork and straight up miracles like the first home loss in 2 years. All stacked against you. You tend to forget the weekends where it’s the other way around.

I remember being at a friend’s birthday one weekend. She had just turned 30. There was a good crowd, a singer and a great vibe. I was having one of those weekends where everything falls apart in terms of betting. You don’t want to know just how many times I picked up my phone to show someone and to complain. ‘Do you believe this shit, they conceded an equalizer AGAIN’, ‘First loss in 3 months just when I pick them for a bet’. Not everyone was interested, but I didn’t notice. I got home, pretty annoyed about it, got in bed next to my girlfriend and sort of growled at the cat to get out of the room because she was making noise. Couldn’t sleep for hours.

The next morning my girlfriend told me how worried she was about the big cluster of annoyance lying next to her in bed. It hit me then: I was letting really short term results get to me. Even though I am good at my profession and getting good results overall, I was letting the fact that one guy I don’t even know misses a penalty influence my sense of self-worth. Even worse, it meant that I couldn’t focus my attention on a friend and her special day, because I had to make it about me and my obsession with eleven strangers failing to find the net. Really makes you feel like a dick.

You see, addiction is not a black and white thing. It’s a slippery slope of growing investment in gambling and not putting your real life first. It is not as if you are fine right up until your house is being sold due to debt. You stopped being fine long, looooooong, before then.

About BookieBacker

It’s time for a series of more personal blog posts. Some sharing on my part, telling you stories about BookieBacker, my goals and what I hope you, as a reader, get out of BookieBacker.

I’ve always been a huge fan of sports betting, the sport and the science; the analytics that come with it. I literally wrote my master’s thesis on machine learning and sports betting, trying to prove how using a model could both beat a bookmaker and help a bookmaker. Can you imagine that I found a bookmaker in 2007 that was still using a group of five people to decide what the odds for a game should be? I know that the wisdom of crowds is a thing and I believe in it, but five people is not a crowd.

Ever since finishing my master’s thesis I had wanted to build a tipster website, because I knew I could provide worthwhile tips. However, life got in the way. Who knew back then that an education in data science is a great one to have right now?

I got a demanding job, a girlfriend came along (and left, but that’s not what I meant by writing personal blog posts), I got a house, and the projects that don’t put bread on the table sort of ended up in the background.

The best moment to plant a tree is 20 years ago, the second best is right now! I always disagreed with this statement because the second best moment is obviously one day after the 20 years ago moment. However, the best moment you control is right now. I kept this in mind when starting BookieBacker about a year ago. I did it because I wanted to do it and it was something I had put off for far too long.

And mind you, having a house and a good job, this is just a project for fun and for me. BookieBacker is not a get rich quick scheme but an initiative to share my work and let other people get some return out of it. I get a kick from people asking me for alternative data sets and I love it when people want to share their own projects with me. Or people congratulating me on good results. Maybe even from people who are upset with bad results, because they expected better.

In the end, if BookieBacker makes a profit (and believe me it’s hardly self-sustainable right now), some of it is going to charity, because I just want to prove that I can do this. And I want to help others who have similar projects and interests. Profits have a better purpose, to help those less fortunate.

Evolutionary computing in action

As previously explained, I am a pretty big fan of evolutionary computing, meaning that in my algorithms I like to use nature as an example. Nature has some pretty sweet ways of finding a (locally) optimal solution in an extremely big solution space. For BookieBacker I use an adapted version of the genetic algorithm. Before you can use this method, you need to do two things. The first is translating your problem into a representation of DNA. The second is defining a cost function. This cost function should be able to evaluate your string of DNA and tell you how good it is. For example, a string of DNA might be the composition of a random animal (wings, claws, beak, teeth, fur, number of eyes etc.) and the cost function should indicate whether this random animal can survive in the outside world (can it find food, can it take on predators etc.). So for animals (and humans) our attributes are the genetic algorithm solution, the big scary world outside is our train/test area and whether we survive (or reproduce) is the cost function that we need to optimize.

That’s a great story, but the question remains: how do I translate football data into DNA? And what is my cost function? What if we translate previous results into attributes? For instance, check out the contents of the full dataset right here. Let’s say we choose a random set of five of these attributes:
1. Number of home wins for the home team in the previous 3 matches. (X1)
2. Difference in goals scored between the teams in the previous two matches. (X2)
3. The result of the last four confrontations between these two teams. (X3)
4. The number of clean sheets for the away team in the past 3 matches. (X4)
5. The number of losses with 2 goals or more for the away team in the past 3 matches. (X5)
Great, nice selection right?

We will use these variables to see if dividing all the matches that we have with these attributes into two groups can land us a profit. The two groups, of course, are ‘I will bet on this’ and ‘I will not bet on this’. But the values of this set of attributes are not interpreted or weighted yet; they are just a set of incomparable values without a direction towards betting yes or no. So we need something to make them tick. Referring back to DNA, I don’t just want a location for my ‘claws’ attribute, I want to know how BIG my claws are going to be.

So we also need a set of random coefficients. Let’s say that we draw the set [-10, 5, 10, -5, 3]. That means that we can calculate our ‘random’ total for this and other matches as:

X1*(-10) + X2*5 + X3*10 + X4*(-5) + X5*3

The result of this formula on all matches, re-scaled to numbers between 0 and 1, can represent the chance of, for instance, a home win. Betting on those matches where the % chance of a home win gives you an advantage over the bookie will probably not result in a profit for this first set of attributes and coefficients. So what are we going to do about that?
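To make the formula a bit more tangible, here is a minimal sketch of the scoring step. The three matches, their attribute values and the odds are made up for illustration, and encoding X3 (the last four confrontations) as a single number is my assumption here, not something the data set prescribes. Min-max scaling is also just one possible way to get to the 0 to 1 range.

```python
# Hypothetical coefficients, the same set drawn in the text above.
COEFFICIENTS = [-10, 5, 10, -5, 3]


def raw_score(attributes, coefficients):
    """Weighted sum X1*c1 + ... + X5*c5 for a single match."""
    return sum(x * c for x, c in zip(attributes, coefficients))


def rescale(scores):
    """Min-max rescale a list of scores to the range 0..1."""
    low, high = min(scores), max(scores)
    if high == low:
        # All scores are equal, so there is nothing to rank on.
        return [0.5 for _ in scores]
    return [(s - low) / (high - low) for s in scores]


def pick_bets(chances, home_odds):
    """Indices of matches where our chance beats the implied chance 1/odds."""
    return [
        i for i, (chance, odds) in enumerate(zip(chances, home_odds))
        if chance > 1.0 / odds
    ]


# Made-up attribute values (X1..X5) for three matches.
matches = [
    [2, 1, 3, 0, 1],
    [0, -2, 1, 2, 0],
    [1, 0, 2, 1, 2],
]
scores = [raw_score(match, COEFFICIENTS) for match in matches]
chances = rescale(scores)
bets = pick_bets(chances, home_odds=[1.8, 3.5, 2.2])
print(scores, chances, bets)
```

With a random set of coefficients like this one, the ‘chances’ are of course meaningless. The point is only that every string of DNA can be turned into a yes/no decision per match, which is what the cost function needs.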

This is where the evolution comes into play. If you see your set of X1-X5 as a string of DNA and thus an animal, imagine making 50 animals: 50 sets of X1-X5 (all different indicators) and 50 sets of coefficients. If you evaluate the results of all these 50 sets you can sort them from most profitable to least profitable. Just like in nature, not every animal will survive.

The genetic algorithm is an iterative approach. The 50 animals that we take into account in our next iteration are;
– The top ten performers of the previous iteration.
– 10 Mutations of the top ten performers of the previous iteration.
– 10 Children of the top ten performers of the previous iteration.
– 20 brand new sets.
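The composition above can be sketched in a few lines. This is a rough illustration and not the actual BookieBacker code: the function names (`next_generation`, `mutate`, `breed`, `new_random`) are hypothetical, and the population is assumed to be sorted from most to least profitable already.

```python
import random


def next_generation(ranked, mutate, breed, new_random, rng=random):
    """Build the next 50 animals from a population sorted best-first.

    `mutate` takes one animal and returns a changed copy, `breed` takes
    two animals and returns two children, `new_random` makes a new animal.
    """
    top = ranked[:10]                              # 10 survivors
    mutants = [mutate(parent) for parent in top]   # 10 mutations
    children = []
    for _ in range(5):                             # 5 pairs x 2 = 10 children
        first, second = rng.sample(top, 2)
        children.extend(breed(first, second))
    fresh = [new_random() for _ in range(20)]      # 20 brand new sets
    return top + mutants + children + fresh


# Toy usage: an animal is just a list of five coefficients here.
population = [[i] * 5 for i in range(50)]  # pretend this is sorted best-first
new_population = next_generation(
    population,
    mutate=lambda animal: animal[::-1],
    breed=lambda a, b: (a[:3] + b[3:], b[:3] + a[3:]),
    new_random=lambda: [random.randint(-10, 10) for _ in range(5)],
)
print(len(new_population))
```

Keeping the top ten untouched means the best solution found so far can never get lost between iterations.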

Mutations? Children? What kind of sick and twisted algorithm is this? Hear me out. If a solution performs well, you obviously did something right. You put emphasis on the correct variable or set of variables. It is, however, not certain that this solution is optimal.
First mutation. If we want to run a marathon, we need to practice. We need to break the 20km, 25km, 30km and 35km barrier before we can attempt the 42km. That is your own mutation. Practice, stronger legs, bigger lung capacity.
To mutate the top 10 performers, we can choose to make the variables (not the coefficients) switch places. So X1 switches places with X2 and is now multiplied by (in the example) 5. You keep the same attributes but look for a better weighting solution.
Secondly, children of top performers. If we imagine the African Savanna a long time ago, some horse-like animals were having trouble reaching high enough places in the trees to eat leaves. Out of a group of those animals, which one do you think got the most ass? That’s right, the one with the longest neck. His children had a longer than usual neck and that made them more desirable as well. Repeat a 1000 times and there you have it, the giraffe!
To create children from your top ten performers, pick 2 performers and make two new sets using the X1-X3 of your first solution with the X4-X5 of your second, and the other way around.
A third possibility is that you were not at all on the right track to create an animal that survives. We might be trying to select the optimal set of claws to survive in the ocean. Claws are always nice but an animal that can breathe underwater is the clear winner in every ocean battle. Maybe Will Ferrell can explain this better. In conclusion, we will also take another set of 20 brand new animals with us in the equation.
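For those who prefer code over giraffes, here is a small sketch of the mutation and the children step. An animal is represented as a pair of lists (attribute names, coefficients). The attribute names are made up, and letting the coefficients travel along with their slots during breeding is my assumption for this sketch; the text only prescribes what happens to X1-X5.

```python
import random


def mutate(animal, rng=random):
    """Swap two attributes; the coefficients stay in their slots."""
    attributes, coefficients = animal
    i, j = rng.sample(range(len(attributes)), 2)
    swapped = list(attributes)
    swapped[i], swapped[j] = swapped[j], swapped[i]
    return swapped, list(coefficients)


def breed(first, second, cut=3):
    """Make two children: X1-X3 of one parent with X4-X5 of the other."""
    (attrs_a, coefs_a), (attrs_b, coefs_b) = first, second
    child_one = (attrs_a[:cut] + attrs_b[cut:], coefs_a[:cut] + coefs_b[cut:])
    child_two = (attrs_b[:cut] + attrs_a[cut:], coefs_b[:cut] + coefs_a[cut:])
    return child_one, child_two


# Hypothetical parents; the names are placeholders for data set columns.
parent_a = (
    ["home_wins_3", "goal_diff_2", "head_to_head_4", "away_clean_3", "away_heavy_loss_3"],
    [-10, 5, 10, -5, 3],
)
parent_b = (["attr_a", "attr_b", "attr_c", "attr_d", "attr_e"], [1, 2, 3, 4, 5])

mutant = mutate(parent_a, random.Random(1))
child_one, child_two = breed(parent_a, parent_b)
print(mutant[0])
print(child_one[0], child_one[1])
```

One thing to watch out for: if both parents share an attribute, a child can end up with the same attribute twice, so in practice you may want to check for duplicates.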

Next time I write there’ll be more details and a link to GitHub code. Hopefully you can use it yourselves. Any questions?

November Update

So November was a bad month. It started with a horrible weekend in which almost half the stack was lost. This weekend was full of match reviews about rare losses, bad luck for one team, bad calls from the referee and pandemonium all around. It was just one of those weekends where everything went wrong. Of course, there might have been weekends that went the other way around but those are always so much more forgettable, right?

Anyway, the rest of the month we did some recovery, but not enough. The total result was a 3.4% loss. It’s a shame. The best result was Colon Santa Fe’s home win over Arsenal Sarandi in the Argentina Premier Division at odds of 4.2. Anderlecht not beating Oostende at home was the heftiest loss at a 1.5 rating.

So what are we currently doing to make the result better? Simple. Now that we are running a LOT of leagues at the same time, we’re generating around 60 tips every weekend. We have to make a selection, because 60 is too many and there is no overview. That does give us the luxury of being more picky about the tips that our algorithm provides. So what combinations of Home/Away team tips, combined with Odds and Weight of the tip (severity of the overestimation of the risk), work best? There is some tweaking to be done there, but currently we are generating so much data that it shouldn’t be a problem.

October update

Sooo….. October had some ups and downs. We did end on top of things though, around 6.5% profit on average. We delivered 142 bets, of which 78 were won. The best result was Karlsruhe’s away win at Wurzburger Kickers at odds of 4. The saddest loss was Manchester City’s inability to win their home game against Southampton on the 23rd.

See past results for the complete list.

Some updates going into November;
– Japan and Brazil are retiring for this season.
– A buttload of European competitions will replace them (Germany, Spain, England, Scotland, Belgium, France, Italy, Russia, Greece, Portugal, Netherlands and so on)
– Argentina will also re-appear for a LONG season so lots of action there.
– We are going to crank up the betting threshold, so fewer bets, higher expectancy.

Just wanted to update you guys!

The perfect validation period for betting

Some time ago, around April this year, I had one of those shower epiphanies. One where you quickly dry yourself off while wanting nothing more than getting out of the room to write it down.
Listen to this: what if the period you validate your model on is as close to the actual timeline as possible? Allow me to elaborate. Normally you build a model based on a set of hundreds or thousands of matches. You validate your model on a set (preferably more recent games) and then you start predicting. My validation phase has always been about the 4 previous months. So in order to predict matches in May, I validated whether my model did well in the period January-April. But is that the correct time frame?
A full year of competitive football consists of several phases. There are the first few months, where new players are either overperforming or still have to find their way. There’s a middle part, where international competitions become more important (group stages) and the potential winners and losers of the season become clearer and clearer. Then there’s the end of the season: win the championship, avoid relegation, and how you do it doesn’t matter. To use a Dutch saying: ‘A cornered cat will do anything to be free’. Meaning, don’t make a prediction then.
So why not pick a model that only has to do well in the month or few weeks before this weekend, to fully capture the different phases of the season? I decided to make a model based on a longer validation period and a model based on a very short one, for comparison. So what I did was, every week before a series of matches, I made sure that the first model performed well on the month before (short term) and the second one performed well on the four months before (long term). Here’s what I got.

You know what the fun part is? I published the short term ones because I was so certain I was right. Sorry about that. I’ve picked up the pace again and the last few weeks have been a lot better! Lesson learned, use a model that consistently performs over several months before you publish. A few weeks is not good enough.
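To show what ‘performing well on the month before’ versus ‘the four months before’ means in practice, here is a small sketch. The bets and dates are made up, the function name `window_profit` is hypothetical, and I assume flat stakes of one unit per bet and 17 weeks as a stand-in for four months.

```python
from datetime import date, timedelta


def window_profit(bets, end, weeks):
    """Flat-stake profit of the bets settled in the `weeks` weeks before `end`."""
    start = end - timedelta(weeks=weeks)
    profit = 0.0
    for bet in bets:
        if start <= bet["date"] < end:
            profit += (bet["odds"] - 1.0) if bet["won"] else -1.0
    return profit


# Made-up history of one model's bets.
history = [
    {"date": date(2016, 1, 16), "odds": 2.0, "won": False},
    {"date": date(2016, 2, 20), "odds": 2.5, "won": False},
    {"date": date(2016, 4, 16), "odds": 3.0, "won": True},
]
this_weekend = date(2016, 5, 1)

short_term = window_profit(history, this_weekend, weeks=4)
long_term = window_profit(history, this_weekend, weeks=17)
print(short_term, long_term)
```

In this toy history the short window only sees the one lucky win, while the long window also sees the two losses before it. That is exactly the trap: a few weeks of results can flatter a model that is mediocre over several months.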

Evolutionary Computation; Useful in Sports Betting?


I’ve always been a big fan of Evolutionary Computation. Examples of EC are the Evolutionary Algorithm, Ant Colony Optimization, Bee Colony Optimization and many more: all algorithms that have their origin in nature. Especially for the enthusiastic young data scientist, it is a great way to get people acquainted with your profession. You can spend a lot of time trying to explain the hidden layer in a Neural Network, or you can explain how an ant finds food and how you can apply that method to other problems. Or translate the origin of species to predicting bankruptcy or predicting which country will win medals at the Olympics. At parties, you can keep people captivated for hours straight with these stories. Until you are left with only the cute blond girl still listening when people ask you to leave, because she’s too polite to say it but clearly terrified of you and your wild stories of animals killing their inferiors in nature and how that applies to stuff she knows even less about. Still, you have the sexiest job of the 21st century, so in time they’ll have to like your stories…..

Enough about my youth, back to the algorithms. As you could read in my previous blog, unsupervised learning is a potentially great way of approaching sports betting. Unsupervised learning (and thus most of Evolutionary Computation) is what you choose when you think that your particular problem does not fit a supervised method (e.g. the perfect fit towards the target variable).

And that is, in my honest opinion, the way to approach sports betting. You don’t want a great prediction of the match outcome, you want to identify the winners. And to identify those winners, your approach must be original (because you are not alone in your efforts). Let’s look at the Evolutionary Algorithm and Sports Betting. When using an Evolutionary Algorithm, you need to translate a possible solution to your problem into a string of DNA, and iteratively adapt this string of DNA to become a great solution to your problem. For instance, you need to deliver 20 packages all over the city and want to find out what the shortest route is. A possible solution to this transportation problem is making a string of DNA that gives the order of locations visited. You can evaluate this order of locations to a cost by summing the distances between these points. Make ten, or however many you want, solutions like this and iterate towards a better solution.
How do you do this? Easy, just like nature did it when creating the perfect animal (mice, dolphins and humans). First, you look into your best solutions. Having found the best ones, make them breed. So the first half of the best solution together with the second half of the second best solution. Also, mix them around. Switch a few places in your DNA of locations in your best solution. We call these the evolved solutions. Put all of these (the best, the bred, the evolved and a few new ones) in a big bowl and look for the best solution again. How does your solution (animal) look after n iterations?
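The package delivery example is small enough to sketch completely. Treat this as an illustration under a few assumptions: locations are made-up (x, y) points, distances are straight lines, and all function names are hypothetical. One adjustment compared to the description: simply gluing the first half of one route to the second half of another can visit a location twice, so the breeding step below fills the second half with the remaining locations in the order of the second parent.

```python
import math
import random


def route_length(route, locations):
    """Cost function: sum of the distances between consecutive stops."""
    return sum(
        math.dist(locations[a], locations[b]) for a, b in zip(route, route[1:])
    )


def swap_two(route, rng):
    """Evolve a route by switching two places in its DNA."""
    i, j = rng.sample(range(len(route)), 2)
    new_route = list(route)
    new_route[i], new_route[j] = new_route[j], new_route[i]
    return new_route


def breed(best, second_best):
    """First half of `best`, then the missing stops in `second_best` order."""
    head = best[: len(best) // 2]
    return head + [stop for stop in second_best if stop not in head]


def evolve(locations, iterations=200, pool_size=10, seed=0):
    """Iterate: keep the best, breed them, evolve them, add a few new ones."""
    rng = random.Random(seed)
    stops = list(range(len(locations)))
    pool = [rng.sample(stops, len(stops)) for _ in range(pool_size)]
    for _ in range(iterations):
        pool.sort(key=lambda route: route_length(route, locations))
        best, second = pool[0], pool[1]
        pool = [
            best,
            second,
            breed(best, second),
            swap_two(best, rng),
            swap_two(second, rng),
        ]
        pool += [rng.sample(stops, len(stops)) for _ in range(pool_size - len(pool))]
    return min(pool, key=lambda route: route_length(route, locations))


# 20 made-up delivery addresses on a 100 x 100 grid.
city_rng = random.Random(42)
addresses = [(city_rng.uniform(0, 100), city_rng.uniform(0, 100)) for _ in range(20)]
route = evolve(addresses)
print(route, round(route_length(route, addresses), 1))
```

The route it finds is not guaranteed to be the shortest one, just a good one. That is the deal you make with this family of algorithms.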

Ok, and now sports betting. Suppose you have a set of indicators like the ones presented in my free data sets. These indicators all have a certain connection to the match outcome or to the opportunity of having found a worthwhile bet. But how? Ponder that for a while, as I write a new article about my particular model.

Data science and sports; Asking the right question

So you are into sports and data analysis? And you want to take on the bookmaker? It’s do-able, but not easy. A lot of people have tried, and a lot of people have failed. Yours truly, although working on the subject for a long time, isn’t even sure he can do it in the long run. But I’m putting my results out there, so there’s no way back and I need to put my experience to the test.

Maybe, right now, you are at the point where you have collected a data set (such as the sets here) and want to throw in your most creative selection of machine learning algorithms. In my experience, all the random forests, support vector machines, neural networks and boosting algorithms in the world cannot EASILY crack this problem (with some extra work, maybe). Especially not if your target is the chance of a home win, away win or draw. It’s just too simple: thousands of others do it, and the bookmaker does it.

One of the most important lessons I learned over the years is; are you asking the right question? Let me elaborate on this. The first instinct that an aspiring analytics based tipster has, is; I need to predict the outcome of matches in terms of home/away win or draw, only BETTER than the bookmaker.

As an example, take the past Champions League Final between Real Madrid and Atletico Madrid. Before the match you could get (give or take) 2.4x your stack for a Real win, 3x for a draw and 3.3x your stack for an Atletico win. Let’s reverse that into a percentage. 1/2.4 + 1/3 + 1/3.3 = 105.3%. So there is obviously a margin built in for the bookmaker. You need to predict so well that you overcome this margin. Which is hard, because your predictions also have some sort of variance, see my previous blog on variance. You need to overcome the margin of the bookmaker by so much that you are sure you have the advantage over the bookmaker with your prediction. The bookmaker, however, is not an idiot so his prediction is often pretty good as well.
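If you want to do this reversal for your own matches, here is a small sketch. The function names are mine, and dividing every implied chance by the total to remove the margin is just one simple assumption about how the margin is spread over the three outcomes; bookmakers do not necessarily spread it evenly.

```python
def implied_probabilities(odds):
    """Turn decimal odds into implied chances (1 / odds)."""
    return [1.0 / o for o in odds]


def bookmaker_margin(odds):
    """How far the implied chances add up beyond 100%."""
    return sum(implied_probabilities(odds)) - 1.0


def fair_probabilities(odds):
    """Implied chances scaled back so that they add up to exactly 100%."""
    implied = implied_probabilities(odds)
    total = sum(implied)
    return [p / total for p in implied]


# Real win, draw, Atletico win (the odds from the example above).
final_odds = [2.4, 3.0, 3.3]
margin = bookmaker_margin(final_odds)
fair = fair_probabilities(final_odds)
print(f"margin: {margin:.1%}")
print([f"{p:.1%}" for p in fair])
```

For these odds the margin comes out at roughly 5%, and that is the gap your own predictions have to beat before you make anything at all.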

In all fairness, one thing is still to the advantage of the tipster. The bookmaker has to take the market into account. If everybody and their mother started backing Real Madrid in the previous example, those odds would have to go down and the Atletico odds would have to go up. If you are less emotional about the game than the inhabitants of Madrid, and more factual, you would see the value bet for Atletico rising and take it.

All in all, it’s tough to achieve this advantage over the bookmaker. If a team has a much higher chance of winning than the bookmaker thinks, the bookmaker will quickly correct. What I’m getting at when saying ‘asking the right questions’ is;

Do I care about my predicted chances?

Bear with me here. Your objective is to get a profit from the bookmaker. So you need to pinpoint the matches that, on average, give a high return. Should you care how delicately you predicted the outcome of these games? Or the precision with which you dismissed other matches, where you cannot get the upper hand, from your results?
Hells no!
You want your model to tell you: this match yes, that match no. Whether the reasoning behind that choice is statistically sound calculations or what the weird guy in the grocery store was shouting at a pack of rice, you shouldn’t care. Well, maybe in this particular example you should care, but that’s not the point.

I’ll leave the details for a next blog, but this insight made me switch from supervised to unsupervised modelling. Instead of ‘find the most common connection between variables and the outcome of the match’ it became ‘find the profitable betting opportunities and tell me what I could have used to select these and not the other options’. One focuses on the common connection, the other on your own special secret connection between historic results and future results.
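In code, the difference comes down to what you score a rule on. Here is a minimal sketch, with made-up matches and a hypothetical `home_form` indicator, that judges a ‘this match yes, that match no’ rule purely on the profit it would have made with flat stakes, not on how well it predicts outcomes.

```python
def selection_profit(matches, select, stake=1.0):
    """Total profit of betting `stake` on every match that `select` says yes to."""
    profit = 0.0
    for match in matches:
        if not select(match):
            continue  # 'that match no'
        if match["won"]:
            profit += stake * (match["odds"] - 1.0)
        else:
            profit -= stake
    return profit


# Made-up historic matches; `home_form` is a placeholder indicator.
matches = [
    {"odds": 2.0, "won": True, "home_form": 3},
    {"odds": 3.0, "won": False, "home_form": 1},
    {"odds": 1.5, "won": True, "home_form": 0},
]

picky_rule = selection_profit(matches, lambda m: m["home_form"] >= 2)
bet_on_everything = selection_profit(matches, lambda m: True)
print(picky_rule, bet_on_everything)
```

Any rule that can answer yes or no for a match can be plugged in as `select`, which is what makes this kind of score usable as a cost function.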

Your insights aren’t sacred, beware of the variance

Ok let’s get one thing straight, neither are ours. When predicting sports outcomes, most of the data scientists we know forget to take one thing into account, the variance of their prediction, and for that matter, the bookie prediction.

Let’s assume the following, there is an upcoming fixture where team A meets team B at home. Let’s assume that the bookmaker gives you a 2.0 rating for a home win for team A. Ignoring the advantage of the bookmaker, that is a 50% chance of winning. You predict a 60% chance of winning with your nifty classification model. Here’s what usually happens next;


The bookie prediction (blue) is lower than yours (green) and the gap is your expected profit. You make these bets but find out that the profit, if there is any, is not a long term 10%. The reason: there is uncertainty in both predictions, yours and the bookmaker’s. And most of that uncertainty is to your disadvantage.

Here’s what should have happened;


A distribution where the bookie prediction and yours are the means, at 50% and 60%, is more likely to be a representation of the truth. The bookmaker has modeled an outcome which is most likely 50%, but could be higher or lower, depending on the strength of the model. So did you, with a most likely result of 60% that could also be higher or lower. There might even be a region within your range of possibilities where this bet gives you an expected negative return (orange). Hows about that? That would leave you with an expected return well below 10%, and that is assuming your model only has that much variance, neglecting injuries, the weather, data errors etcetera.
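To get a feeling for how big that orange area can be, here is a small sketch. It assumes that the true chance of a home win is normally distributed around your own prediction, which is a simplification, and it ignores both the bookmaker’s margin and the bookmaker’s own uncertainty. The spreads (sigma) are made-up values, and the function name is hypothetical.

```python
from statistics import NormalDist


def chance_of_negative_return(predicted, sigma, odds):
    """Chance that the true win chance is below break-even (1 / odds),

    assuming the true chance ~ Normal(predicted, sigma).
    """
    break_even = 1.0 / odds
    return NormalDist(mu=predicted, sigma=sigma).cdf(break_even)


# Your prediction of 60% against a 2.0 rating, for three made-up spreads.
for sigma in (0.02, 0.05, 0.10):
    risk = chance_of_negative_return(0.60, sigma, odds=2.0)
    print(f"sigma {sigma:.2f}: {risk:.1%} chance this is a losing bet on average")
```

The less certain your model is, the bigger the part of your distribution that sits below the break-even point of the bookmaker’s odds.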

This is what you should be aiming for to keep the green area as large as possible;
Step one, it’s a sitter, is a better model. Aim for a high R2, along with a stable result between training, validation and test data.

Another tip: save your bets for when your advantage is large enough. Take the variance into account and keep the certainty of making a good bet as high as possible. Variance is the number two reason it took us so long to create a profitable model. Number one is coming up soon!

Weekly update

Hi All,

In terms of bets and results, what a shitty weekend and midweek. A -25% over the weekend and a small loss (-15% over three matches) in the midweek. But, our results are all about the long term and the weekend is once again very near. What have we accomplished in the past few days;

  • We are almost ready to launch our Argentina and Brazil models, so that we can provide tips during the summer and during the European Championship.
  • We have created a solution for the France Ligue 1 and the lower Scottish Leagues. Using these solutions we can provide more tips, more profit and less variance.
  • We have been working on a weighted betting system, which increases profit by shifting the majority of your betting stack to the most profitable games. This information will appear in coming results. A blogpost on weighted betting is soon to arrive.
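As a teaser for the weighted betting idea, here is the simplest version I can think of: split your stack over the tips in proportion to a weight per tip. This is only a sketch under that assumption and not necessarily the system we will end up using; the weights and the function name are made up.

```python
def weighted_stakes(weights, stack):
    """Split `stack` over the tips in proportion to their non-negative weights."""
    total = sum(weights)
    if total <= 0:
        # No tip has any weight, so nothing gets staked.
        return [0.0 for _ in weights]
    return [stack * weight / total for weight in weights]


# Three tips with made-up weights and a stack of 100 units.
stakes = weighted_stakes([1, 3, 6], stack=100)
print(stakes)
```

The tip with the highest weight gets the biggest share of the stack, so the quality of the weights matters a lot more than the staking formula itself.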

Here’s to the weekend!