The Math Behind Premier League Score Prediction

I recently developed an Excel spread sheet that simulates the Premier League table.  This is quite an interesting process.  It took me some time to figure out the math behind score prediction.  The overall predicting steps are well documented by other sources.  However, most of the details, especially detailed algorithms, are not widely discussed or documented.  I have to come up with my own methods or algorithms.   I will document these details here.  Hopefully, it may be useful for me or others in the future.

Before you proceed, you may want to download the spread sheet from here.  Most of the terms and concepts discussed below are used in the spread sheet.  The spread sheet is written with Excel VBA.  It runs in Microsoft Excel 2016 or newer.

Introduction to SPI (Soccer Power Index)

The first step toward predicting the Premier League is to set up the Soccer Power Index (SPI) for each team in the league.  Luckily,  the folks at fivethirtyeight.com  have already done it.  They post updated SPI values for each team in the Premier League on their website.

SPI is a pair of numbers that measures a team’s offense and defense strength.   The attacking index is the expected number of goals a team scores over an average team in the league.  The defending index is the expected number of goals a team concedes to an average team in the league.  Detailed explanation of SPI and high level score estimation procedure is documented here.   

Adjust League Average Based SPI to SPI Against Opposing Team

The next step is to convert SPI to the number of goals a team is expected to score or concede over its opponent which may be stronger or weaker than an average team in the league.  I used the following formulas:

offhome_adj = Offhome + (Defaway - Def_ave)
defhome_adj = Defhome + (Offaway - Off_ave)

For example, the first game in the 2018/19 season is Manchester United vs Leicester City.  The SPI of Man U is 2.2(attacking), 0.5(defending).  The SPI of Leiceter City is 1.9 (attacking), 0.8 (defending).  The average SPI of all teams in the Premier League is 1.97 (attacking), 0.75 (defending). 

To calculate the attacking index of Man U (home team) against Leicester (away team), we take the difference between the defending index of Leicester City and League average, which is the extra goals that Leicester City may give up against the average team in the league, and add it to the Attacking Index of Man U.  This assumes that Man U can capitalize on the goals that Leicester’s defense is likely to give up.  In this case, we get

offhome_adj= 2.2+(0.8-0.75)=2.25

We calculate the Defending Index for Man U. in the similar way. 

defhome_adj= 0.5+(1.9-1.97)=0.43

This strengthens Man U’s defending index, making them conceding fewer goals, since Leicester’s attacking index is slightly below League average. 

Adjusted SPI for Leicester City (offway, defaway)=(1.65,1.03) is calculated in the same way.

Obtain Projected Score of a Match

Once we have the adjusted SPI values, we can go on to find out the projected score of a particular match.  For this I simply average the adjusted attacking index of the home team and the adjusted defending index of the away team to get the projected number of goals scored by the home team (spiGF).  The number goals the away team is expected to score (spiGA) is calculated in the same way.

spiGF = (offhome_adj + defaway_adj) / 2

spiGA = (defhome_adj + offaway_adj) / 2

Again, take Man U vs Leicester City as an example, the projected score is 

spiGF = (2.25 + 1.03) / 2 = 1.64

spiGA = (0.43 + 1.65) / 2 = 1.04

Scale to The Number of Average League Goals

In order to generate more realistic scores, after we have calculated projected GF goals and GA golas for all teams, we scale the number of projected goals to the number of average league goals.  For this simulation, we assumed that number of league average goal is 2.85 per game.

Consider Other Factors: Home Field Advantage.  Days of Rest before a Game. 

For home field advantage, I assume the home team can score an extra 10% more goals.

I also assume that a team will give up an extra goal if they have zero rest days before a game and a team will not give up any extra goals, if they receive full 7 day rest.

In the case of Man U. vs Leicester City, projected after adjusting for home field advantage is 1.80:1.04.

Ability to adjust SPI – SPI Adjustment Factor (SAF)

Well, what if you do not agree with SPI values from fivethirtyeight.com? While you can always change the SPI values in the spreadsheets, SPI Adjustment Factor (SAF) provides a easy way to change or override the SPI values.

SAF is a whole number between -15 to +15.  SAF=0 will not alter the SPI values or the projected game score.  If you set SAF to a number larger than 10, you literally can boost a team from bottom of the table to top of the table.  If you set SAF to a number less than -10, you can sink a top team to the bottom of the table.  A moderate value between -5 to +5 is recommended.

Poisson Distribution.

The projected game score is the most likely outcome of the game.  But in reality, anything can happen.  We must calculate the probability of every possible score.  It turns out Poisson Distribution closely describes the goals are scored in a soccer match.   

P = (Lamda ^ k * Exp(-Lamda)) / Factorial(k)

Lamdais the number of projected goals of the match.  P is the probability of having k number of goals in that match.

To Poisson Distribution to be valid for a soccer match, the match must have the following characteristics:

  • k is the number of goals that can be scored.  k =0, 1, 2, …
  • The occurrence of a goal does not affect the probability that a second goal is scored. That is, goals occur independently.
  • The rate at which goals are scored is constant. The rate cannot be higher in some intervals and lower in other intervals.  
  • Two goals cannot be scored at exactly the same instant.
  • The actual probability distribution is given by a binomial distribution and the number of trials on goal is sufficiently bigger than the number of goals scored.  

I would not say that the above is a perfect description of a soccer match, but I guess it is close enough.

Below is table of goal probability of the Man U. vs Leicester City match.

Poisson Distribution
Goal Probability Example

As an example, the very top row in red shows that Man U has 29.7% chance scoring one goal and 26.8% chance scoring two goals.  The very left column in blue shows that Leicester has 36.8% chance of scoring one goal.  Combined, there’s a 10.9% chance having a 1:1 draw, 10.5% chance having a 1:0 score and 9.8% of chance of having a 2:1 score. 

Incidentally, 2:1 is the actual score of the match.

Also from the table, all the green cells add up to the probability (55.0%) of Man U. winning the game.  All the orange cells add up to the probability (11.7%) of Leicester City winning the game.  All the yellow cells add up to the probability (23.1%) of a draw.

Time to Throw the Dice

So what’s the final guess of the score?  It’s time to throw the dice.  At this point, the program generates two random numbers, one for Man U. and one for Leicester.  Depending on the value of the random numbers, we guess the final score.  Although 1:1 is the most likely score, there’s only 10.9% of the time we can get that score.  

Auto Fill Feature

Let’s go back to the Excel program.  The “Auto Fill” function executes the above sequences for all fixtures.  It runs through the entire season once and generates the “guessed scores” of every game.  The League table in the “Table” tab is updated using the “guessed scores”. Keep in mind, this is the result from the throw of a dice, the “guessed score” may not always be the “projected score”.

A few things about Auto Fill:

  1. It will only update the empty cells in the score columns
  2. It will not erase the scores that are manually entered.

Adjust SPI Based on Game Results

To make the simulations more realistic, as Auto Fill runs through the fixtures in the season, the SPI values of each team is updated based on the guessed results.  For instance, if a weak team somehow beat a strong team, the SPI of the weak team is raised.  The SPI of the strong team may be lowered.

In the program, we only updates the SPI, if there is a difference of 1.7 goals between the “projected” goal and the “actual” or “guessed” goal.  For every 1.7 goal difference, we boost the SPI value by 0.1.  The offensive SPI caps at 3.2 and bottoms at 0.5.  The defensive SPI bottoms at 0.1 and caps at 2.

Monte Carlo

To predict the team standings in the table, we need to run “Monte Carlo simulation”.  We basically run the “Auto Fill” process many times, say 10,000 times.  We then calculate the average points and average goal difference of the teams from the 10,000 tables.  The teams are then ranked based on the average points and average GD.

May your favorite team win!

Leave a comment

Your email address will not be published. Required fields are marked *