How the Poisson Distribution Predicts Football Scores
If you've ever wondered how prediction models generate those "47% chance of a home win" numbers, you're in the right place. The secret ingredient behind most football score predictions is something called the Poisson distribution, a bit of maths from the 1830s that turns out to suit the way goals happen in football remarkably well.
Don't worry. You don't need to be a mathematician. If you can follow the logic of "goals are rare events that happen somewhat randomly," you've already grasped the core idea.
What Is the Poisson Distribution?
The Poisson distribution is a way of working out how likely it is that a certain number of events will happen in a fixed period, when those events occur at a known average rate and roughly independently of each other.
That's a mouthful, so let's translate it to football: if we know a team averages 1.5 goals per game, the Poisson distribution tells us the probability of them scoring 0 goals, 1 goal, 2 goals, 3 goals, and so on in any given match.
The key requirements are:
- The events are relatively rare — goals don't happen every minute, they're spread thinly across 90 minutes
- Each event is roughly independent — one goal doesn't drastically change the probability of the next (this is a simplification, but it holds reasonably well)
- You know the average rate — this is where xG comes in
Football fits these criteria surprisingly well. Goals are rare (the average Premier League match produces around 2.7 total goals), they're somewhat independent (aside from psychological effects like a team pushing harder when behind), and we can estimate the average rate using xG data.
The Formula (Don't Panic)
The Poisson formula looks like this:
P(k goals) = (e^-λ × λ^k) / k!
Where:
- λ (lambda) = the expected average number of goals (this is our xG figure)
- k = the specific number of goals we're calculating the probability for
- e = Euler's number (approximately 2.718)
- k! = k factorial (e.g., 3! = 3 × 2 × 1 = 6)
You absolutely do not need to memorise this. There are calculators everywhere online, and any spreadsheet can handle it with a simple function. The important thing is understanding what it does: it takes an expected goal rate and gives you the probability of each possible goal tally.
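If you'd rather see it as code than as algebra, here is a minimal sketch in Python. The function name `poisson_pmf` is just our own label for this example, and the 1.5 goals-per-game figure is the illustrative average used earlier.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k goals when the expected number of goals is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

# Illustrative team averaging 1.5 goals per game
for k in range(5):
    print(f"{k} goals: {poisson_pmf(k, 1.5):.1%}")
```

Each line of output is one application of the formula: same λ, different k.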
A Worked Example
Let's make this concrete. Say Arsenal are playing at home against Aston Villa. Based on recent xG data, we estimate:
- Arsenal expected goals: 1.8
- Aston Villa expected goals: 1.0
These numbers might come from averaging recent home xG for Arsenal and away xG for Villa, adjusted for the strength of opponents faced. The exact method varies between models, but the principle is the same — you need an expected goals figure for each team.
Arsenal's goal probabilities
Using the Poisson formula with λ = 1.8:
| Goals | Probability |
|---|---|
| 0 | 16.5% |
| 1 | 29.8% |
| 2 | 26.8% |
| 3 | 16.1% |
| 4 | 7.2% |
| 5+ | 3.6% |
So Arsenal are most likely to score exactly 1 goal (29.8%), closely followed by 2 goals (26.8%). There's a 16.5% chance they blank entirely, and about an 11% chance they score 4 or more.
Aston Villa's goal probabilities
Using the Poisson formula with λ = 1.0:
| Goals | Probability |
|---|---|
| 0 | 36.8% |
| 1 | 36.8% |
| 2 | 18.4% |
| 3 | 6.1% |
| 4 | 1.5% |
| 5+ | 0.4% |
Villa are equally likely to score 0 or 1 (36.8% each), with a reasonable 18.4% chance of scoring twice. Three or more is unlikely but not impossible.
Notice that even though Arsenal's expected goals figure is "only" 0.8 higher than Villa's, their probability profiles look quite different. Arsenal have a much lower chance of scoring zero (16.5% versus 36.8%) and a much higher chance of scoring three or more (about 27% versus 8%). That 0.8 xG gap translates to a significant advantage when you play out the probabilities.
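The two tables above can be reproduced with a short script. This is a sketch rather than a definitive implementation: `goal_table` is a helper name we've made up, and the "5+" bucket is simply whatever probability is left after 0 to 4 goals.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k goals given an expected rate lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def goal_table(lam, max_goals=4):
    """Probabilities for 0..max_goals goals, plus a final 'more than max_goals' bucket."""
    probs = [poisson_pmf(k, lam) for k in range(max_goals + 1)]
    probs.append(1.0 - sum(probs))  # leftover probability is the "5+" row
    return probs

for team, lam in [("Arsenal", 1.8), ("Aston Villa", 1.0)]:
    row = ", ".join(f"{p:.1%}" for p in goal_table(lam))
    print(f"{team}: {row}")
```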
Building the Score Matrix
Here's where it gets really powerful. If we assume the two teams' goal-scoring is independent (a simplification, but a useful one), we can multiply the probabilities together to get the probability of every possible scoreline.
For example:
- Arsenal 2-1 Villa = 26.8% × 36.8% = 9.9%
- Arsenal 1-0 Villa = 29.8% × 36.8% = 11.0%
- Arsenal 0-0 Villa = 16.5% × 36.8% = 6.1%
- Villa 1-0 Arsenal = 16.5% × 36.8% = 6.1%
- Arsenal 1-1 Villa = 29.8% × 36.8% = 11.0%
Here's the full matrix for the most common scorelines:
| | Villa 0 | Villa 1 | Villa 2 | Villa 3 |
|---|---|---|---|---|
| Arsenal 0 | 6.1% | 6.1% | 3.0% | 1.0% |
| Arsenal 1 | 11.0% | 11.0% | 5.5% | 1.8% |
| Arsenal 2 | 9.9% | 9.9% | 4.9% | 1.6% |
| Arsenal 3 | 5.9% | 5.9% | 3.0% | 1.0% |
| Arsenal 4 | 2.7% | 2.7% | 1.3% | 0.4% |
The most likely single scorelines are a 1-0 Arsenal win and a 1-1 draw (both at 11.0%). But the most likely scoreline isn't necessarily the best prediction. What matters is the overall probability of each outcome.
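Building the matrix by hand gets tedious, so here is a rough sketch of the same multiplication in Python. It relies on the independence assumption described above, and `score_matrix` and its `max_goals` cut-off are our own choices for illustration.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k goals given an expected rate lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def score_matrix(home_xg, away_xg, max_goals=10):
    """matrix[h][a] = probability the home side scores h and the away side scores a."""
    home = [poisson_pmf(h, home_xg) for h in range(max_goals + 1)]
    away = [poisson_pmf(a, away_xg) for a in range(max_goals + 1)]
    return [[ph * pa for pa in away] for ph in home]

matrix = score_matrix(1.8, 1.0)
print(f"Arsenal 2-1 Villa: {matrix[2][1]:.1%}")  # 9.9%
print(f"Arsenal 0-0 Villa: {matrix[0][0]:.1%}")  # 6.1%
```

Going up to 10 goals per side is far more than the table shows, but it means the cells add up to almost exactly 100%.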
Summing Up: Match Outcome Probabilities
By adding up all the cells where Arsenal score more than Villa, we get the home win probability. All the cells where they score the same give us the draw. And all the cells where Villa score more give us the away win.
From our example, summing over the full matrix (including the rarer high-scoring results not shown above):
- Arsenal win: approximately 56%
- Draw: approximately 23%
- Aston Villa win: approximately 21%
Arsenal are favourites, which makes sense given their higher expected goals. But there's still roughly a one-in-five chance Villa win. That's football. Even with a clear underlying advantage, the team with lower xG wins plenty of individual matches.
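The summing step can be sketched in a few lines of Python. `outcome_probabilities` is a name invented for this example, and the loop goes up to 10 goals per side so that almost no probability is left out.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k goals given an expected rate lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def outcome_probabilities(home_xg, away_xg, max_goals=10):
    """Return (home win, draw, away win) by summing cells of the score matrix."""
    home_win = draw = away_win = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, home_xg) * poisson_pmf(a, away_xg)
            if h > a:
                home_win += p
            elif h == a:
                draw += p
            else:
                away_win += p
    return home_win, draw, away_win

home_win, draw, away_win = outcome_probabilities(1.8, 1.0)
print(f"Arsenal win {home_win:.1%}, draw {draw:.1%}, Villa win {away_win:.1%}")
```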
Why This Matters for Predictions
The beauty of the Poisson approach is that it gives you a complete probability distribution for every possible score, not just a single predicted scoreline. This is much more useful because:
It captures uncertainty honestly
Football is inherently uncertain. A model that just says "Arsenal 2-1" is pretending to know something it doesn't. A model that says "Arsenal win 56%, draw 23%, Villa win 21%" is giving you the full picture. You can make decisions based on the actual probabilities rather than a false sense of certainty.
It lets you calculate over/under probabilities
Want to know the probability of over 2.5 goals? Just add up all the cells where the total is 3 or more. In our example, that comes to about 53%. Under 2.5? About 47%. The same matrix gives you both-teams-to-score probabilities, exact score probabilities, and any other market that depends only on the final score.
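Here is a hedged sketch of how those market probabilities fall out of the same matrix. The function name `goal_markets` and the `line` parameter are our own, and the calculation still assumes the two teams' goals are independent.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k goals given an expected rate lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def goal_markets(home_xg, away_xg, line=2.5, max_goals=10):
    """Return (over, under, both-teams-to-score) probabilities."""
    over = under = btts = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, home_xg) * poisson_pmf(a, away_xg)
            if h + a > line:
                over += p
            else:
                under += p
            if h > 0 and a > 0:  # both sides scored at least once
                btts += p
    return over, under, btts

over, under, btts = goal_markets(1.8, 1.0)
print(f"Over 2.5: {over:.1%}, under 2.5: {under:.1%}, both teams to score: {btts:.1%}")
```

Changing `line` to 1.5 or 3.5 gives the other common totals markets without touching anything else.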
It reveals where the value is
If a bookmaker is offering odds that imply Arsenal have a 60% chance of winning, but your model says 56%, your model suggests the away win or draw might be undervalued. We'll explore this concept more in our piece on value analysis, but the Poisson model is the engine that makes it possible.
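To make the comparison concrete, here is a small sketch. It uses the standard conversion that decimal odds imply a probability of 1 divided by the odds, and it ignores the bookmaker's margin, which is a simplification. The odds and the model figure below are made up purely for illustration.

```python
def implied_probability(decimal_odds: float) -> float:
    """Probability implied by decimal odds, ignoring the bookmaker's margin."""
    return 1.0 / decimal_odds

def edge(model_prob: float, decimal_odds: float) -> float:
    """Model probability minus implied probability.

    Positive means the model rates the outcome as more likely than the odds do.
    """
    return model_prob - implied_probability(decimal_odds)

# Hypothetical numbers, not real prices
home_odds = 1.67        # implies roughly 60%
model_home_prob = 0.55  # made-up model estimate for the home win

print(f"Implied: {implied_probability(home_odds):.1%}")
print(f"Edge on the home win: {edge(model_home_prob, home_odds):+.1%}")
```

A negative edge on the home win is the signal described above: the model thinks the favourite is priced too short, so any value would have to be on the draw or the away side.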
The Limitations
No model is perfect, and Poisson has some known weaknesses that are worth understanding:
The independence assumption
Poisson assumes each team's goals are independent of the other's. In reality, there's some correlation — if one team scores early, the other might push forward and either score themselves or concede again. Studies have shown there's a slight positive correlation between the two teams' goal tallies. Some advanced models adjust for this with a "correlation factor," but basic Poisson ignores it.
It doesn't handle game state well
A team that goes 2-0 up might sit back and defend, reducing both teams' expected goals for the remainder of the match. A team that goes 1-0 down might push forward and create more chances. Poisson treats the match as a single block with fixed expected goals throughout, which is a simplification.
Goals aren't perfectly independent events
There's some evidence of "momentum" in football — goals tend to cluster slightly more than a pure Poisson distribution would predict. A team that scores is slightly more likely to score again in the next few minutes, possibly due to psychological effects or the opposition being disrupted.
It's only as good as the input
If your xG estimates for each team are wrong, your Poisson output will be wrong too. The model itself is sound, but the garbage-in-garbage-out principle applies. This is why the quality of the xG model feeding into the Poisson calculation matters enormously.
Beyond Basic Poisson
Serious prediction models often build on the Poisson foundation with additional refinements:
- Dixon-Coles model — adjusts for the underestimation of 0-0 and 1-1 draws that basic Poisson produces, and adds a correlation factor between the two teams' goals
- Bivariate Poisson — explicitly models the correlation between home and away goals
- Dynamic models — update the expected goals estimates in real-time based on in-game events (useful for live prediction)
- Negative binomial distribution — an alternative to Poisson that allows for more variance (overdispersion) in goal counts
But the basic Poisson model remains the starting point for almost all of them. Understand Poisson and you understand the foundation of modern football prediction.
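To give a flavour of what one of these refinements looks like, here is a rough sketch of the low-score correction from the Dixon-Coles model. It multiplies the four lowest scorelines by a factor that depends on a parameter rho and leaves everything else alone. The value of rho used here (-0.1) is purely illustrative. In practice it is estimated from historical results, and the full model includes other elements not shown.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k goals given an expected rate lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def dc_tau(h, a, home_xg, away_xg, rho):
    """Dixon-Coles adjustment factor; only 0-0, 0-1, 1-0 and 1-1 are changed."""
    if h == 0 and a == 0:
        return 1 - home_xg * away_xg * rho
    if h == 0 and a == 1:
        return 1 + home_xg * rho
    if h == 1 and a == 0:
        return 1 + away_xg * rho
    if h == 1 and a == 1:
        return 1 - rho
    return 1.0

def dc_score_prob(h, a, home_xg, away_xg, rho):
    """Independent-Poisson scoreline probability with the low-score correction."""
    basic = poisson_pmf(h, home_xg) * poisson_pmf(a, away_xg)
    return dc_tau(h, a, home_xg, away_xg, rho) * basic

rho = -0.1  # illustrative value only, not fitted to any data
for h, a in [(0, 0), (1, 1), (1, 0), (0, 1)]:
    basic = poisson_pmf(h, 1.8) * poisson_pmf(a, 1.0)
    adjusted = dc_score_prob(h, a, 1.8, 1.0, rho)
    print(f"{h}-{a}: basic {basic:.1%}, adjusted {adjusted:.1%}")
```

With a negative rho, the 0-0 and 1-1 draws become slightly more likely and the 1-0 and 0-1 results slightly less likely, while the matrix as a whole still sums to 1.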
Try It Yourself
If you want to have a go, it's genuinely easy:
- Pick two teams playing this weekend
- Look up their average xG per game over the last ten matches (Understat or FBref are good free sources — we've got a guide to free data sources if you need it)
- Use a Poisson calculator online, or use the formula =POISSON.DIST(k, lambda, FALSE) in Excel/Google Sheets
- Build the score matrix
- Compare your probabilities to the bookmakers' odds
You'll be surprised how often the basic model produces sensible outputs, and it will occasionally flag a price where your numbers and the market's disagree. It's also a brilliant way to train your football brain to think in probabilities rather than certainties, which is how the best analysts think about the sport.
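If you'd rather script the steps than use a spreadsheet, the sketch below strings the first four together. The xG lists are invented numbers for two unnamed teams, not real data, and a simple ten-match average is only one of many ways to estimate each team's rate.

```python
import math
from statistics import mean

def poisson_pmf(k, lam):
    """Probability of exactly k goals given an expected rate lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

# Made-up xG figures for each side's last ten matches (hypothetical, not real data)
home_recent_xg = [1.9, 1.2, 2.4, 1.6, 0.9, 2.1, 1.7, 1.4, 2.0, 1.8]
away_recent_xg = [0.8, 1.3, 1.1, 0.6, 1.5, 0.9, 1.2, 1.0, 0.7, 0.9]

# Step 2: a simple average as the expected goals rate for each team
home_lam = mean(home_recent_xg)
away_lam = mean(away_recent_xg)

# Steps 3 and 4: Poisson probabilities combined into a score matrix
max_goals = 10
grid = [[poisson_pmf(h, home_lam) * poisson_pmf(a, away_lam)
         for a in range(max_goals + 1)]
        for h in range(max_goals + 1)]

# Print the top-left corner of the matrix (home goals down, away goals across)
print("      " + "  ".join(f"away {a}" for a in range(4)))
for h in range(5):
    cells = "  ".join(f"{grid[h][a]:6.1%}" for a in range(4))
    print(f"home {h} {cells}")
```

From here, the last step is to turn the bookmaker's odds into implied probabilities and see where they differ from yours.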
Conclusion
The Poisson distribution might sound intimidating, but its application to football is beautifully simple: take an expected goals figure for each team, calculate the probability of each goal tally, combine them into a score matrix, and you've got a principled, data-driven prediction for any match.
It's not magic, and it won't get every game right — nothing will. But it's a rigorous framework that turns xG data into actionable probabilities, and it's the same mathematical backbone used by professional analysts, data companies, and sophisticated prediction models worldwide.
Next time you see a prediction that says "Home win 55%," you'll know there's probably a Poisson distribution working away behind the scenes.