NFL Statistical Modeling for Sports Bettors
Handicapping is hard. You can use dozens of tricks and simple strategies to try to improve, and many of them are helpful. But if you want to figure out how to win consistently, you have to put in the work. One way that some sports bettors try to reduce their long-term workload is by building statistical models.
A sports betting statistical model basically takes information from past performances of players and teams and runs the information through an equation. This is usually done with the help of computers and programs, like a spreadsheet.
The upside of statistical modeling is that once you build a profitable model, the heavy work is done up front and you benefit from it for as long as the model keeps working.
On the other hand, sports betting is about more than statistics. Statistical models have a hard time factoring in important variables that aren’t easily translated to numbers. The two most common variables that statistical models struggle with are weather and injuries.
You might think that an injury, especially one that keeps a player from playing, is easy to account for in a statistical model.
But it’s not, and here’s why.
Say the starting running back for one of the teams in a game you’re handicapping is injured and won’t play. The first instinct of many bettors is to simply remove his expected contribution from the statistical model.
But he’s being replaced by a backup running back, or a combination of two or three other running backs. You might have some limited statistical information for the replacement players, but it’s almost impossible to predict exactly how well or how poorly the replacement or replacements will perform as the starter.
It’s easy to predict that the running back position won’t perform as well without the starter, but determining an accurate drop in production in actual numbers is basically impossible.
You also have to consider how the coaching game plan will change because of the injury. A team that has one of the best running backs in the league is likely to change the game plan when he isn’t playing. He’s the starter because he’s better than his replacement.
The reason I’m pointing out a few of the issues with statistical modeling isn’t to try to get you to avoid statistical modeling in the NFL. On the contrary, I believe that statistical modeling is a powerful way to handicap games in every sport.
The point that I’m trying to make sure you understand is that even with the best statistical model you have to spend some time looking at other things in every game you’re evaluating. Everyone wants to come up with an NFL betting system where they can simply plug a bunch of numbers into an equation and it spits out winning results. But this is an unrealistic goal.
The good news is that if you construct a good statistical model for NFL games you can use it over and over again and reduce the amount of time you need to spend handicapping each game. Just don’t make the mistake of thinking you can come up with a statistical model that covers everything.
The basic things you need to start building an NFL statistical model are access to as many statistics as you can find, a computer, a spreadsheet program, and the ability either to write a fairly simple computer program or to create equations in your spreadsheet program.
I use Microsoft Excel, but any decent spreadsheet program will work. Once you learn a few simple rules, programming equations in a spreadsheet is fairly simple. I strongly recommend buying a book about your spreadsheet of choice, or watching some videos to learn how to make your spreadsheet do automatic calculations.
Once you learn how to use the spreadsheet to do what you want, it’s easy to test different variables. You can set up a different spreadsheet for testing new variables and copy the formulas you already have set up.
The great news about the stats you need to use is that you have access to more data than you can possibly handle. Instead of being forced to find newspapers with stats like the old timers had to do, you can find just about any NFL stat you can imagine online.
Here are a few places you can find statistics for your NFL betting model.
These are just a few. Once you start building your model, you’ll start figuring out the best sources for the exact information you need. As you find new sources, add them to a section on your spreadsheet so you don’t have to look for them every time you need something.
The other important thing you need is a system for backing up your spreadsheet on a regular basis. This might not seem important, and most people don’t back up things on their computer very often. But you’re going to invest a great deal of time and energy in your statistical model, and if you lose it because you don’t back it up it can be crushing.
One of the easiest ways to back up your spreadsheet is to email it to yourself at least once a week. It’s even better if you send it to yourself every day that you make a change to it. You can also use a flash drive that plugs into your USB port or use an external hard drive.
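If you’d rather not rely on remembering to email the file, a small script can make a dated copy for you. This is only a minimal sketch in Python: the function name `backup_spreadsheet` and the file and folder names in the usage note are hypothetical, and it assumes your backup folder lives on a flash drive or external drive.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_spreadsheet(source: str, backup_dir: str) -> Path:
    """Copy the spreadsheet into backup_dir with a timestamp in the file name."""
    src = Path(source)
    dest_dir = Path(backup_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)  # create the folder if it is missing
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    dest = dest_dir / f"{src.stem}-{stamp}{src.suffix}"
    shutil.copy2(src, dest)  # copy2 also keeps the file's modification time
    return dest
```

Calling something like `backup_spreadsheet("nfl_model.xlsx", "E:/backups")` after each day of changes keeps every older version, so a bad edit never overwrites your only copy.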
Now that you know everything you need to get started, you need to understand some of the additional challenges you need to consider when developing an NFL statistical betting model.
In the opening section I mentioned some of the things that a statistical model doesn’t do well. But these aren’t the only challenges you face.
The NFL season is only 16 games long, plus the playoffs. The preseason games aren’t very helpful in statistical modeling, so I usually ignore them other than for injury information. The statistics from the past season also aren’t extremely helpful, because the players on teams change, players get older, and coaching staffs change.
You can use back testing on your models using past seasons, and you should, but it’s dangerous to use results from the past season to predict upcoming games.
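A back test can be as simple as replaying past games and counting how the model’s picks would have done against the spread. The sketch below is one way to do it in Python, not the only way. The `Game` fields, the two point minimum edge, and the sample games are all assumptions made up for illustration, and every margin is measured from the home team’s side.

```python
from typing import NamedTuple


class Game(NamedTuple):
    model_margin: float  # model's predicted home margin of victory
    line_margin: float   # points the home team is favored by (negative = underdog)
    actual_margin: int   # final home score minus away score


def backtest(games, min_edge=2.0):
    """Return (wins, losses, pushes) for bets the model would have made."""
    wins = losses = pushes = 0
    for g in games:
        edge = g.model_margin - g.line_margin
        if abs(edge) < min_edge:
            continue  # model and line agree too closely, so no bet
        if g.actual_margin == g.line_margin:
            pushes += 1  # game landed exactly on the number
        elif (g.actual_margin > g.line_margin) == (edge > 0):
            wins += 1  # the side the model liked covered
        else:
            losses += 1
    return wins, losses, pushes


# Made-up games, purely to show the mechanics.
sample = [
    Game(10, 8, 14),   # bet home, home covers
    Game(3, 6, 7),     # bet away, home covers
    Game(-1, 1, -3),   # bet away, away covers
    Game(7, 6.5, 3),   # edge too small, no bet
    Game(4, 7, 7),     # bet away, push
]
record = backtest(sample)
```

Keeping the bet rule inside one function makes it easy to rerun the same past seasons every time you change the model.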
This means that the first few weeks of the NFL season are difficult to handicap. I never try to handicap games in the first four weeks of the NFL season using a statistical betting model. I still bet some games in the first four weeks, but I use different handicapping methods on them.
If you don’t use your statistical model the first quarter of the season, it reduces the number of games you can bet on. On the other hand, when the playoffs come around you have an entire season worth of statistics to use in your model.
Another challenge is that sportsbooks also use models to help set lines. The kind of model I suggest you use is more advanced than what most sportsbooks use, but the way you make money is by finding lines that offer value. When the sportsbooks set lines that are accurate, you don’t make money betting.
The sportsbooks are better than ever at using all of the information they can get to set tight lines. This doesn’t mean that you can’t develop a profitable model, but it does mean that you’re going to have a hard time betting on too many games.
You need to use your model to identify a few games every week where the lines offer value. If you develop a statistical model that shows supposed value in more than three or four games in a week compared to the offered lines it doesn’t mean you suddenly have a great model. It means that you have a terrible model.
This might not make sense at first, but sportsbooks are profitable. They’re profitable because they set good lines. In any given week, most of the lines set by the sportsbooks are close to predicting the final score differential. This means that a good statistical model should come close to predicting the same final score differential on most games.
The place you find value is where the sportsbooks are more interested in creating equal betting action on each side of a game than in actually predicting the final scoring differential. The books might adjust a line predicted by their model a few points one way or another because they know the betting tendencies of the public.
These are the games where you can find betting value with a strong model. This is why your model is probably broken if it shows a potential profit on several lines in any given week.
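You can turn this rule of thumb into an automatic sanity check. The sketch below is in Python. The two point threshold, the cap of four games a week, the function names, and the sample week are assumptions for illustration, with margins again measured from the home team’s side.

```python
def value_games(predictions, min_edge=2.0):
    """Return the games where the model and the line differ by min_edge or more.

    predictions maps a game label to (model_margin, line_margin).
    """
    return [
        game
        for game, (model_margin, line_margin) in predictions.items()
        if abs(model_margin - line_margin) >= min_edge
    ]


def looks_broken(flagged, max_per_week=4):
    """Too many 'value' games in one week suggests a model problem, not an edge."""
    return len(flagged) > max_per_week


# Hypothetical week with three games.
week = {"A at B": (10, 8), "C at D": (3, 2.5), "E at F": (-4, -1)}
flagged = value_games(week)
suspicious = looks_broken(flagged)
```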
You also need to develop a system to determine how much to bet on each game that shows a possible value. Most bettors start by making the same size wager on every game they bet, and I recommend this system. But as you refine your model you should start seeing more value in some lines than others.
For example, say the New York Giants are playing at the Dallas Cowboys. The line the sportsbook offers has the Cowboys favored at -8. Your statistical model shows that the Cowboys should win by 10. This is enough of a difference to bet on the game, but it’s still a fairly small difference.
If your statistical model shows that the Cowboys should win by 12, this is a much higher differential. If you’re confident in your model, you’ll bet more when the differential is four than you bet when it’s two.
The four point difference has a better chance of winning, as long as your model is accurate, than the two point difference.
It’s easy to think that since the difference is twice as much that you should bet twice as much. But the problem is that no matter how good your model is, it’s never going to be perfect. The difference of a point isn’t as valuable as you want to believe.
Here’s a Sample Betting Structure:
|Difference between line and model|Bet size|
|---|---|
|Two point difference|Make your standard bet|
|Three point difference|1.25 times your standard bet|
|Four point difference|1.5 times your standard bet|
|Five point + difference|2 times your standard bet|
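The structure above can be written as a small lookup, sketched here in Python. Two things are assumptions rather than part of the table: differences under two points get no bet, and fractional differences fall into the tier below them.

```python
def bet_multiplier(difference: float) -> float:
    """Return the multiple of your standard bet for a given point difference."""
    d = abs(difference)  # the side doesn't matter, only the size of the gap
    if d >= 5:
        return 2.0
    if d >= 4:
        return 1.5
    if d >= 3:
        return 1.25
    if d >= 2:
        return 1.0
    return 0.0  # assumption: under two points is not worth a bet


# Cowboys example: line of 8, model says 10 or 12.
small_edge = bet_multiplier(10 - 8)
large_edge = bet_multiplier(12 - 8)
```

With the Cowboys example, the two point difference gets your standard bet and the four point difference gets 1.5 times your standard bet.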
One thing you need to realize at this point is that you’re rarely going to find games with a difference of more than two or three points.
If you want to be more aggressive and can afford to take the risk, you can bet a higher multiple of your standard bet, but I don’t recommend being too aggressive until you’re 100% sure you have a good model.
An aggressive structure might be doubling your standard bet for every point over two in the difference between the line and your model prediction.
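Here is one reading of that aggressive structure, sketched in Python. It assumes “doubling for every point over two” means the multiplier doubles with each extra point (two points is 1x, three is 2x, four is 4x, and so on), that fractional differences round down, and that anything under two points is no bet. Other readings are possible.

```python
import math


def aggressive_multiplier(difference: float) -> float:
    """Double the standard bet for each whole point of difference over two."""
    points = math.floor(abs(difference))
    if points < 2:
        return 0.0  # assumption: no bet under two points
    return float(2 ** (points - 2))
```

Notice how fast this grows: a five point difference is already eight times your standard bet, which is why this kind of structure only makes sense once a model has proven itself.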
The last challenge you need to be aware of is that even if you build a great model you’re going to lose almost as many games as you win. At standard -110 odds you break even at a little over 52% against the spread, so you only need to win 53% of your bets to make a profit, and it’s very hard to win more than 55% of your games.
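The arithmetic behind those percentages is short enough to check yourself. This Python sketch assumes the usual -110 pricing on spread bets (risk 110 to win 100), which the numbers above imply but don’t state. The function names and the 100 unit stake are hypothetical.

```python
def break_even_win_rate(american_odds: int) -> float:
    """Win rate at which a bet at these odds neither makes nor loses money."""
    if american_odds < 0:
        risk, win = -american_odds, 100
    else:
        risk, win = 100, american_odds
    return risk / (risk + win)


def expected_profit_per_bet(win_rate: float, american_odds: int = -110,
                            stake: float = 100.0) -> float:
    """Average profit per bet for a given long-run win rate."""
    if american_odds < 0:
        payout = stake * 100 / -american_odds
    else:
        payout = stake * american_odds / 100
    return win_rate * payout - (1 - win_rate) * stake


break_even = break_even_win_rate(-110)     # 110 / 210
profit_at_55 = expected_profit_per_bet(0.55)
profit_at_50 = expected_profit_per_bet(0.50)
```

Break even works out to 110 / 210, or about 52.4%. A 55% bettor averages roughly 5 units of profit per 100 risked, while a 50% bettor loses money on every bet on average.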
This means that you have to use and test your model for a long time before you know if it’s accurate or just lucky. This also means that if you have a string of poor results it doesn’t always mean that your model is bad. You might just be unlucky in the short term.
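To see how long “a long time” needs to be, you can ask how often pure guessing would produce the same record. The Python sketch below uses the exact binomial formula. Treating each bet as an independent coin flip is a simplifying assumption, and the function name is hypothetical.

```python
from math import comb


def prob_at_least(wins: int, bets: int, p: float = 0.5) -> float:
    """Chance of winning at least `wins` of `bets` when each bet wins with probability p."""
    return sum(
        comb(bets, k) * p**k * (1 - p) ** (bets - k)
        for k in range(wins, bets + 1)
    )


lucky_100 = prob_at_least(55, 100)     # 55% record over 100 bets
lucky_1000 = prob_at_least(550, 1000)  # 55% record over 1,000 bets
```

A coin flipper posts 55 or more wins in 100 bets roughly 18% of the time, so a 55% record over 100 bets proves very little. The same 55% over 1,000 bets would be very unlikely to happen by luck alone.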
This is why you should be running more than one model at a time, and why you should keep running models for years, even if they don’t look like they work. Once you have a model set up, it can keep running with minimal upkeep, so there’s no reason to scrap a model that doesn’t seem to be working.
Keep all your models, and add new ones when you want to change something. Use the power of your computer. As you develop more models, take the best ones and combine them in another new model.
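One simple way to combine models is to average their predictions, giving more weight to the models with the better track record. The Python sketch below shows that idea. Averaging is an assumption on my part here, not the only way to combine models, and the weights and predictions are made up.

```python
def combine_predictions(predictions, weights=None):
    """Weighted average of several models' predicted margins for one game."""
    if weights is None:
        weights = [1.0] * len(predictions)  # equal weight by default
    total = sum(weights)
    return sum(p * w for p, w in zip(predictions, weights)) / total


# Two hypothetical models predict the home team by 10 and by 12.
equal_blend = combine_predictions([10, 12])
trusted_blend = combine_predictions([10, 12], weights=[3, 1])
```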
How to Build a Model
Now it’s time to start building your first model. It’s going to be fairly simple, and it’s not going to beat the sportsbooks. But it’s going to show you how to build a model so you can start building better and more complicated models.
In this model, you’re going to use the points scored and allowed for each team at home and on the road to predict the scores for an upcoming game.
- The New England Patriots average 24.7 points scored per game on the road, and give up an average of 17.3 points per game on the road.
- The Denver Broncos are averaging 22.5 points per game scored at home, and are giving up an average of 18.2 points per game at home.
To determine the predicted score for each team when the Patriots play at the Broncos, you average the points scored by each team with the points allowed by the opposing team.
- The Patriots score 24.7 points per game and the Broncos give up 18.2 points per game.
- This is an average of 21.45 points.
- Simply add 24.7 and 18.2, and then divide by two.
The Patriots are predicted to score 21.45 points in the game using this model. Round 21.45 off, and you get 21 points.
- The Broncos score 22.5 points per game and the Patriots give up 17.3.
- This is an average of 19.9 points.
- When you round this, you predict the Broncos are going to score 20 points.
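The same calculation as a spreadsheet style formula, sketched in Python. The function name `predicted_points` is hypothetical, and the numbers are the ones from the example above.

```python
def predicted_points(points_scored_avg: float, opponent_points_allowed_avg: float) -> float:
    """Average a team's scoring with what its opponent usually allows."""
    return (points_scored_avg + opponent_points_allowed_avg) / 2


# Patriots on the road against the Broncos at home.
patriots = predicted_points(24.7, 18.2)  # about 21.45
broncos = predicted_points(22.5, 17.3)   # about 19.9

patriots_rounded = round(patriots)
broncos_rounded = round(broncos)
```

Once this lives in one function or one spreadsheet cell, you can copy it down a column and get a predicted score for every game on the schedule.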
To determine if you should make a bet on the game, you compare your predicted final score against the available line.
You look at the line and see that the Broncos are favored by one. If your model is accurate, it means that you should bet on the Patriots and take the point.
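The comparison against the line can be sketched in Python as well. The function name, the convention of measuring everything from the home team’s side, and the returned labels are all assumptions for illustration.

```python
def pick_side(home_points, away_points, home_favored_by):
    """Compare the model's predicted margin with the line.

    Returns (side, edge), where edge is the model's home margin minus the
    points the home team is favored by. A negative edge favors the away team.
    """
    model_margin = home_points - away_points
    edge = model_margin - home_favored_by
    if edge > 0:
        side = "home"
    elif edge < 0:
        side = "away"
    else:
        side = "no bet"
    return side, edge


# Model: Broncos 20, Patriots 21. Line: Broncos favored by one.
side, edge = pick_side(home_points=20, away_points=21, home_favored_by=1)
```

Here the model has the Broncos losing by one while the line has them winning by one, a two point gap in favor of the Patriots.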
This model is weak, because it only factors in two variables. It’s also so simple that a high school freshman could come up with it without any help. I’m not picking on high school freshmen, because I was one myself many years ago. But if a freshman could develop your system without any help, your system is probably too weak to beat the sportsbooks.
Your statistical model is going to work on the same basic theory of this model, but it’s going to use many more variables. You still want your model to predict final scores, but you want to use the right mix of variables to make it as accurate as possible.
Why You Can’t Buy a Winning Model
So why can’t I just give you a winning model? Why can’t you buy a winning model?
Every winning statistical model is unique. It has to be unique, because if too many people use the same model it tends to become unprofitable. Sportsbooks adjust based on their profit, so even if they don’t know your exact model, they can still make adjustments to counter it.
People want a quick fix. They throw money at problems because it’s often the easiest way to solve them. But you can’t throw money at a sports betting system or model and turn it into a profit. If you have money to invest in developing an NFL betting model, consider hiring a programmer and/or a mathematician to help you work on your models.
Here’s a list of statistics to consider using for your models. This is far from a complete list, but it gives you a place to start. You might not use all of the statistics below, but as you read through the list you need to start thinking about which ones you want to use and other stats that you might want to find and use that aren’t on the list.
- Points scored per game at home
- Points scored per game on the road
- Points allowed per game at home
- Points allowed per game on the road
- Yards gained at home
- Yards gained on the road
- Yards allowed at home
- Yards allowed on the road
- Yards per point scored at home
- Yards per point scored on the road
- Yards per point allowed at home
- Yards per point allowed on the road
- Yards per play on offense at home
- Yards per play on offense on the road
- Yards allowed per play at home
- Yards allowed per play on the road
- Sacks allowed per play at home
- Sacks allowed per play on the road
- Sacks per play by defense at home
- Sacks per play by defense on the road
- Interceptions thrown per play at home
- Interceptions thrown per play on the road
- Interceptions per play by defense at home
- Interceptions per play by defense on the road
- Ratio of pass to run plays at home
- Ratio of pass to run plays on the road
- Average yards per pass at home
- Average yards per pass on the road
- Average yards per run at home
- Average yards per run on the road
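Most of the stats in this list are simple ratios of season totals, so you can compute them yourself from raw numbers. The Python sketch below shows a few of them for one home or road split. The class name, the field names, and the sample totals are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class SplitTotals:
    """Season totals for one team in one split (home or road)."""
    points: int
    yards: int
    plays: int
    sacks_allowed: int
    pass_plays: int
    run_plays: int


def ratio(numerator, denominator):
    """Divide safely, returning 0.0 when there is no data yet."""
    return numerator / denominator if denominator else 0.0


def derived_stats(t: SplitTotals) -> dict:
    """Turn raw totals into the per-point and per-play stats a model uses."""
    return {
        "yards_per_point": ratio(t.yards, t.points),
        "yards_per_play": ratio(t.yards, t.plays),
        "sacks_allowed_per_play": ratio(t.sacks_allowed, t.plays),
        "pass_to_run_ratio": ratio(t.pass_plays, t.run_plays),
    }


# Hypothetical home totals for one team.
home = SplitTotals(points=200, yards=2800, plays=500,
                   sacks_allowed=15, pass_plays=300, run_plays=200)
stats = derived_stats(home)
```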
This list might seem daunting, but it’s not even close to a complete list of what you should consider using. You also need to figure out how to make adjustments based on the strengths and weaknesses of each team.
What happens when a great run defense faces a great run offense? What happens when a terrible pass defense faces a great pass offense, and how big is the advantage?
It’s easy to understand that a great passing team facing a poor pass defense is going to score a lot of points. But the key is figuring out how many more.
The good news is you can use a statistical model to develop an idea of exactly how much each thing is worth. The more data you collect and analyze, the better you can make your model.
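One common way to estimate what a stat is worth is a least squares line, which is the same thing a spreadsheet trendline does. The Python sketch below fits points scored against yards per play. The data is made up and deliberately sits on a perfect line so the result is easy to check. Real data will be much noisier.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data: yards per play and points scored per game.
yards_per_play = [4.5, 5.0, 5.5, 6.0]
points_per_game = [17, 20, 23, 26]
slope, intercept = fit_line(yards_per_play, points_per_game)
```

In this toy data, each extra yard per play is worth six points per game. The slope you get from real seasons is the kind of number that tells you how much weight a variable deserves in your model.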
It’s easier to build a statistical NFL model when you just use the team statistics, like the ones in the last section. But if you want to make the best model you can, you need to learn how to incorporate individual statistics into your model.
This is most valuable when you need to adjust your team-based model because players miss games. One way to do this is to build a model that estimates the replacement value for key positions from past data. Each player and position has a slightly different value, so this isn’t an exact science.
One team has a backup running back that plays quite a bit and has a track record of success.
- If the number one running back misses a game, the backup can replace him with a 90% replacement value.
Another team has a backup running back that doesn’t play much and is clearly a step down from the starter.
- This backup might only be able to provide 50% of the value of the starter.
- He might provide such a low replacement value that the coaches use an entirely different game plan.
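A replacement value adjustment like the two cases above can be sketched in Python. Measuring the starter’s value in points per game, the 4.0 and 24.0 figures, and the function name are all assumptions for illustration. The 90% and 50% rates come from the examples above.

```python
def adjust_for_replacement(team_points: float, starter_value: float,
                           replacement_rate: float) -> float:
    """Lower a team's predicted points when a starter is replaced.

    starter_value is the points per game credited to the starter, and
    replacement_rate is the share of that value the backup provides.
    """
    lost_points = starter_value * (1 - replacement_rate)
    return team_points - lost_points


# Hypothetical team predicted to score 24 with a starter worth 4 points a game.
strong_backup = adjust_for_replacement(24.0, 4.0, 0.90)
weak_backup = adjust_for_replacement(24.0, 4.0, 0.50)
```

With these numbers the strong backup costs the team less than half a point, while the weak backup costs two full points, which is often the whole difference between your model and the line.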
Some teams do a better job of interchanging players than others. This is something you need to know as well.
The good news about injuries is the sportsbooks don’t really know any more than you do about replacement value. This means that you have a chance to find value if you can build an accurate replacement value model for the NFL.
Building a successful NFL statistical model is like solving a giant math problem. It’s not easy, but it can be solved if you know enough and put in enough effort. Start with a simple model and add new statistics and variables one at a time. Build a new model with each change so you can track exactly what each change does to the effectiveness of the model.