At a Glance
Model in-game win probability and predict final team scores in real time from raw NFL play-by-play data.
For the win probabilities, an XGBoost classifier was used to predict whether the home team wins. The model outputs the probability of each outcome, in this case the probability of the home team winning or losing. For the final scores, an XGBoost regressor was used to predict a variation of the final score. The target variable depended on the approach, but every approach used the same feature set.
Expertise & Technology
In-game win probabilities and score predictions can be used to build a rich environment for consumers of NFL games; they complement in-game analysis streams and drive product differentiation between content providers. Accurate models for win probabilities and expected scores let customers see how key events change the likely outcome of the game in real time, ultimately driving consumer engagement for the platform.
SFL Scientific Solution
The SFL Scientific team built a successful solution that ingests pregame and real-time in-game streaming data to predict both win probabilities and final scores for NFL games.
The solution uses a machine learning pipeline that transforms raw feed data into a numeric representation that a machine learning algorithm can consume. The algorithm then learns patterns from historical games to map the play-by-play data to the win probability and scores. Over the course of a season, the models can be retrained to incorporate the most recent weeks of game data.
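The transformation step can be sketched as a function that flattens one play into a numeric feature vector. The field names below are hypothetical, since the raw feed schema is not described here, and the chosen features are only examples of the kind of game state a model might use.

```python
from dataclasses import dataclass
from typing import List

QUARTER_SECONDS = 15 * 60


@dataclass
class Play:
    # Hypothetical raw-feed fields; the real schema is an assumption.
    quarter: int                  # 1-4 (overtime is not handled in this sketch)
    seconds_left_in_quarter: int
    home_score: int
    away_score: int
    down: int
    yards_to_go: int
    yards_to_end_zone: int
    home_has_ball: bool
    home_spread: float            # pregame betting line, negative if home is favored


def to_features(play: Play) -> List[float]:
    """Flatten one play into the numeric vector a model would consume."""
    seconds_remaining = (
        (4 - play.quarter) * QUARTER_SECONDS + play.seconds_left_in_quarter
    )
    return [
        float(seconds_remaining),
        float(play.home_score - play.away_score),  # score differential
        float(play.down),
        float(play.yards_to_go),
        float(play.yards_to_end_zone),
        1.0 if play.home_has_ball else 0.0,
        play.home_spread,
    ]


example = Play(
    quarter=4, seconds_left_in_quarter=120, home_score=21, away_score=17,
    down=3, yards_to_go=4, yards_to_end_zone=35, home_has_ball=True,
    home_spread=-3.5,
)
features = to_features(example)
```

Applying the same function to historical plays and to the live feed keeps training and real-time inputs consistent.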
Using 1,400 NFL games across 32 teams and 4 seasons, SFL Scientific developed machine learning models capable of predicting the win probabilities and final scores for the home and away teams.
The derived win probability for any given play predicts the correct winner with over 80% accuracy. Further, the score prediction model is, on average, within 5.26 points of the final score.
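One plausible way to compute figures of this kind is sketched below. It assumes the accuracy is the share of plays where thresholding the home win probability at 0.5 picks the eventual winner, and that "within 5.26 points" is a mean absolute error; neither definition is confirmed here, and the numbers in the example are made up.

```python
def winner_accuracy(home_win_probs, home_won):
    """Share of plays where a 0.5 threshold picks the actual winner."""
    correct = sum(
        (prob >= 0.5) == won for prob, won in zip(home_win_probs, home_won)
    )
    return correct / len(home_win_probs)


def mean_absolute_error(predicted, actual):
    """Average absolute gap between predicted and actual final scores."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


# Made-up values for illustration only.
probs = [0.62, 0.48, 0.71, 0.35, 0.90]
outcomes = [True, True, True, False, True]
accuracy = winner_accuracy(probs, outcomes)

predicted_scores = [24.0, 17.5, 30.0, 20.0]
final_scores = [27, 17, 24, 21]
mae = mean_absolute_error(predicted_scores, final_scores)
```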
These models were optimized using play data, score data, year-to-date statistics, and game conditions. Given this dataset, the current models rely heavily on pre-game matchups and statistics that are expressed through the Vegas betting lines. A simple extension to improve modeling accuracy would be to incorporate other available data sources, including more granular player data.