Modeling my chance of winning at Halo Infinite with Bayesian stats
Introduction: I used Bayesian statistics to model a match of Halo Infinite. The game consists of 2 teams of 12 players that score points by defeating players of the other team. A team wins by being the first to reach 100 points or by having the highest score at the 15-minute mark.
Results: The model obtained a median absolute error of 2.6 seconds when predicting when each team will score its next point. Moreover, it provided powerful insights, such as estimates of the teams' performance and the probability of a team winning at any given time.
Conclusion: The model was only slightly better than a cumulative average (median absolute error of 3.4 seconds). Therefore, the Bayesian model is well suited for highly accurate analyses, but it fails to justify its complexity and computational demand for simpler applications.
If you are like me, you have been playing Halo Infinite a lot over the past few weeks. One game mode I enjoy is Big Team Battle Slayer (BTBS), in which 2 teams of 12 players score points by defeating players of the other team. A team wins by being the first to reach 100 points or by having the highest score at the 15-minute mark.
Now, if you are really like me, you have also been wondering how to calculate the chance of winning. Intuitively, I know that if the score is 20 - 10 for my team I have a slight advantage, whereas 90 - 80 is almost a certain win. But how can I quantify this probability? That is the question this work sets out to answer using Bayesian statistics.
To answer the proposed question, I selected two matches of Halo BTBS from YouTube and annotated the game statistics. The first game defines the train dataset. It was used to better understand this type of data and to propose a model. The second game defines the test dataset. It was used to fit the proposed model on a new dataset and to check the model's performance.
Each dataset is composed of three variables (columns):
Due to the way the data was recorded, there are a few concerns that could not be addressed:
All things considered, the data should be enough for developing a proof of concept.
In the exploratory analysis, I demonstrate that the following properties of the data hold, at least to an approximate extent.
We begin the analysis with Figure 1, which summarizes the train dataset. It shows the score progression of both teams. Besides the clear win for team blue, one pattern stands out: the score trajectories are linear, which suggests that the teams earn points at a constant rate.
To explore the point rate further, we turn to Figure 2. It shows the time between points (TBP) for each team as a function of the score. As can be seen from the regression curves, the average TBP is constant throughout the game.
Still on the time between points, Figure 3 plots the estimated TBP distribution for each team. The density curves show an exponential decay, which favors an exponential distribution.
To evaluate the independence between points, Figure 4 shows each team's TBP against its previous TBP. Taking the estimation error into account, the regression curves are reasonably constant. This means that there is no dependence between consecutive points within each team.
Finally, we take a look at the correlation between the teams. Figure 5 plots team blue's TBP as a function of team red's TBP, averaged over periods of 5 seconds. The horizontal regression curve indicates that team red has no influence on team blue, and vice versa.
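To make the quantities behind Figures 2 to 5 concrete, the sketch below derives the TBP and previous-TBP columns from the annotated data. The column names and layout are my assumption for illustration, not necessarily how the datasets are actually organized.

import pandas as pd

# Hypothetical layout: one row per point scored, with the match time
# (in minutes) and the scoring team. Values are illustrative.
df = pd.DataFrame({
    "time": [0.1, 0.3, 0.4, 0.9, 1.2, 1.3],
    "team": ["blue", "red", "blue", "blue", "red", "red"],
})

# Time between points (TBP) within each team (Figures 2 and 3)
df["tbp"] = df.groupby("team")["time"].diff()

# TBP versus previous TBP, for the independence check in Figure 4
df["prev_tbp"] = df.groupby("team")["tbp"].shift()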
Accounting for all the discoveries made in the exploratory data analysis, I came up with the following Bayesian model.
# Part 1: Prior hyperparameter
MAX_SCORE = 100
PRIOR_MEAN_TIME = 10
mean_rate_of_rates = MAX_SCORE / PRIOR_MEAN_TIME
rate_of_rates = 1 / mean_rate_of_rates

# Part 2: Parameters
blue_point_rate ~ Exponential(rate_of_rates)
red_point_rate ~ Exponential(rate_of_rates)

# Part 3: Likelihood
blue_tbp ~ Exponential(blue_point_rate)
red_tbp ~ Exponential(red_point_rate)
Part 1 summarizes my knowledge before the game even starts. My prior is that teams take on average 10 minutes to get 100 points.
# Part 1: Prior hyperparameter
MAX_SCORE = 100
PRIOR_MEAN_TIME = 10
mean_rate_of_rates = MAX_SCORE / PRIOR_MEAN_TIME
rate_of_rates = 1 / mean_rate_of_rates
Part 2 reports how the parameters are sampled from the prior. Here, red_point_rate is team red's point rate and measures how many points team red scores per minute on average.
# Part 2: Parameters
blue_point_rate ~ Exponential(rate_of_rates)
red_point_rate ~ Exponential(rate_of_rates)
Lastly, Part 3 states that the TBP follows an exponential distribution with the specified point rates. Moreover, each point is assumed to be independent of the others.
# Part 3: Likelihood
blue_tbp ~ Exponential(blue_point_rate)
red_tbp ~ Exponential(red_point_rate)
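For reference, here is one way the full model could be written as runnable code. This is a minimal sketch in Python with PyMC, assuming TBP is measured in minutes; it is not necessarily the implementation behind the results reported here, and the data arrays below are stand-ins for the annotated TBP values.

import numpy as np
import pymc as pm

# Part 1: Prior hyperparameter (same arithmetic as above)
MAX_SCORE = 100
PRIOR_MEAN_TIME = 10                              # minutes, a priori
mean_rate_of_rates = MAX_SCORE / PRIOR_MEAN_TIME  # 10 points per minute
rate_of_rates = 1 / mean_rate_of_rates            # 0.1

# Stand-in data: in the real analysis these would be the observed
# times between points (in minutes) for each team
rng = np.random.default_rng(42)
blue_tbp_data = rng.exponential(1 / 12, size=90)
red_tbp_data = rng.exponential(1 / 12, size=85)

with pm.Model():
    # Part 2: Parameters
    blue_point_rate = pm.Exponential("blue_point_rate", lam=rate_of_rates)
    red_point_rate = pm.Exponential("red_point_rate", lam=rate_of_rates)
    # Part 3: Likelihood
    pm.Exponential("blue_tbp", lam=blue_point_rate, observed=blue_tbp_data)
    pm.Exponential("red_tbp", lam=red_point_rate, observed=red_tbp_data)
    trace = pm.sample()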
Before fitting the model to the data, it is worth checking whether the selected prior is appropriate. To do that, we can simulate data from the model using only the prior information.
Figure 6 summarizes the simulation. It shows possible score progressions for a hypothetical team. As can be seen, the model allows for a wide range of score progressions.
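Such a prior predictive simulation can be sketched in a few lines. The version below is my illustration of the idea, with the number of simulated trajectories chosen arbitrarily.

import numpy as np

rng = np.random.default_rng(0)
rate_of_rates = 0.1  # from Part 1

trajectories = []
for _ in range(100):
    # Draw a point rate from the prior (points per minute)
    point_rate = rng.exponential(1 / rate_of_rates)
    # Simulate the time at which each of the 100 points is scored
    times = np.cumsum(rng.exponential(1 / point_rate, size=100))
    trajectories.append(times)
# Plotting scores 1 to 100 against each `times` array yields one
# possible score progression per draw, as in Figure 6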
To avoid overfitting, we now turn to the test dataset to check the model's performance. Figure 7 shows the score progression of both teams as observed in the test dataset. Notice that the test dataset is a close game, very different from the train dataset.
The model attained good performance on the test dataset, and Figure 8 makes this very clear. It shows the observed (black dots) and predicted (gray line) times for each score point. The fact that it is difficult to distinguish one from the other is a good sign.
Figure 9 allows us to zoom in on the model errors. It plots the error (black line) made at each score point. The errors are quite small, with a median size of 2.6 seconds.
Figure 9 also shows shaded regions. They should contain most of the error line and represent how wrong the model thinks it can be. Since the regions cover 98.3% of the errors and are narrow (median size of 15.9 seconds), the model's confidence is on point.
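These summaries are straightforward to compute from the predictions. A sketch, with function and variable names of my own choosing:

import numpy as np

def summarize_predictions(observed, predicted, lower, upper):
    # observed/predicted: times of each score point (seconds);
    # lower/upper: bounds of the predictive interval for each point
    observed = np.asarray(observed)
    predicted = np.asarray(predicted)
    lower, upper = np.asarray(lower), np.asarray(upper)
    median_abs_error = np.median(np.abs(observed - predicted))
    coverage = np.mean((observed >= lower) & (observed <= upper))
    median_width = np.median(upper - lower)
    return median_abs_error, coverage, median_width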
Now that we know the model is well fitted, we can turn to it for answers about how good the teams are. Table 1 summarizes the point rates. Despite team red winning (estimated point rate of 12.3 points per minute), the difference between the point rates (contrast) is not significant. This means that, for all intents and purposes, both teams are equally good.
Point rate | Median (points/min) | 95% credible interval
---|---|---
Blue | 11.8 | [9.5, 14.2]
Red | 12.3 | [9.9, 14.8]
Contrast | −0.5 | [−3.9, 2.9]
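The contrast row is simply the difference between the posterior draws of the two point rates. A sketch, with made-up draws standing in for the actual MCMC samples:

import numpy as np

rng = np.random.default_rng(1)
# Placeholder posterior draws; in practice they come from the fitted model
blue_draws = rng.normal(11.8, 1.2, size=4000)
red_draws = rng.normal(12.3, 1.25, size=4000)

contrast = blue_draws - red_draws
print(np.median(contrast))                    # point estimate
print(np.quantile(contrast, [0.025, 0.975]))  # 95% credible interval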
Finally, Figure 10 displays team blue's probability of winning (black line) as predicted by the model. It suggests an advantage for team blue during most of the game. This agrees with Figure 7, in which team blue has more points for most of the game.
Again, the shaded region summarizes other possible probabilities and represents how wrong the model thinks it can be. An interesting part is the beginning of the match, when team blue had 2.5 times as many points as team red. As per the model, team blue should have won with great confidence, but team red was able to turn the game around and win in the last minute.
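One way to obtain such a win probability is to simulate the rest of the match once per posterior draw and count how often team blue ends up winning, with the game rules (first to 100 points, or highest score at the 15-minute mark) encoded directly. The function below is my sketch of this idea, not necessarily the computation behind Figure 10.

import numpy as np

def win_probability(blue_score, red_score, time_left,
                    blue_rates, red_rates, rng):
    # blue_score, red_score: current scores (both below 100);
    # time_left: minutes remaining; blue_rates, red_rates:
    # posterior draws of the point rates (points per minute)
    wins = 0
    for b_rate, r_rate in zip(blue_rates, red_rates):
        # Simulate the arrival time of every remaining point
        b_times = np.cumsum(rng.exponential(1 / b_rate, 100 - blue_score))
        r_times = np.cumsum(rng.exponential(1 / r_rate, 100 - red_score))
        if min(b_times[-1], r_times[-1]) <= time_left:
            # Someone reaches 100 points before the clock runs out
            wins += b_times[-1] < r_times[-1]
        else:
            # Clock runs out: compare scores at the 15-minute mark
            # (a tie does not count as a win for team blue)
            blue_final = blue_score + np.searchsorted(b_times, time_left)
            red_final = red_score + np.searchsorted(r_times, time_left)
            wins += blue_final > red_final
    return wins / len(blue_rates)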
For the last remark, we need to compare the model's performance against a baseline. Figure 11 summarizes the predictions made by a cumulative average. That is, the next point for team blue is predicted to happen at current_time + blue_average_tbp. Just like for the Bayesian model, it is difficult to distinguish the predictions from the observed data.
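For completeness, the cumulative-average baseline can be written as follows. This is my sketch, and it assumes the match clock starts at zero.

import numpy as np

def baseline_predictions(times):
    # times: minutes at which a team scored its points 1..n
    times = np.asarray(times)
    # The cumulative average TBP equals elapsed time / points scored
    avg_tbp = times / np.arange(1, len(times) + 1)
    # Predicted time of each following point: current_time + average TBP
    return times + avg_tbp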
Figure 12 plots the error made at each score point. The median absolute error is 3.4 seconds and the intervals have a median size of 19.5 seconds (coverage = 98.4%). In summary, the Bayesian model obtained results about 20% better than the baseline.
The model showed a good fit and was able to provide powerful insights, such as the probability of team blue winning at any point in time. That being said, its performance was only slightly better than that of a cumulative average. Therefore, the Bayesian model is well suited for highly accurate analyses, but it fails to justify its complexity and computational demand for simpler applications.
If you see mistakes or want to suggest changes, please create an issue on the source repository.