Modeling my chance of winning at Halo Infinite with Bayesian stats
Introduction: I used Bayesian statistics to model a match of Halo Infinite. The game consists of 2 teams of 12 players that score points by defeating players of the other team. A team wins by being the first to reach 100 points or by having the highest score at the 15-minute mark.
Results: The model obtained a median absolute error of 2.6 seconds when predicting when each team will score its next point. Moreover, it provided powerful insights, such as estimates of the teams' performance and the probability of a team winning at any given time.
Conclusion: The model was only slightly better than a cumulative average (median absolute error of 3.4 seconds). Therefore, the Bayesian model is well suited for highly accurate analyses, but it fails to justify its complexity and computational demand for simpler applications.
If you are like me, you have been playing Halo Infinite a lot over the past few weeks. One game mode I enjoy is Big Team Battle Slayer (BTBS), in which 2 teams of 12 players score points by defeating players of the other team. A team wins by being the first to reach 100 points or by having the highest score at the 15-minute mark.
Now, if you are really like me, you have also been wondering how to calculate the chance of winning. Intuitively, I know that if the score is 20 - 10 for my team I have a slight advantage, whereas 90 - 80 is almost a certain win. But how can I quantify this probability? That is the question this work sets out to answer using Bayesian statistics.
To answer the proposed question, I selected two matches of Halo BTBS from YouTube and annotated the game statistics. The first game defines the train dataset. It was used to better understand this type of data and to propose a model. The second game defines the test dataset. It was used to fit the proposed model on a new dataset and to check the model's performance.
Each dataset is composed of three variables (columns):
Due to the way the data was recorded, there are a few concerns that could not be addressed:
All things considered, the data should be enough for developing a proof of concept.
In the exploratory analysis, I demonstrate that the following properties of the data hold, at least to an approximate extent.
We begin the analysis with Figure 1, which summarizes the train dataset. It shows the score progression of both teams. Besides the clear win for team blue, one pattern stands out: the score trajectories are linear, which suggests that the teams earn points at a constant rate.
To explore the point rate further, we turn to Figure 2. It shows the time between points (TBP) for each team as a function of the score. As can be seen from the regression curves, the average TBP is constant throughout the game.
Still on the time between points, Figure 3 plots the estimated TBP distribution for each team. The density curves show an exponential decay, which favors an exponential distribution.
To evaluate the independence between points, Figure 4 shows each team's TBP against its previous TBP. Taking the estimation error into account, the regression curves are reasonably constant. This means that there is no dependence between consecutive points within each team.
Finally, we take a look at the correlation between the teams. Figure 5 plots team blue's TBP as a function of team red's TBP, averaged over periods of 5 seconds. The horizontal regression curve indicates that team red has no influence on team blue, and vice versa.
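To make the quantities behind Figures 2 to 5 concrete, the sketch below derives the TBP and previous-TBP columns from the annotated data. The column names and layout are my assumption for illustration, not necessarily how the datasets are actually organized.

import pandas as pd

# Hypothetical layout: one row per point scored, with the match time
# (in minutes) and the scoring team. Values are illustrative.
df = pd.DataFrame({
    "time": [0.1, 0.3, 0.4, 0.9, 1.2, 1.3],
    "team": ["blue", "red", "blue", "blue", "red", "red"],
})

# Time between points (TBP) within each team (Figures 2 and 3)
df["tbp"] = df.groupby("team")["time"].diff()

# TBP versus previous TBP, for the independence check in Figure 4
df["prev_tbp"] = df.groupby("team")["tbp"].shift()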
Accounting for all the discoveries made in the exploratory data analysis, I came up with the following Bayesian model.
# Part 1: Prior hyperparameter
MAX_SCORE = 100
PRIOR_MEAN_TIME = 10
mean_rate_of_rates = MAX_SCORE / PRIOR_MEAN_TIME
rate_of_rates = 1 / mean_rate_of_rates

# Part 2: Parameters
blue_point_rate ~ Exponential(rate_of_rates)
red_point_rate ~ Exponential(rate_of_rates)

# Part 3: Likelihood
blue_tbp ~ Exponential(blue_point_rate)
red_tbp ~ Exponential(red_point_rate)
Part 1 summarizes my knowledge before the game even starts. My prior is that teams take on average 10 minutes to get 100 points.
# Part 1: Prior hyperparameter
MAX_SCORE = 100
PRIOR_MEAN_TIME = 10
mean_rate_of_rates = MAX_SCORE / PRIOR_MEAN_TIME
rate_of_rates = 1 / mean_rate_of_rates
Part 2 reports how the parameters are sampled from the prior. Here, red_point_rate is team red's point rate and measures how many points team red scores per minute on average.
# Part 2: Parameters
blue_point_rate ~ Exponential(rate_of_rates)
red_point_rate ~ Exponential(rate_of_rates)
Lastly, Part 3 states that the TBP follows an exponential distribution with the specified point rates. Moreover, each point is assumed to be independent of the others.
# Part 3: Likelihood
blue_tbp ~ Exponential(blue_point_rate)
red_tbp ~ Exponential(red_point_rate)
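For reference, here is one way the full model could be written as runnable code. This is a minimal sketch in Python with PyMC, assuming TBP is measured in minutes; it is not necessarily the implementation behind the results reported here, and the data arrays below are stand-ins for the annotated TBP values.

import numpy as np
import pymc as pm

# Part 1: Prior hyperparameter (same arithmetic as above)
MAX_SCORE = 100
PRIOR_MEAN_TIME = 10                              # minutes, a priori
mean_rate_of_rates = MAX_SCORE / PRIOR_MEAN_TIME  # 10 points per minute
rate_of_rates = 1 / mean_rate_of_rates            # 0.1

# Stand-in data: in the real analysis these would be the observed
# times between points (in minutes) for each team
rng = np.random.default_rng(42)
blue_tbp_data = rng.exponential(1 / 12, size=90)
red_tbp_data = rng.exponential(1 / 12, size=85)

with pm.Model():
    # Part 2: Parameters
    blue_point_rate = pm.Exponential("blue_point_rate", lam=rate_of_rates)
    red_point_rate = pm.Exponential("red_point_rate", lam=rate_of_rates)
    # Part 3: Likelihood
    pm.Exponential("blue_tbp", lam=blue_point_rate, observed=blue_tbp_data)
    pm.Exponential("red_tbp", lam=red_point_rate, observed=red_tbp_data)
    trace = pm.sample()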
Before fitting the model to the data, it is worth checking whether the selected prior is appropriate. To do that, we can simulate data from the model using only the prior information.
Figure 6 summarizes the simulation. It shows possible score progressions for a hypothetical team. As can be seen, the model allows for a wide range of score progressions.
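Such a prior predictive simulation can be sketched in a few lines. The version below is my illustration of the idea, with the number of simulated trajectories chosen arbitrarily.

import numpy as np

rng = np.random.default_rng(0)
rate_of_rates = 0.1  # from Part 1

trajectories = []
for _ in range(100):
    # Draw a point rate from the prior (points per minute)
    point_rate = rng.exponential(1 / rate_of_rates)
    # Simulate the time at which each of the 100 points is scored
    times = np.cumsum(rng.exponential(1 / point_rate, size=100))
    trajectories.append(times)
# Plotting scores 1 to 100 against each `times` array yields one
# possible score progression per draw, as in Figure 6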
To avoid overfitting, we now turn to the test dataset to check the model's performance. Figure 7 shows the score progression of both teams as observed in the test dataset. Notice that the test dataset is a close game, very different from the train dataset.
The model attained good performance on the test dataset, and Figure 8 makes this very clear. It shows the observed (black dots) and predicted (gray line) times for each score point. The fact that it is difficult to distinguish one from the other is a good sign.
Figure 9 allows us to zoom in on the model errors. It plots the error (black line) made at each score point. The errors are quite small, with a median size of 2.6 seconds.
Figure 9 also shows shaded regions. They should contain most of the error line and represent how wrong the model thinks it can be. Since the regions cover 98.3% of the errors and are narrow (median size of 15.9 seconds), the model's confidence is on point.
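These summaries are straightforward to compute from the predictions. A sketch, with function and variable names of my own choosing:

import numpy as np

def summarize_predictions(observed, predicted, lower, upper):
    # observed/predicted: times of each score point (seconds);
    # lower/upper: bounds of the predictive interval for each point
    observed = np.asarray(observed)
    predicted = np.asarray(predicted)
    lower, upper = np.asarray(lower), np.asarray(upper)
    median_abs_error = np.median(np.abs(observed - predicted))
    coverage = np.mean((observed >= lower) & (observed <= upper))
    median_width = np.median(upper - lower)
    return median_abs_error, coverage, median_width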
Now that we know the model is well fitted, we can turn to it for answers about how good the teams are. Table 1 summarizes the point rates. Despite team red winning (estimated point rate of 12.3 points per minute), the difference between the point rates (contrast) is not significant. This means that, for all intents and purposes, both teams are equally good.
Point rate | Median (points/min) | 95% credible interval
---|---|---
Blue | 11.8 | [9.5, 14.2]
Red | 12.3 | [9.9, 14.8]
Contrast | −0.5 | [−3.9, 2.9]
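The contrast row is simply the difference between the posterior draws of the two point rates. A sketch, with made-up draws standing in for the actual MCMC samples:

import numpy as np

rng = np.random.default_rng(1)
# Placeholder posterior draws; in practice they come from the fitted model
blue_draws = rng.normal(11.8, 1.2, size=4000)
red_draws = rng.normal(12.3, 1.25, size=4000)

contrast = blue_draws - red_draws
print(np.median(contrast))                    # point estimate
print(np.quantile(contrast, [0.025, 0.975]))  # 95% credible interval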
Finally, Figure 10 displays team blue's probability of winning (black line) as predicted by the model. It suggests an advantage for team blue during most of the game. This agrees with Figure 7, in which team blue has more points for most of the game.
Again, the shaded region summarizes other possible probabilities and represents how wrong the model thinks it can be. An interesting part is the beginning of the match, when team blue had 2.5 times as many points as team red. As per the model, team blue should have won with great confidence, but team red was able to turn the game around and win in the last minute.
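One way to obtain such a win probability is to simulate the rest of the match once per posterior draw and count how often team blue ends up winning, with the game rules (first to 100 points, or highest score at the 15-minute mark) encoded directly. The function below is my sketch of this idea, not necessarily the computation behind Figure 10.

import numpy as np

def win_probability(blue_score, red_score, time_left,
                    blue_rates, red_rates, rng):
    # blue_score, red_score: current scores (both below 100);
    # time_left: minutes remaining; blue_rates, red_rates:
    # posterior draws of the point rates (points per minute)
    wins = 0
    for b_rate, r_rate in zip(blue_rates, red_rates):
        # Simulate the arrival time of every remaining point
        b_times = np.cumsum(rng.exponential(1 / b_rate, 100 - blue_score))
        r_times = np.cumsum(rng.exponential(1 / r_rate, 100 - red_score))
        if min(b_times[-1], r_times[-1]) <= time_left:
            # Someone reaches 100 points before the clock runs out
            wins += b_times[-1] < r_times[-1]
        else:
            # Clock runs out: compare scores at the 15-minute mark
            # (a tie does not count as a win for team blue)
            blue_final = blue_score + np.searchsorted(b_times, time_left)
            red_final = red_score + np.searchsorted(r_times, time_left)
            wins += blue_final > red_final
    return wins / len(blue_rates)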
For the last remark, we need to compare the model's performance against a baseline. Figure 11 summarizes the predictions made by a cumulative average. That is, the next point for team blue is predicted to happen at current_time + blue_average_tbp. Just like for the Bayesian model, it is difficult to distinguish the predictions from the observed data.
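For completeness, the cumulative-average baseline can be written as follows. This is my sketch, and it assumes the match clock starts at zero.

import numpy as np

def baseline_predictions(times):
    # times: minutes at which a team scored its points 1..n
    times = np.asarray(times)
    # The cumulative average TBP equals elapsed time / points scored
    avg_tbp = times / np.arange(1, len(times) + 1)
    # Predicted time of each following point: current_time + average TBP
    return times + avg_tbp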
Figure 12 plots the error made at each score point. The median absolute error is 3.4 seconds and the intervals have a median size of 19.5 seconds (coverage = 98.4%). In summary, the Bayesian model obtained results about 20% better than the baseline.
The model showed a good fit and was able to provide powerful insights, such as the probability of team blue winning at any point in time. That being said, its performance was only slightly better than that of a cumulative average. Therefore, the Bayesian model is well suited for highly accurate analyses, but it fails to justify its complexity and computational demand for simpler applications.
If you see mistakes or want to suggest changes, please create an issue on the source repository.