

Models                     Duration      Train MSE     Test MSE     R2 Score
DummyRegressor             0.0003519058  4.292656363   4.012747466  -8.05E-06
LinearRegression           2.876500845   4.168436111   4.03817326   -0.0063443617
BayesianRidge              9.113944054   4.240433111   3.978284211  0.0085804578
RidgeCV                    14.15430808   4.167578924   4.03089686   -0.0045310258
ElasticNetCV               6.157432079   4.24524836    3.981448748  0.0077918304
LassoCV                    6.049508095   4.244920397   3.981493739  0.0077806182
GradientBoostingRegressor  10.3492372    4.250431632   3.994107012  0.004637292
AdaBoostRegressor          32.49189615   4.263147054   3.994472842  0.0045461242
ExtraTreesRegressor        70.33536506   0.2614326469  4.934085332  -0.2296126575
RandomForestRegressor      135.7508721   1.08187359    4.861826425  -0.2116051728
LinearSVR                  53.98221612   4.314991285   4.073052891  -0.0150366387

Most regressors have metrics quite similar to those of the DummyRegressor (a regressor that disregards the input and always outputs the mean target value). This shows that our ‘smart’ regressors have trouble learning how any of the features correlate with the plus/minus per minute target value.
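To see why a mean-only baseline lands at an R2 of almost exactly zero on the test set, it helps to write the score out. The sketch below is a minimal pure-Python illustration, not the code used to produce the table above; the helper names and the toy numbers are assumptions made up for the example.

```python
from statistics import fmean


def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return fmean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy targets (hypothetical values, not the project's data).
y_train = [1.0, -2.0, 0.5, 3.0, -1.5]
y_test = [0.0, 2.0, -1.0, -2.0]

# A mean-only baseline ignores the features and always predicts the training mean.
baseline = fmean(y_train)
test_pred = [baseline] * len(y_test)

print("test MSE:", mse(y_test, test_pred))
print("test R2: ", r2_score(y_test, test_pred))
```

R2 is exactly zero only when the constant prediction equals the mean of the data being scored. Because the training mean differs slightly from the test mean, the baseline's test R2 comes out slightly negative, which is consistent with the DummyRegressor row. Any model whose R2 sits near zero is therefore doing little better than predicting the mean.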

Two of the ‘smart’ regressors are exceptions. The tree-based regressors (ExtraTreesRegressor and RandomForestRegressor) perform fairly well when looking only at the training data metrics: their training mean squared error is far lower than the DummyRegressor's. However, when these two models are evaluated against the test dataset rather than the training dataset, both the mean squared error and the R2 score are much worse than those of all other regressors, including the DummyRegressor. This is a clear sign that these models are overfitting the training data rather than learning the true relationship between the features and the target value.
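The train/test gap described here can be reproduced on data that contains no signal at all. The sketch below is only an illustration under stated assumptions: the data are synthetic noise, and a 1-nearest-neighbour "memorizer" stands in for the tree ensembles (it is not one of the models in the table). Like a fully grown tree, it can reproduce its training targets almost perfectly.

```python
import random
from statistics import fmean

random.seed(0)


def make_noise_data(n):
    """Hypothetical data: one feature that carries no information about the target."""
    xs = [random.random() for _ in range(n)]
    ys = [random.gauss(0.0, 1.0) for _ in range(n)]
    return xs, ys


def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return fmean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def memorizer_predict(x_train, y_train, x_query):
    """1-nearest-neighbour lookup: return the target of the closest training point."""
    preds = []
    for x in x_query:
        nearest = min(range(len(x_train)), key=lambda i: abs(x_train[i] - x))
        preds.append(y_train[nearest])
    return preds


x_train, y_train = make_noise_data(500)
x_test, y_test = make_noise_data(500)

# Mean-only baseline, fitted on the training targets.
mean_pred = fmean(y_train)
baseline_train = mse(y_train, [mean_pred] * len(y_train))
baseline_test = mse(y_test, [mean_pred] * len(y_test))

# The memorizer looks perfect on data it has seen and worse than the mean on new data.
memo_train = mse(y_train, memorizer_predict(x_train, y_train, x_train))
memo_test = mse(y_test, memorizer_predict(x_train, y_train, x_test))

print(f"mean baseline  train MSE={baseline_train:.3f}  test MSE={baseline_test:.3f}")
print(f"memorizer      train MSE={memo_train:.3f}  test MSE={memo_test:.3f}")
```

The memorizer's training error is zero because every training point is its own nearest neighbour, while on unseen points it replays noise and ends up with a higher error than the constant prediction. The tree-based rows show the same qualitative pattern: a very low training MSE paired with a test MSE above the DummyRegressor's.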

Examining the Data

The poor results of the regression models could stem from problems with either the models or the data. A closer examination of the data points to the latter: the data contains a substantial amount of noise.

The scatter plots above help demonstrate this noise. Each point represents a time capsule, with the x-position indicating the plus/minus per minute for that capsule. For the y-position, we take the training data (which uses season-averaged player statistics), average the statistics of the five home players and of the five away players in each time capsule, and compute the difference between the two teams in a given statistic. We would expect teams with a high win percentage to outscore opponents with a low win percentage. The best-fit line shows that this is roughly true, but the correlation between plus/minus per minute and win percentage is almost zero. A similar situation occurs when we compare time capsule plus/minus per minute to season-averaged plus/minus per minute.
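The y-value construction and the correlation check described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function names, the home-minus-away sign convention, and the four capsules are hypothetical, and the numbers are invented to show a weak positive relationship rather than taken from the dataset.

```python
from math import sqrt
from statistics import fmean


def lineup_difference(home_stats, away_stats):
    """Average each five-player lineup's statistic and return home minus away."""
    return fmean(home_stats) - fmean(away_stats)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical capsules:
# (plus/minus per minute, home players' win percentages, away players' win percentages)
capsules = [
    (2.0, [0.70, 0.60, 0.60, 0.50, 0.60], [0.40, 0.40, 0.50, 0.30, 0.40]),
    (-4.0, [0.50, 0.50, 0.60, 0.40, 0.50], [0.45, 0.35, 0.40, 0.40, 0.40]),
    (3.0, [0.40, 0.40, 0.50, 0.30, 0.40], [0.50, 0.50, 0.60, 0.40, 0.50]),
    (-2.0, [0.30, 0.40, 0.40, 0.50, 0.40], [0.60, 0.60, 0.70, 0.50, 0.60]),
]

plus_minus = [pm for pm, _, _ in capsules]
win_pct_diff = [lineup_difference(home, away) for _, home, away in capsules]

r = pearson_r(plus_minus, win_pct_diff)
print("win percentage differences:", win_pct_diff)
print("Pearson r:", r)
```

In this toy data the better lineup wins two capsules and loses two, so the correlation is positive but close to zero. That is the kind of picture the scatter plots show: a best-fit line with the expected sign, surrounded by points that mostly ignore it.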

We have hypothesized several reasons why this noise is present.

These results suggest that a classification model might perform better, as it simplifies the problem.


Models                      Duration      Train Accuracy  Test Accuracy
DummyClassifier             0.001994133   52.14%          49.71%
LogisticRegression          8.462990046   60.50%          60.58%
LinearDiscriminantAnalysis  6.796962977   61.12%          58.98%
GaussianNB                  0.6752369404  53.71%          53.22%
GradientBoostingClassifier  147.259727    60.08%          60.31%
RandomForestClassifier      5.883100033   94.39%          51.62%
FNN with Dropout Reg        46.352        60.65%          60.36%

Recasting the task as a classification problem lets us train models that produce more meaningful results than the regression models. As in the previous approach, we can compare the smart models against a DummyClassifier, which only outputs the most common label. All of our models perform at least as well as the DummyClassifier in terms of test accuracy. LogisticRegression, GradientBoostingClassifier, and the FNN with Dropout Reg even reach roughly 60% test accuracy, while the DummyClassifier performs about as well as a coin flip.
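The most-common-label baseline is simple enough to write out by hand, which also shows why its training accuracy can sit above 50% while its test accuracy falls below it. The sketch below is a hypothetical stand-in, not the DummyClassifier itself; the function names, the toy labels, and the label definition in the comment are assumptions.

```python
from collections import Counter


def fit_majority_label(labels):
    """Return the most frequent label in the training set."""
    return Counter(labels).most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical labels (assumed definition): 1 if the home lineup outscored
# the away lineup during a capsule, else 0.
y_train = [1, 1, 1, 0, 0, 1, 0, 1, 0, 1]  # 6 of 10 are label 1
y_test = [0, 1, 0, 0, 1, 1, 0, 0, 1, 0]   # only 4 of 10 are label 1

# The baseline is fitted on the training labels and reused unchanged on the test set.
majority = fit_majority_label(y_train)
train_acc = accuracy(y_train, [majority] * len(y_train))
test_acc = accuracy(y_test, [majority] * len(y_test))

print("majority label:", majority)
print("train accuracy:", train_acc)
print("test accuracy: ", test_acc)
```

The baseline's training accuracy is always at least 50% for two labels, since it picks whichever label is more frequent there. Its test accuracy depends on how often that same label appears in the test set, so a value slightly under 50%, as in the DummyClassifier row, simply means the training majority label was the minority label in the test data.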

Comparison to Previous Work

Torres et al. and Loeffelholz et al. both attempted to predict the results of NBA games using machine learning [1][2]. They trained their models on team box scores from games played earlier in the season, then used the trained models to predict the remaining games in the season. The table below shows the results from both experiments:

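Source              Model                Average Classification Accuracy
Loeffelholz et al.  FFNN                 71.67%
Loeffelholz et al.  RBF                  68.67%
Loeffelholz et al.  PNN                  71.33%
Loeffelholz et al.  GRNN                 71.33%
Loeffelholz et al.  PNN Fusion           71.67%
Loeffelholz et al.  Bayes Fusion         71.67%
Torres et al.       Linear Regression    69.91%
Torres et al.       Logistic Regression  67.44%
Torres et al.       SVM                  65.96%
Torres et al.       ANN                  64.78%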

Comparing the table above with the results from our models, we can see that our models do a decent job of making predictions, given that we are solving a harder problem: predicting results from player matchups rather than learning team performance over different games and seasons. Our model also tries to select the players that give a team the best chance of winning, and it learns by examining player statistics from 30-second time capsules over a season. This presents a significant challenge even for a human expert.


  1. Loeffelholz, Bernard, Earl Bednar, and Kenneth W. Bauer. “Predicting NBA games using neural networks.” Journal of Quantitative Analysis in Sports 5.1 (2009).

  2. Torres, Renato Amorim. “Prediction of NBA games based on machine learning methods.” University of Wisconsin, Madison (2013).