2020 Final Model Release (version 2020.1.0)

04.25.2020: release of version 2020.1.0

The final predictions for the 2020 NHL draft are now available. See home for the links.

This is despite the 2020 NHL draft not yet having been scheduled.

Summary of Updates:

As has been the case for all releases to date, the model and project as a whole have evolved a bit since the previous release. Here is a summary of some of the updates with respect to the initial 2020 model (2020.0.0).

  • The target data for defensemen has been changed from PPG to a combination of PPG and even-strength goals against per 60 minutes of ice time. In theory, this combination correlates better with how 'good' a defenseman is.
  • The validation strategy has changed. To summarize how the model works: data from players drafted between 2011 and 2015 serves as training data for the 2020 predictions. Previously, we would, e.g., generate 2014 predictions from data on players drafted between roughly 2006 and 2010, and use that output to evaluate model performance. Now we instead run a fairly standard cross-validation on the same data used to generate the 2020 predictions. The validation set is much larger this way, and arguably more relevant, although the validation no longer accounts for the 'lag' between the training and test data. It is unclear whether the lag involved in predicting the 2014 results was representative of the lag involved in predicting 2020 data anyway. The question of how relevant a model built on 'old' data is to the current draft class remains, but that is a factor for this project no matter how you approach it.
  • With a larger cross-validation set, we can now compute some standard performance metrics on this data. The performance figures for this release are reported below.
  • Many bug fixes, including one 'critical' one.
  • Many cosmetic updates.
  • Many 'internal' changes that are a bit out of scope for this post.
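The cross-validation change described above can be sketched roughly as follows. The feature names, the classifier, and the synthetic data are all illustrative assumptions; the post does not specify the model's actual inputs or architecture.

```python
# Minimal sketch of k-fold cross-validation on the 2011-2015 training
# pool, scored with AUC. Features and classifier are placeholders.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Synthetic stand-in for players drafted 2011-2015.
n = 500
X = np.column_stack([
    rng.normal(0.8, 0.3, n),   # hypothetical junior-league points per game
    rng.normal(18.0, 0.5, n),  # hypothetical draft-year age
])
# Noisy link between scoring rate and a binary outcome label.
y = (X[:, 0] + rng.normal(0, 0.3, n) > 1.0).astype(int)

# Standard 5-fold cross-validation on the same pool used for the
# 2020 predictions, in place of the old "train on 2006-2010,
# evaluate on 2014" backtest.
scores = cross_val_score(
    GradientBoostingClassifier(), X, y, cv=5, scoring="roc_auc"
)
```

The trade-off noted above applies here: every fold mixes draft years, so the evaluation no longer measures how the model degrades as the gap between training data and the target draft class grows.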

Model Performance:

As mentioned above, model performance is now calculated in a more standard way, using the AUC score (area under the ROC curve). This metric has well-known caveats, but it is widely used in machine learning.

Please note that the models are currently optimized for the best and worst classes. For forwards, for example, emphasis is placed on 'Miss Prob.' and 'First-Liner Prob.'
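One way per-class probabilities and per-class AUC scores like those listed below could be produced is with one binary (one-vs-rest) model per outcome class. This is a hedged sketch: the class names, features, and logistic-regression choice are placeholders, not the project's actual setup.

```python
# Sketch: one binary classifier per outcome class, each yielding a
# 'Prob.' column and a per-class AUC. All data here is synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
classes = ["Miss", "4th-Liner", "Solid", "First-Liner"]

n = 400
X = rng.normal(size=(n, 3))                # placeholder features
y = rng.integers(0, len(classes), size=n)  # placeholder outcome labels

aucs = {}
for i, name in enumerate(classes):
    target = (y == i).astype(int)          # "is this player in class i?"
    model = LogisticRegression().fit(X, target)
    prob = model.predict_proba(X)[:, 1]    # that class's probability column
    aucs[name] = roc_auc_score(target, prob)
```

Under a scheme like this, "optimizing for the best and worst class" would mean spending tuning effort on the 'Miss' and 'First-Liner' models rather than the middle classes.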

AUC data for defensemen:

  • 1st Pair Prob.: 0.78
  • 2nd/3rd Pair, Defensive D Prob.: 0.46 (an AUC this close to 0.5 is no better than chance, so this output should be ignored).
  • 2nd/3rd Pair, Offensive D Prob.: 0.72
  • Miss Prob.: 0.73

AUC data for forwards:

  • First-Liner Prob.: 0.85 (this looks good on paper, but the score is flattered by class imbalance: true first-liners are rare, so a model that leans heavily toward predicting 'miss' can still score well here. Take it with a grain of salt.)
  • Solid Prob.: 0.73
  • Miss Prob.: 0.77
  • 4th-Liner Prob.: 0.62 (only modestly better than chance, so give this limited weight).
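To put numbers like 0.46 and 0.62 in context, AUC is a ranking metric: 0.5 is chance level and 1.0 is a perfect ranking. A small synthetic demonstration:

```python
# AUC reference points on synthetic data: an uninformative model
# scores near 0.5, a model that ranks every positive above every
# negative scores exactly 1.0.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)
labels = rng.integers(0, 2, size=2000)

random_scores = rng.random(2000)           # uninformative model
auc_random = roc_auc_score(labels, random_scores)   # close to 0.5

# Positives land in [1, 1.5), negatives in [0, 0.5): no overlap,
# so every positive outranks every negative.
perfect_scores = labels + rng.random(2000) * 0.5
auc_perfect = roc_auc_score(labels, perfect_scores)  # exactly 1.0
```

Read against that scale, 0.46 is indistinguishable from coin-flipping, while scores in the 0.7-0.85 range reflect genuine, if imperfect, ranking ability.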

There are many ways to interpret these results. Maybe this will be the subject of a future post. As usual, performance for forwards outpaces performance for defensemen.

Please feel free to reach out on Twitter with any questions.