2022 NHL Draft Estimates (Model version 2022.1.0)

05.28.2022 : release, 2022

New data is posted for prospects that are eligible for the 2022 NHL Draft:

The "ML model" noted above is the one that's been the primary feature of this site for the past two or three years. The "xG (expected goals) model" is a new model; more on this later.

ML Model Details

As a reminder, cross-validation data is generated by essentially testing the model against the players used to train the model. Since we already know something about the outcome for these players, this allows us to judge the model's performance. For example, we can see what the model would have thought about drafting Brayden Point, or Matt Barzal (oops).
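For readers who want the mechanics, here is a minimal sketch of how cross-validation estimates can be produced. It assumes ordinary k-fold cross-validation, which may not be the exact scheme used on this site. Each player's estimate comes from a model fit without that player. The "model" (a hit rate per scoring bucket), the function names, and the data are all made up for illustration.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_bucket_rates(rows):
    """Toy stand-in for a real model: top-6 hit rate per scoring bucket."""
    totals, hits = {}, {}
    for bucket, label in rows:
        totals[bucket] = totals.get(bucket, 0) + 1
        hits[bucket] = hits.get(bucket, 0) + label
    overall = sum(hits.values()) / max(1, sum(totals.values()))
    rates = {b: hits[b] / totals[b] for b in totals}
    return rates, overall

def out_of_fold_predictions(rows, k=5, seed=0):
    """Estimate each player using a model trained only on the other folds."""
    preds = [None] * len(rows)
    for fold in k_fold_indices(len(rows), k, seed):
        held_out = set(fold)
        train = [r for i, r in enumerate(rows) if i not in held_out]
        rates, overall = fit_bucket_rates(train)
        for i in fold:
            bucket, _ = rows[i]
            # Fall back to the overall rate if the bucket was unseen in training.
            preds[i] = rates.get(bucket, overall)
    return preds

# Hypothetical players: (scoring bucket, 1 if they became a top-6 forward else 0)
players = [("high", 1), ("high", 0), ("high", 1), ("mid", 0), ("mid", 0),
           ("mid", 1), ("low", 0), ("low", 0), ("low", 0), ("low", 0)]
oof = out_of_fold_predictions(players, k=5)
```

Because the known outcome for each player was never seen by the model that scored them, comparing `oof` against the labels gives a fairer read on performance than scoring the training data directly.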

For this year, I've restricted the estimates to the CHL leagues (OHL, WHL, QMJHL). In years past, I've tried to consider other leagues, but I think CHL-only is the sweet spot for the time being. Considering more leagues has the obvious benefit of leading to estimates for more draft-eligibles, and the other obvious benefit of increasing the size of the data set. However, different leagues can be difficult to compare against each other. Scale factors capture some, but, in my opinion, not all of the inhomogeneity. Additionally, some leagues don't collect as much data as others, so with a many-league dataset, you're often limited to whatever the least data-rich league provides. A goal of this project has always been to favor "accurate something" over "inaccurate everything", and considering the three CHL leagues gets us closest to that goal.

The two metrics I use to evaluate ML model performance are the ROC curve and the PR curve. Here's the ROC curve for the ML model for forwards:

You could use your Google-fu to decide whether this is a good ROC curve, since that's a somewhat subjective matter, but from my POV:

  • It's an improvement from years past.
  • It's somewhat inflated by the fact that most draft choices do not end up as top-6 forwards in the NHL. It's analogous to the way you could have a "model" that classified every draft pick as a miss. This "model" would be about 90% accurate.
  • The area under the ROC curve (AUC score) can loosely be interpreted as the probability that the ML model will rank a randomly chosen positive instance higher than a randomly chosen negative one. Being able to claim that the model does this 85% of the time sounds pretty good on the surface, but see above.
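To make the last two points concrete, here is a small sketch with made-up numbers (one hit in ten picks). It computes the accuracy of the "every pick is a miss" baseline and computes AUC directly from its pairwise-ranking interpretation. The function names and scores are hypothetical, not the site's code or data.

```python
def pairwise_auc(scores, labels):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties count as half a win, matching the usual definition.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def accuracy(preds, labels):
    """Fraction of hard 0/1 predictions that match the labels."""
    return sum(int(p == y) for p, y in zip(preds, labels)) / len(labels)

# Made-up outcomes: 1 = became a top-6 forward, 0 = did not.
labels = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

# The "classify everyone as a miss" baseline looks accurate...
baseline_accuracy = accuracy([0] * len(labels), labels)

# ...but it cannot rank anyone, so its AUC is no better than a coin flip.
baseline_auc = pairwise_auc([0.0] * len(labels), labels)

# A hypothetical model that ranks the one hit above 8 of the 9 misses.
scores = [0.40, 0.55, 0.10, 0.05, 0.20, 0.15, 0.08, 0.12, 0.03, 0.30]
model_auc = pairwise_auc(scores, labels)
```

In this toy case the baseline is 90% accurate with an AUC of 0.5, while the hypothetical model reaches an AUC of 8/9. That gap is why accuracy alone says little on a data set this imbalanced.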

Here's the PR curve:

  • It's about a 20% improvement from last year, although other things have changed in that time.

  • Loosely speaking, this curve concerns the model's precision, i.e., how often a player becomes a top-6 forward in the NHL given that we estimate they will be.
  • It follows from the above that this will lead to a less rosy view of our model. This is somewhat intuitive -- we know from experience that most players we think or hope will become top-6 forwards in the NHL ultimately don't make it.
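As a rough illustration of what the PR curve traces out, the sketch below computes precision and recall at a few probability thresholds. The scores, labels, and thresholds are invented for the example and are not taken from the model.

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall when every score >= threshold is called a hit."""
    flagged = [y for s, y in zip(scores, labels) if s >= threshold]
    true_pos = sum(flagged)
    total_pos = sum(labels)
    # Precision is undefined if nothing is flagged; recall if there are no hits.
    precision = true_pos / len(flagged) if flagged else None
    recall = true_pos / total_pos if total_pos else None
    return precision, recall

# Made-up model estimates and outcomes (1 = became a top-6 forward).
scores = [0.60, 0.45, 0.40, 0.30, 0.25, 0.20, 0.10, 0.08, 0.05, 0.02]
labels = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]

# Sweeping the threshold from strict to lenient gives points on a PR curve.
curve = [(t, *precision_recall(scores, labels, t)) for t in (0.5, 0.35, 0.15)]
```

Lowering the threshold catches more of the true top-6 players (higher recall) but flags more players who never make it (lower precision), which is the trade-off the curve displays.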

All things considered, these are decent results compared to where we've been in years past.

The results for the defenseman model are - as usual - not quite as good:

ROC curve:

PR curve:

These metrics for the defenseman model are about the same as last year. The figures are actually slightly worse, but this year's model is likely to be much more stable -- last year's was a little "lucky" in some ways.

The defenseman model "lightly" considers defensive aptitude for labeling, which turns out to make things rather difficult. We could have gotten better metrics by focusing on offense only (as we do for forwards), but, in my opinion, this would be somewhat unintuitive for readers.

All in all, the situation mirrors reality (or at least pundit hubbub) -- drafting / estimating success probabilities for defensemen is tricky!

Expected Goals Model

This year, I also started working on a separate model that's unrelated to the ML model. This model is based on hockeyviz's expected goals (xG) model, but it's scaled down in some ways, and it's also more approximate since many of the quantities used as input to the model are less available for CHL data. Shot data is not available at all (to me) for the WHL, so xG models can be created only for OHL and QMJHL seasons.

The gist of this model is that it seeks to compute goal probabilities for shots, and then attribute different portions of those probabilities to different circumstances of the shot (e.g., who was on the ice for, who was on the ice against, shot location).
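One common way to structure this kind of attribution is an additive model on the log-odds scale, as in logistic regression. The sketch below assumes that form purely for illustration; the actual model may differ, and every coefficient, zone name, and player name here is hypothetical.

```python
import math

def sigmoid(z):
    """Map log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def shot_goal_probability(shot, coef):
    """Return (goal probability, per-circumstance log-odds contributions)."""
    parts = {
        "intercept": coef["intercept"],
        "location": coef["location"].get(shot["zone"], 0.0),
        # Players on the ice for the shooting team.
        "on_ice_for": sum(coef["offense"].get(p, 0.0)
                          for p in shot["on_ice_for"]),
        # Players on the ice for the defending team.
        "on_ice_against": sum(coef["defense"].get(p, 0.0)
                              for p in shot["on_ice_against"]),
    }
    return sigmoid(sum(parts.values())), parts

# Hypothetical fitted coefficients (log-odds units).
coef = {
    "intercept": -2.5,
    "location": {"slot": 1.0, "point": -0.8},
    "offense": {"Player A": 0.2},
    "defense": {"Player X": -0.3},  # negative: suppresses goal odds
}

shot = {"zone": "slot", "on_ice_for": ["Player A"],
        "on_ice_against": ["Player X"]}
p, parts = shot_goal_probability(shot, coef)

# Same shot location with no known players on the ice, for comparison.
p_location_only, _ = shot_goal_probability(
    {"zone": "slot", "on_ice_for": [], "on_ice_against": []}, coef)
```

Note that in this form the pieces add up in log-odds, not in probability, so a player's isolated impact is read off their coefficient rather than as a fixed number of percentage points.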

A lot can be done with the xG model and output data, but for now, we simply plot players' isolated impact on shot odds for vs. isolated impact on shot odds against. Since this output doesn't say a ton about shot rates or xG rates, the next step would be to compute these rates. Probably a task for next season.

It's also worth mentioning that, at best, the xG models will accurately evaluate players' performance as prospects; they don't directly predict NHL performance.

It would be possible to one day use xG model output as features/input in the ML model, but that would probably require WHL shot data, and several years of it.

Players of Interest

First of all - why "players of interest"? Why am I not ranking the top n choices? Simply put, all the ML model looks to answer is whether or not a player will be a top-6 forward or top-pairing (or so) defenseman. The underlying reason for this is that we need a big enough sample of something for the model to work, and we don't have a whole pile of generational players. Furthermore, a player having a high probability of becoming a top-6 forward or a top-pairing defenseman does not indicate they'll be a superstar, and a lower probability does not indicate they're (traditional) 3rd-liner material. Taking all that into consideration, the ML model in some ways paints with a broad brush, and one way to use its output is to find players who are way out of position.
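One way to read "way out of position" is as the gap between a player's model probability and the typical hit rate at their expected draft slot. The sketch below shows that comparison. The baseline rates, pick ranges, and prospects are placeholders I made up for the example, not historical figures or model output.

```python
# (first pick, last pick, hypothetical historical top-6 hit rate)
BASELINE_BY_PICK_RANGE = [
    (1, 10, 0.60),
    (11, 32, 0.30),
    (33, 64, 0.12),
    (65, 128, 0.05),
    (129, 224, 0.02),
]

def baseline_for_pick(pick):
    """Look up the placeholder hit rate for an expected draft position."""
    for first, last, rate in BASELINE_BY_PICK_RANGE:
        if first <= pick <= last:
            return rate
    raise ValueError(f"pick {pick} is outside the table")

def surplus(model_prob, expected_pick):
    """Model probability minus the typical rate at that draft slot."""
    return model_prob - baseline_for_pick(expected_pick)

# Hypothetical prospects: (name, model probability, expected pick).
prospects = [
    ("Prospect A", 0.36, 80),
    ("Prospect B", 0.55, 5),
    ("Prospect C", 0.04, 100),
]

# Sort by surplus, largest first: the top entries are the "players of interest".
ranked = sorted(prospects, key=lambda p: surplus(p[1], p[2]), reverse=True)
names_by_value = [name for name, _, _ in ranked]
```

Here Prospect B has the highest raw probability but ranks last by surplus, because that probability is already priced into a top-10 slot, while Prospect A stands out as a late pick with an unusually high estimate.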

The xG model is still a work in progress, and given a) the approximations used in the model, and b) the simple understanding that performance in junior doesn't guarantee performance in the NHL, we're in much the same position: results should be taken with a grain of salt, but unheralded prospects who look very good in the model may be worth a second look.

Fine-grained ranking, especially for those near the top, is still a task for real-life scouting and player evaluation -- at least with regard to this project.

Ok, let's look at a few players:

  • Marcus Nguyen - With a top-6 probability estimate of 36%, he's extremely good value for where he's expected to be drafted (I'm not sure where that will be, really).

  • John Babcock - Rated rather highly by the ML model, Babcock is an LHD who put up 23 pts in 57 games with the Kelowna Rockets this year. Looking at the stat ranks used in the ML model, Babcock appears to be a well-rounded, jack-of-all-trades type of player.

Honorable Mentions and Model Favorites:

  • Dean Loukus - An OHL overager who, according to the xG model, contributes significantly both offensively and defensively. There's no strict way to rank the xG model output we use since it's 2D, but it appears he's been one of the better 19-year-olds in the OHL this year. The fact that he has a positive +/- in a sea of negative ones is also interesting. This was also his first season in the OHL despite being an overager.

  • Nolan Collins - Unranked by CSS at midterm, but picked up an NA CSS ranking at season's end (NA 153). Played in the U18 World Championship. Not much buzz at all about Collins, but he scores very well (probably a bit too well) in the xG model. He does not score well in the ML model.

  • Niks Feneko - The QMJHL version of Collins in some ways. Looks very good in the xG model, but not highly thought of by the rankings (124 NA CSS ranking). With an August birthday, he'll still be 17 on draft day.

  • Cedric Guidon and Kirill Kudryavtsev - I tried to look for players who looked decent in both the ML model and the xG model, and landed on these two. According to the xG model output, these players both appear to contribute significant offense without sacrificing too much defense. Neither is a guaranteed top-6 or top-pair player, but both appear to have a significantly better probability than is typical for their respective expected draft positions.

I'll cut it off there for now. There are quite a few prospects that are easy to be positive about.

Quick Retrospective: Who Did I Like in 2021?

It was tough to do much useful work last year due to pandemic-shortened or even pandemic-eliminated seasons, but I still came up with a couple of value picks. Here are a couple of players I thought were interesting:

  • Riley Kidney -- Kidney has an aggregate scout rank around 70, and while it's a little dubious to compare him to players drafted around 70th overall, we'll do it anyway. Looking at the cross-validation data, you'll see it's not common to find a player with a 15% chance to become a top-6 forward around 70th overall (though Jordan Weal is an exception). Kidney ranks high in most P60 stat ranks, and has Ryan Spooner among his comparables. The model doesn't consider playoff stats, but putting up 17 points in 9 playoff games seems notable.

2022 update: Kidney ended up being taken by the Canadiens around the 2/3 round turn (63rd overall). He finished 7th in QMJHL scoring this year, and looks to be playing in the AHL with Laval next season.

  • Olivier Nadeau -- Although a 12% chance of becoming a top-6 forward isn't very high, as with Kidney, it's still significantly more (on average) than you'd expect from a typical 90th overall selection. Probability-wise, a comparable for Nadeau is Linden Vey, who was drafted 96th overall in 2009. Additionally, Nadeau ranks 1st (among players graded) in almost every stat related to assists. Seems like he may be worth more than a 4th round pick.

2022 update: Nadeau was taken at the top of the 4th round in the 2021 draft (97th overall) by Buffalo. He led Shawinigan in scoring this season (78 pts in 65 games) and earned his entry-level contract this year.

What Else?

I'm never too sure what details are of interest to readers, or what questions might be out there, so if there's something specific you're interested in, please reach out on Twitter.