Disclaimer, Data Sources, Content Licensing

09.30.2019 : meta, data

Updated on February 7th, 2020

As of the 2020.0.0 release, all data used to generate models and the subsequent predictions is provided by the venerable Elite Prospects. We reserve copyright on predictions and other content derived from this data, but obviously not on any data provided by Elite Prospects.


The remainder of the original post content is no longer relevant to nhlmldraft.net, but remains below for historical purposes and because I think the discussion is still of interest. I removed the creative content licensing statement and mark to avoid confusion.

**Sources of Data:** Whether data is copyrightable is something of a gray area. The short version is that raw data is not copyrightable, but its presentation, the act of compiling it, and so on probably are. Many hockey data sites state a policy on the use of their content. Here are a few examples:

* Hockey Reference: Use of Data
* Hockeydb: Usage

These days we also have popular, prefabricated licenses written by experts to govern the use of content. For example, QuantHockey uses the Creative Commons Attribution-NonCommercial license. As the name suggests, this means you can use the data, must credit QuantHockey as the source, and may not make money by presenting data from this source. (The ads on QuantHockey then seem questionable, but I'm not sure enough of the details to say whether the license permits them.)

Everything on Wikipedia is licensed under the Creative Commons Attribution-ShareAlike license. This is similar to the above, but without the commercial-use restriction; instead, it requires that adapted material be shared under the same license. This site regards Wikipedia as a viable source of data for this reason. To be pedantic, however, we would have to verify that all of that data arrived on Wikipedia from sources whose licenses or policies permitted it.

Because we use Wikipedia for all pre-2019 per-player data, the data carries some biases. For example, data about a 1st-rounder is far more likely to be available on Wikipedia than data about a 7th-rounder who never made the NHL. More on these biases in a later post.

For new data (e.g., the current draft year), I mostly hand-pick data from EliteProspects. EliteProspects doesn't appear to publish a usage policy, so I take the common-sense approach: don't take too much, and don't compete directly (for example, I don't post prospect stats here). EliteProspects' league leaders data are also used (e.g.).

**Here is a summary of the data sources used to generate ml draft's content:**

* Wikipedia: 100% of pre-2018 per-player data. If it's missing from Wikipedia, I don't have it.
* EliteProspects: Current draft-year prospect data and league leaders data.