This week is the 2014 MIT Sloan Sports Analytics Conference. While the conference has not always had a panel on hockey, there is one this year. To that end, I took a look at some of the innovative pieces of hockey analytics work since the conference’s last hockey panel in 2012. I’ve made a list below, in no particular order, with links and a quick summary of each contribution. Given the growth of analytics in general and of hockey analytics in particular, this list is surely incomplete.
Player Usage Charts
Rob Vollman created Player Usage Charts, which are helpful for getting a sense of how players are used and how they are performing. These charts do a nice job of getting multiple dimensions onto a single graphic and providing context for player evaluation. The original work can be found here: Original Player Usage Charts. An updated interactive version, built with help from Robb Tufts, is available at this link: Interactive Player Usage Charts.
Darryl Metcalf at Extra Skater has put together an impressive set of visual tools for analysis. Many of the tools come from other places and other people (Gabe Desjardins, Ben Wendorf, Rob Vollman), as Metcalf notes on his About page; however, having them presented well on a single site is very useful. I particularly like the individual game summaries, such as this Flames-Flyers summary and this Rangers-Pens summary.
Most folks doing heavy #fancystats have at one time or another had to scrape play-by-play data from nhl.com. The nhlscrapr package for the statistical software R, from Andrew Thomas and Sam Ventura, makes this process much, much easier by essentially downloading and parsing the data for you. While R has a steep learning curve, it is one of the most commonly used tools in analytics. One of its great advantages is that it is free to download, and it has a community of people who are constantly making new packages and code available. The nhlscrapr package can be downloaded from the nhlscrapr site, along with the accompanying nhlscrapr reference manual.
Eric Tulsky’s zone entry work with Geoffrey Detweiler, Robert Spencer, and Corey Sznajder was introduced to a wider audience at last year’s MIT Sloan Conference. That paper, which can be found at this link, and subsequent work by them (and others) have shown the importance of carrying the puck through the neutral zone and into the offensive zone. The original work has since expanded to data collection on a good number of teams, and there is now also a Zone Exit Project.
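As a rough illustration of the kind of comparison behind this finding, the Python sketch below computes average shot attempts per entry for controlled (carried) versus uncontrolled (dumped) entries. The `ZoneEntry` record, its field names, and the "carry"/"dump" labels are hypothetical, not the project's actual tracking schema, and the sample entries are made up.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ZoneEntry:
    """One offensive-zone entry (hypothetical record layout)."""
    team: str
    entry_type: str  # assumed labels: "carry" (controlled) or "dump" (uncontrolled)
    shots: int       # shot attempts generated before the puck left the zone


def shots_per_entry(entries):
    """Return average shot attempts per entry, keyed by entry type."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for e in entries:
        totals[e.entry_type] += e.shots
        counts[e.entry_type] += 1
    return {kind: totals[kind] / counts[kind] for kind in counts}


if __name__ == "__main__":
    # Made-up entries for illustration only; not real tracking data.
    sample = [
        ZoneEntry("CGY", "carry", 1),
        ZoneEntry("CGY", "carry", 0),
        ZoneEntry("CGY", "dump", 0),
        ZoneEntry("CGY", "dump", 1),
        ZoneEntry("CGY", "carry", 2),
        ZoneEntry("CGY", "dump", 0),
    ]
    print(shots_per_entry(sample))
```

With real tracking data, a consistently higher rate for carried entries is the pattern the zone entry work reports.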
Leading up to the 2013 NHL draft, Eric Tulsky wrote an influential piece on evaluating how players in various leagues outside the NHL are used. The analysis, which builds on previous work by Jonathan Willis and Scott Reynolds, estimates the relative strength of opposing forwards and opposing defensemen. It does so by estimating each player’s TOI from the number of their team’s goals for which they were on the ice, adjusted for the team’s scoring rate.
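The core of the TOI estimate can be sketched in a few lines of Python, assuming goals arrive at a steady team rate so that a player's share of the team's goals approximates their share of its ice time. The function name and parameters are hypothetical, and counting goals for plus goals against as "on-ice goals" is an assumption; Tulsky's piece may define and adjust these terms differently.

```python
def estimated_toi_per_game(on_ice_goals, team_goals, team_minutes, games_played):
    """Estimate a player's minutes per game from on-ice goal counts (hypothetical helper).

    on_ice_goals: team goals (assumed: for plus against) with the player on the ice
    team_goals:   all team goals, same definition, over the same games
    team_minutes: total minutes the team played in those games
    games_played: number of games in the sample
    """
    if team_goals <= 0 or team_minutes <= 0 or games_played <= 0:
        raise ValueError("team_goals, team_minutes and games_played must be positive")
    # Adjust for the team's scoring rate: high-scoring teams pile up
    # on-ice goals faster per minute, so divide by goals per minute.
    team_goals_per_minute = team_goals / team_minutes
    estimated_minutes = on_ice_goals / team_goals_per_minute
    return estimated_minutes / games_played


if __name__ == "__main__":
    # Hypothetical junior-league player: on ice for 70 of 200 team goals
    # over 60 games of 60 minutes each.
    print(estimated_toi_per_game(70, 200, 60 * 60, 60))
```

Dividing by the team's goal rate is what keeps a player on a high-scoring team from looking like a heavy-minutes player simply because more goals happened around them.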
Macdonald’s Expected Goals Model
Prediction of future hockey performance is not limited to Corsi and Fenwick. Brian Macdonald’s model addresses other metrics and ties them to the players on the ice during a given shift, as well as to other contextual information about that shift. This approach is thus an über form of WOWY (With Or Without You) analysis that accounts simultaneously for all of the other factors present during a shift in order to isolate the impact of a given player. This work was originally presented at the 2012 MIT Sloan conference (link to the paper), and Macdonald gave a further explanation of the methodology in this post.
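For readers unfamiliar with this style of model, here is a minimal NumPy sketch of a shift-level weighted ridge regression, a common way to fit adjusted plus-minus models of this kind. The +1/-1 player coding, the shift-length weights, the function name, and the penalty `lam` are assumptions for illustration; Macdonald's paper specifies the actual response, covariates, and penalty, which are not reproduced here.

```python
import numpy as np


def fit_shift_ridge(X, y, weights, lam=1.0):
    """Weighted ridge regression: solve (X'WX + lam*I) beta = X'Wy.

    X: (n_shifts, n_features). Assumed coding: a player column is +1 if the
       player was on the ice for the home team, -1 for the away team, 0 otherwise;
       extra columns can hold shift context such as zone starts.
    y: per-shift response, e.g. a goal or expected-goal differential rate.
    weights: per-shift weights, e.g. shift length in minutes.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(weights, dtype=float)
    XtW = X.T * w  # multiplies each shift's column of X.T by that shift's weight
    A = XtW @ X + lam * np.eye(X.shape[1])
    b = XtW @ y
    return np.linalg.solve(A, b)


if __name__ == "__main__":
    # Hypothetical data: 4 shifts, 2 player columns, shift lengths in minutes.
    X = [[1, 0], [0, 1], [1, 1], [1, -1]]
    y = [2.0, -1.0, 1.0, 3.0]
    print(fit_shift_ridge(X, y, weights=[0.8, 1.0, 0.6, 0.9], lam=0.5))
```

Each coefficient estimates a player's (or context variable's) effect with everyone else on the ice held fixed; the ridge penalty shrinks estimates for players with little ice time toward zero, which helps with the heavy collinearity of on-ice data.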
Total Hockey Ratings
My Total Hockey Ratings (THoR), which was also originally presented at MIT Sloan, is similar to Macdonald’s work above in that both models account for who is on the ice for and against each action, along with other context-dependent factors. Under THoR, each event (hit, shot, miss, etc.) in the NHL’s RTSS system is given a value based on the net probability that it leads to a goal. The original model was based purely on even-strength events and introduced a methodology for adjusting shot (x,y) locations to account for rink biases. The latest results and updates from THoR, for both the even-strength model and the newer all-events model, have shown a high level of reliability.
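To make "net probability it leads to a goal" concrete, the sketch below values each event type as the rate at which the event's team scores the next goal within a fixed window, minus the rate at which the opponent does. The 20-second window, the tuple layout, the function name, and the first-goal-only rule are simplifying assumptions for illustration, not THoR's actual specification, which also accounts for the players on the ice and other context.

```python
from collections import defaultdict


def net_goal_probability(events, goals, window=20.0):
    """Value each event type as P(own team scores next within window) - P(opponent does).

    events: list of (time_sec, team, event_type) from a single game (assumed)
    goals:  list of (time_sec, team) from the same game
    window: look-ahead in seconds (an assumed value)
    """
    scored_for = defaultdict(int)
    scored_against = defaultdict(int)
    counts = defaultdict(int)
    for t, team, etype in events:
        counts[etype] += 1
        # Only goals strictly after the event and within the window count.
        following = [g for g in goals if t < g[0] <= t + window]
        if following:
            first_goal = min(following, key=lambda g: g[0])
            if first_goal[1] == team:
                scored_for[etype] += 1
            else:
                scored_against[etype] += 1
    return {e: (scored_for[e] - scored_against[e]) / counts[e] for e in counts}


if __name__ == "__main__":
    # Made-up single-game example.
    events = [(10, "A", "shot"), (50, "B", "hit"), (100, "A", "shot"), (200, "A", "hit")]
    goals = [(15, "A"), (60, "A")]
    print(net_goal_probability(events, goals))
```

A full model would pool many games and then credit these event values to the players on the ice, adjusting for context.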
Steve Burtch’s dCorsi (or Delta Corsi) is similar to the previous two innovations in that it uses a statistical regression to account for contextual factors. As the name implies, it is based on Corsi, adjusting Corsi for average TOI, zone starts, quality of teammates (QoT) and quality of competition (QoC). dCorsi grew out of a study of a metric called Shut Down Index, which ‘morphed’ into dCorsi; it is discussed in this blog post looking at defencemen through the first quarter of a season.
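The residual idea behind dCorsi can be sketched with ordinary least squares: regress observed Corsi on the contextual factors and report each player's observed minus expected value. This is a minimal NumPy sketch; the function name, the choice of plain OLS with an intercept, and the column layout are assumptions, and Burtch's exact specification may differ.

```python
import numpy as np


def delta_corsi(corsi, covariates):
    """Return observed minus regression-expected Corsi for each player.

    corsi:      (n_players,) observed Corsi values (e.g. Corsi-for per 60)
    covariates: (n_players, k) columns such as average TOI, zone-start %, QoT, QoC
    An intercept column is added and coefficients come from ordinary least squares.
    """
    y = np.asarray(corsi, dtype=float)
    Z = np.asarray(covariates, dtype=float)
    X = np.column_stack([np.ones(len(y)), Z])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    expected = X @ beta
    return y - expected  # positive: better than the player's usage predicts


if __name__ == "__main__":
    # Made-up example: columns are average TOI (minutes) and zone-start %.
    Z = [[15, 50], [18, 45], [20, 55], [12, 60], [22, 40]]
    corsi = [52.0, 48.5, 55.1, 50.2, 44.9]
    print(delta_corsi(corsi, Z))
```

Because the regression includes an intercept, the residuals sum to zero across the sample, so dCorsi reads as performance relative to what comparable usage would predict.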
Parkatti’s Expected Goals
So now we not only have metrics in hockey with uninformative names (Corsi, Fenwick, PDO), but also metrics with the same name: expected goals. Michael Parkatti’s expected goals weights each shot by its probability of being a goal given its distance and shot type. Michael was the winner of the Edmonton Oilers hackathon and is now involved with the team. Parkatti gave some idea of the predictive ability of this method in this post.
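A minimal Python sketch of a shot-weighting expected goals model follows: historical shots are grouped by distance bin and shot type, the empirical goal rate in each cell serves as that cell's probability, and a team's expected goals is the sum of its shots' probabilities. The 10-foot bin width, the empirical-rate estimate, the function names, and the fallback of 0.0 for unseen cells are assumptions; Parkatti's actual probabilities may be estimated differently.

```python
from collections import defaultdict


def distance_bin(distance_ft, width=10):
    """Bucket a shot distance in feet into fixed-width bins (bin width is assumed)."""
    return int(distance_ft // width)


def fit_goal_probabilities(shots, width=10):
    """Empirical P(goal | distance bin, shot type) from historical shots.

    shots: iterable of (distance_ft, shot_type, is_goal)
    """
    attempts = defaultdict(int)
    goals = defaultdict(int)
    for dist, shot_type, is_goal in shots:
        key = (distance_bin(dist, width), shot_type)
        attempts[key] += 1
        goals[key] += int(is_goal)
    return {key: goals[key] / attempts[key] for key in attempts}


def expected_goals(shots, probabilities, width=10, default=0.0):
    """Sum each shot's goal probability; (bin, type) cells never seen use `default`."""
    return sum(
        probabilities.get((distance_bin(dist, width), shot_type), default)
        for dist, shot_type in shots
    )


if __name__ == "__main__":
    # Made-up history and game shots for illustration.
    history = [(5, "wrist", True), (8, "wrist", False), (25, "slap", False),
               (28, "slap", True), (22, "slap", False), (29, "slap", False)]
    probs = fit_goal_probabilities(history)
    print(expected_goals([(3, "wrist"), (21, "slap"), (60, "backhand")], probs))
```

In practice the cells need enough historical shots to give stable rates, which is one reason to keep the bins coarse or to fit a smooth model such as logistic regression instead.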
Several other promising methods have also appeared that have a good deal of potential for driving innovation. I want to highlight three of these. The first, by Gramacy et al., received some interest from the hockey blogosphere, and the methodology they have created looks especially promising. Similarly, the Mean Even Strength Hazard (MESH) rating approach of Andrew Thomas et al. is worth reading. Finally, Josh Weissbock has developed some machine learning algorithms for the analysis of hockey. Here’s his academic conference paper; Weissbock also wrote this blog post on the topic.
As I mentioned above, hockey analytics has expanded a great deal in the past two years. I’m excited for this year’s MIT Sloan Sports Analytics Conference (#SSAC2014) Hockey Panel and look forward to the conference.
Thanks to Brian Macdonald @greaterthanPM for discussion on this topic.