I love this idea that more NHL teams aren’t investing in analytics because the price tag reaches into the hundreds of thousands of dollars. This is a league in which a team still employs Colton Orr for close to a million dollars per season.
Maybe teams would rather not spend $250,000 to find out that they shouldn’t pay guys like him $925,000.
So this got me thinking about whether we could estimate the return that spending $250k on analytics would yield. I decided to do some back-of-the-envelope computations to estimate the benefit of having an analytics staff. No new analyses will be presented here, just some crude approximations. Almost all of this is low-hanging fruit and no doubt there are improvements that can be made. The list is not comprehensive. I simply want to make the point that saying hockey analytics have not proven their worth is ridiculous, and I don’t need to be too thorough to do that.
I’ve broken down what follows into three categories: drafting, strategy and player acquisition. Within each category, I’ve estimated the return via goals, points or wins. Following previous work by Gabe Desjardins, I’ll take a win to be worth 2 points and roughly $2MM; consequently, a point is worth $1MM. (Note the new CBA prompted me to round down slightly, but that rounding should likely go up given what is expected for the salary cap next year.) We’re also going to look at things on a yearly basis: you might only get to sign a free agent goalie every fourth year, for example, so we’ll take 1/4 of the value from that signing in this assessment.
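As a sanity check, the conversions used throughout this post can be wrapped in a few lines of Python. The 6-goals-per-win figure is my own assumption, backed out of the post's treatment of 2 goals as roughly a third of a win; it is not stated explicitly above.

```python
# Back-of-the-envelope conversions used throughout this post.
# Assumption: ~6 goals = 1 win (the post treats 2 goals as about 1/3
# of a win); 1 win = 2 points = $2MM, so 1 point = $1MM.

GOALS_PER_WIN = 6
DOLLARS_PER_WIN = 2_000_000

def goals_to_dollars(goals, years_between_opportunities=1):
    """Dollar value of marginal goals, amortized over how often the
    opportunity comes up (e.g. a draft steal once every 4 years)."""
    wins = goals / GOALS_PER_WIN
    return wins * DOLLARS_PER_WIN / years_between_opportunities

value = goals_to_dollars(2)  # 2 goals/year from better prospect evaluation
print(round(value))          # 666667, i.e. roughly the $670k in the text
```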
Drafting
The first area we’ll check is drafting. There are two aspects to this: evaluating players to draft and determining the value of draft picks. The former is a good bit more difficult to evaluate since it is harder to get a sense of how much can be gained in this regard. I also think analytics can help in this area both in predicting the future and in evaluating mistakes of the past.
Some recent work by Eric Tulsky suggests the sort of data that might be mined from leagues outside the NHL and used to project player performance in the NHL. This work helps determine how a player is being used by their current team and the quality of the opponents they face. If a player is generating 25 goals in 75 games but is playing sheltered minutes, then we may want to draft others who are producing against tougher competition. This work is relatively new, so it is harder to evaluate how such methods will perform. However, it is likely safe to say that they are worth a couple of goals a year. Let’s say 2. That’s a third of a win, or about $670k.
One area of drafting that has seen a good deal of work is the idea of league equivalencies. The idea of these approaches is to look at the performance of a player who spent year 1 in a non-NHL league (e.g. the SM-liiga in Finland) and year 2 in the NHL. By looking at the relationship between points in these two years, we can create a measure of the quality of the non-NHL league as well as a method for estimating production once a player has joined the NHL. The most famous application of this sort of work comes from the Winnipeg Jets’ decision in 2011 to draft Mark Scheifele over Sean Couturier. The projections for Couturier were clearly higher than those for Scheifele, as was noted at the time. Couturier has gone on to perform very well for the Flyers and is probably worth a couple of wins a year to them. Such circumstances are not going to crop up every year, but when they do they are valuable. I’ll assume they occur once every 4 years. Using Hockey-Reference.com’s Point Shares metric, Couturier has been worth about 2.7 points more than Scheifele over three years. That’s about 0.9 points per year, or about 0.9/4 = 0.225 points or $225k per year.
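A minimal sketch of the equivalency idea, with invented numbers: fit a through-the-origin slope between feeder-league and NHL scoring rates for players who made the jump, then use it as a translation factor. The data pairs and the 1.0 points-per-game prospect are hypothetical, not actual NHLE figures.

```python
# League equivalency sketch: pairs of (points/game in the feeder league,
# points/game in the NHL the next year) for players who made the jump.
# All numbers are invented for illustration.

def translation_factor(pairs):
    """Least-squares slope through the origin: NHL ppg ~ f * feeder ppg."""
    num = sum(x * y for x, y in pairs)
    den = sum(x * x for x, _ in pairs)
    return num / den

jump_data = [(1.10, 0.55), (0.80, 0.35), (1.40, 0.70)]
f = translation_factor(jump_data)

# Project a hypothetical prospect scoring 1.0 ppg in the feeder league:
projected_nhl_ppg = f * 1.0
print(round(projected_nhl_ppg, 2))  # 0.49
```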
So, conservatively, the total from drafting is about $895k per year. This excludes something like the creation and use of a draft pick value chart, for example this one or these.
Strategy
There is a good deal of evidence that pulling the goalie more often will yield results. David Beaudoin and Tim Swartz have done a nice study suggesting that teams should pull their goalies more often. Their mathematical approach uses goal scoring rates under a variety of circumstances to estimate the average number of points gained. They conclude that, on average, a point per season can be gained with this approach. As with any strategic innovation, its success will lead to it being copied and, eventually, its demise. Pulling the goalie faces many of the same difficulties as going for it on fourth down in the NFL: it is less likely to be adopted because of the perceived costs. Nonetheless it is clear that analytics has suggested an improved way to play the game. Assuming such an advantage would be adopted by other teams, we might lose it after a year, so we’ll round this down to $100k in value. I do this not because I don’t think it will work — I’ve seen first hand how well it works, as my collaborator Chris Wells has led the NCAA in minutes with the goalie pulled for several consecutive years — but because I think it is less likely to be adopted by coaches.
Another move that has been shown to have a strategic impact is carrying the puck more and dumping and chasing less. Eric Tulsky has gotten a good deal of attention as the lead researcher in this area. His paper on this was a poster at last year’s MIT Sloan Sports Analytics Conference. The idea is that carrying the puck into the offensive zone leads to better possessions, which leads to more goals. It is hard to get a sense of the value here, but let’s try. We can estimate about 50 zone entries at 5v5 per game (source here and here). If we can increase the number of shots using this method by 0.5 per game, that would be roughly 3 goals per year, or about $1MM. As above, we spread this gain over several years to get $250k per year.
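The arithmetic behind the ~3 goals can be checked directly. The ~8% conversion rate is my assumption for a league-average shooting percentage; the post does not state what rate it used.

```python
# Rough value of better zone entries: 0.5 extra shots per game over an
# 82-game season, converted at an assumed ~8% shooting percentage.

EXTRA_SHOTS_PER_GAME = 0.5
GAMES = 82
SHOOTING_PCT = 0.08  # assumed league-average conversion rate

extra_goals = EXTRA_SHOTS_PER_GAME * GAMES * SHOOTING_PCT
print(round(extra_goals, 1))  # 3.3, close to the ~3 goals in the text
```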
So the numbers here suggest that it is possible to get $350k per year in value from strategy. Value from these analyses is the most likely to be transient, but it is also possible that analysts will find new areas to exploit.
Trades and Free Agency
Next we’ll look at player acquisition through trades or free agency. One of the most useful aspects of analytics, as Phil Birnbaum has noted, is that it prevents stupid. As his article argues, there is more to be gained by not being stupid than by being smart. That is useful and a big part of analytics.
The litany of large NHL contracts that might appropriately have received another look had analytics been involved is long, and includes Bryzgalov’s signing by Philadelphia, Douglas Murray’s trade acquisition by Pittsburgh from San Jose, his signing with Montreal, and trading for Robyn Regehr. Not every team signs or tries to sign a big free agent every year. Let’s say one every four years that overpays by $1MM per year. That’s a savings of $250k per year. And we haven’t even looked at the contracts of Pekka Rinne, David Clarkson, Brad Richards, or the aforementioned Colton Orr. As the salary cap grows this is likely to be an even bigger value.
Summary
There are lots of ways that hockey analytics can impact a team. I’m an analytics guy, so I’ve tried to be conservative here about the value that analytics can bring; overselling analytics won’t help its adoption. The cost of a reasonable analytics department might be around $300k, and you could get a good deal of consulting input for $100k. I think it’s pretty easy to see that analytics could yield benefits for an NHL team of over $1MM from the areas listed above, at a minimum. As I stated previously, these estimates are low and they are not exhaustive. Hopefully it should be pretty clear from this that cost is not one of the factors holding back analytics in the NHL.
So last week during the Hockey Analytics panel at the MIT Sloan Sports Analytics Conference, Eric Tulsky referenced a study that Michael Schuckers and his student Lauren Brozowski did on referees in the NHL. While this work has been available publicly on the conference website, due to some oversight we did not post it here. So here it is. The paper is based upon two NHL seasons’ worth of data. The data don’t tell us which referee made a given call, just who was on the ice when it was called. Most of the results are pretty intuitive. The later you are in a tight game, the less likely it is that a penalty is called. Home teams are less likely to be called for penalties than visitors. There also seems to be a great deal of consistency among the referees in their rates of penalties after adjusting for a variety of factors including score, period, the teams involved, etc.
Link to slides from 2011 JSM Talk
Photo by Mark Canter, http://en.wikipedia.org/wiki/File:Dmitry_Kulikov_Panthers_Shane_Heyer.jpg
Player Usage Charts
Rob Vollman created Player Usage Charts and they are helpful for getting a sense of how players are used and how they are performing. These charts do a nice job of getting multiple dimensions onto a single graphic and providing context for evaluation of players. The original work can be found here: Original Player Usage Charts and an updated interactive version with help from Robb Tufts is available at this link: Interactive Player Usage Charts.
Extra Skater
Darryl Metcalf at Extra Skater has put together an impressive set of visual tools for analysis. Many of the tools come from other places and other people (Gabe Desjardins, Ben Wendorf, Rob Vollman), as Metcalf notes in his about link; however, having them presented well in a single site is very useful. I particularly like the individual game summaries, such as this Flames-Flyers summary and this Rangers-Pens summary.
nhlscrapr
Most folks doing heavy #fancystats have at one time or another had to scrape play-by-play data from nhl.com. The nhlscrapr package for the statistical software R, from Andrew Thomas and Sam Ventura, makes this process much, much easier by essentially downloading and parsing the data for you. While R has a steep learning curve, it is one of the most commonly used tools in analytics. One of its great advantages is that it is free to download and has a community of people constantly making new packages and new code available. The nhlscrapr package can be downloaded at the nhlscrapr site with the accompanying nhlscrapr reference manual.
Zone Entries
Eric Tulsky’s work with Geoffrey Detweiler, Robert Spencer, and Corey Sznajder was introduced to a wider audience at last year’s MIT Sloan Conference. In that paper, which can be found at this link and subsequent work they (and others) have shown the importance of carrying the puck through the neutral zone and into the offensive zone. The original work has now expanded to data collection on a good number of teams. There is now also a Zone Exit Project.
Prospect Usage
Eric Tulsky wrote an influential piece leading up to the 2013 NHL draft on evaluating how players in various leagues outside the NHL are used. This analysis, which builds on previous work by Jonathan Willis and Scott Reynolds, estimates the relative strength of opposing forwards and opposing defensemen. It does this by estimating the TOI for those players using the number of goals for which they were on the ice for their team, adjusted for the team’s scoring rate.
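The TOI-estimation trick described above amounts to dividing a player's on-ice goal count by the team's scoring rate. A toy version, with invented numbers:

```python
# Estimate a player's minutes from on-ice goal counts: if the team
# scores at a known rate per minute, minutes ~ on-ice goals / rate.
# Numbers below are invented for illustration.

def estimate_toi(on_ice_goals, team_goals, team_minutes):
    team_rate = team_goals / team_minutes  # team goals per minute
    return on_ice_goals / team_rate        # estimated minutes on ice

# Team scoring 246 goals over 82 games * 60 minutes; the player was on
# the ice for 30 of them:
minutes = estimate_toi(30, 246, 82 * 60)
print(round(minutes))  # 600
```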
Macdonald’s Expected Goals Model
Prediction of future hockey performance is not limited to Corsi and Fenwick. Brian Macdonald addresses other metrics and ties those to the players on the ice during a given shift, as well as other contextual information about that shift. Thus, this approach takes an über form of WOWY that accounts simultaneously for all of the other factors present during a given shift to isolate the impact of a given player. This work was originally presented at the 2012 MIT Sloan conference: link to the paper. A further explanation of this methodology was given by Macdonald in this post.
Total Hockey Ratings
My Total Hockey Ratings (THoR), which was also originally presented at MIT Sloan, is similar to the work of Macdonald above in that both models account for who is on the ice for and against for each action, along with other context-dependent factors. Under THoR, each event (hit, shot, miss, etc.) in the NHL’s RTSS system is given a value based upon the net probability that it leads to a goal. The original model was based purely on even strength events and introduced a methodology for adjusting shot (x,y) locations to account for rink biases. The latest results and updates from THoR, both even strength and the newer all-events version, have shown a high level of reliability.
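The core idea behind both models, valuing events and regressing those values on who was on the ice, can be sketched in toy form. Everything here (the event values, the on-ice indicators, the plain gradient-descent fit) is invented for illustration; the real models are much larger regressions with many more terms (zone starts, score effects, rink adjustments, special teams).

```python
# Toy adjusted plus-minus in the spirit of THoR / Macdonald's model:
# each event has a value (its net probability of leading to a goal),
# each player an unknown coefficient, and we fit
#   event_value ~ sum of coefficients of players on the ice
# (+1 if on the ice for, -1 if against, 0 if off the ice).
# All numbers are invented.

players = ["A", "B", "C", "D"]
events = [  # (on-ice indicators, event value)
    ([ 1,  1, -1, -1],  0.05),
    ([ 1, -1,  1, -1],  0.02),
    ([-1,  1,  1, -1], -0.01),
    ([ 1,  1,  1, -1],  0.07),
    ([-1, -1,  1,  1], -0.03),
]

# Plain gradient descent on the squared error stands in for the real
# regression solver.
coef = [0.0] * len(players)
lr = 0.05
for _ in range(5000):
    grad = [0.0] * len(players)
    for x, y in events:
        resid = sum(c * xi for c, xi in zip(coef, x)) - y
        for j, xi in enumerate(x):
            grad[j] += 2 * resid * xi
    coef = [c - lr * g / len(events) for c, g in zip(coef, grad)]

ratings = dict(zip(players, (round(c, 4) for c in coef)))
```

Each player's coefficient is then his estimated per-event impact after simultaneously adjusting for everyone else who shared the ice with him, which is the "stronger form of WOWY" idea.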
dCorsi
Steve Burtch’s dCorsi (or Delta Corsi) is similar to the previous two innovations in that it uses a statistical regression to account for contextual factors. As the name implies, it is based upon Corsi, adjusting it for average TOI, zone starts, QoT and QoC. The germ for dCorsi was a study on a metric called Shut Down Index, which ‘morphed’ into dCorsi; the latter is discussed in this blog post looking at defencemen through the first quarter of a season.
Parkatti’s Expected Goals
So now not only do we have metrics in hockey with uninformative names — Corsi, Fenwick, PDO — but we have metrics with the same name: expected goals. Michael Parkatti’s expected goals weights each shot by its probability of being a goal given its distance and shot type. Michael was the winner of the Edmonton Oilers hackathon and is now involved with the team. Parkatti gave some idea of the predictive ability of this method in this post.
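The shot-weighting idea can be sketched as a simple lookup table. The probabilities below are invented placeholders, not Parkatti's fitted values, and real models bin distance much more finely.

```python
# Expected goals sketch: each shot contributes P(goal | type, distance)
# instead of counting as 1. Probabilities are invented for illustration.

P_GOAL = {
    ("slap",  "far"):  0.03,
    ("slap",  "near"): 0.10,
    ("wrist", "far"):  0.04,
    ("wrist", "near"): 0.12,
}

def expected_goals(shots):
    """shots: iterable of (shot_type, distance_band) tuples."""
    return sum(P_GOAL[s] for s in shots)

game_shots = [("wrist", "near"), ("slap", "far"), ("wrist", "far")]
xg = expected_goals(game_shots)
print(round(xg, 2))  # 0.19
```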
There have also been some other promising methods that have appeared which have a good deal of potential for creating innovation. I want to highlight three of these. The first is by Gramacy et al. and received some interest from the hockey blogosphere. The methodology they have created has a good deal of potential. Similarly, the Mean Even Strength Hazard (MESH) rating approach of Andrew Thomas et al. is worth reading. Finally, Josh Weissbock has developed some machine learning algorithms for the analysis of hockey. Here’s his academic conference paper. Weissbock also wrote this blog post on the topic.
As I mentioned above, hockey analytics has expanded a great deal in the past two years. I’m excited for this year’s MIT Sloan Sports Analytics Conference (#SSAC2014) Hockey Panel and look forward to the conference.
Thanks to Brian Macdonald @greaterthanPM for discussion on this topic.
Certainly a player’s Corsi or Corsi For % can be a reflection of their team and their teammates (see Rob Scuderi with LA or Dennis Seidenberg with Boston). However, the point of CorsiRel is to differentiate between how the team does when the player is on and off the ice. This is a form of ‘with or without you,’ or WOWY as it is commonly known.
In the case of THoR, we have explicit terms in our model to account for the other players on the ice with a given player, as well as rink effects. Thus, we have a stronger form of WOWY built in, one that simultaneously adjusts for the impact of the other players on the ice for both teams during a given event. THoR also accounts for a variety of other factors including score effects, home ice, zone starts, etc. Details can be found here.
To investigate this particular question, I’m going to look at how year-to-year correlations differ for players who were on the same team in both years compared to players who changed teams between years. If the correlation, that is, the ability to predict how a player does in the future, is substantially decreased by changing teams, then that metric is not isolating individual player talent. Thus, our cohort of players will be those who finished consecutive seasons with different teams but finished the preceding or subsequent season with the same team. We won’t limit ourselves to players who spent an entire season with a different club, since that would drive down some already small sample sizes.
In statistical data collection, there is an idea that an individual/subject is their own best control. This comes into play when testing, for example, a drug versus a placebo. No matter how well you set up your study, there is always a worry that the groups were different at the start, in which case conclusions at the end are suspect. To deal with that here, I’m going to look at the same group of individuals over multiple years, when they have switched teams and when they have not. The drawback to this sort of analysis is that there could be changes in the players themselves from year to year. This will be particularly acute for players at the end of their careers, when their abilities are likely declining (Steve Sullivan, Jarome Iginla, Jaromir Jagr).
THoR
For the data here I am using the latest version of THoR (all events) and taking players with a minimum of 2000 plays in each season considered. In total, we used the following seasons: 0910, 1011, 1112, and 1213. (Unfortunately, we don’t have THoR further back than the 0910 season.) Each row of the table represents a cohort of players. The first row, for example, deals with players who finished the season with the same team in 0910 and 1011 but finished 1112 on a different team; vice versa for the 56 players in the second row. Our sample sizes here are not large, so a decent amount of variation between the correlations is expected. It is unexpected that we would see higher correlations for players when they change teams. Certainly some of those differences are less than 0.01, so it is safe to say that they are likely just noise; however, they do also mean that there is very little change in the performance of THoR when players change teams. On average, that difference is less than half of one percent. Adding a standard error to that, you get about 4% as a reasonable amount of drop in correlation due to players changing teams.
Seasons with same team | Seasons with different teams | Sample size | Correlation across seasons with same team | Correlation across seasons with different teams |
0910/1011 | 1011/1112 | 61 | 0.816 | 0.726 |
1011/1112 | 0910/1011 | 56 | 0.821 | 0.833 |
1011/1112 | 1112/1213 | 36 | 0.852 | 0.856 |
1112/1213 | 1011/1112 | 31 | 0.732 | 0.786 |
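The cohort comparison in the table above can be sketched with a few lines of Python. The ratings below are invented; the point is only the mechanics of comparing same-team and changed-team correlations for the same players.

```python
# Compare year-over-year correlation of a rating for the same players
# in seasons with one team versus seasons spanning a team change.
# All ratings are invented for illustration.

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# (rating in year 1, rating in year 2) for the same four players:
same_team = [(0.2, 0.25), (-0.1, -0.05), (0.4, 0.35), (0.0, 0.05)]
diff_team = [(0.25, 0.1), (-0.05, 0.0), (0.35, 0.3), (0.05, -0.1)]

r_same = pearson(*zip(*same_team))
r_diff = pearson(*zip(*diff_team))
drop = r_same - r_diff  # predictiveness lost by switching teams
```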
CorsiRel
We next turn to results for CorsiRel. The players here had to have played at least 30 games in each of the three seasons considered. We did the same analysis as we did for THoR using CorsiRel data (downloaded from behindthenet.ca). Originally, we had the same years involved for CorsiRel as we did for THoR. However, those results were highly variable, so I added some additional years to give a bigger picture of what was going on: I pulled the CorsiRel for the 0708 and 0809 seasons with the same criteria for inclusion. The average change in correlations by cohort is 10%, with about an 8% standard error. Adding and subtracting one standard error to that mean, it’s reasonable to say that the impact of changing teams is a difference of between 0 and 18%.
Seasons with same team | Seasons with different teams | Sample size | Correlation across seasons with same team | Correlation across seasons with different teams |
0708/0809 | 0809/0910 | 85 | 0.686 | 0.647 |
0809/0910 | 0708/0809 | 70 | 0.577 | 0.529 |
0809/0910 | 0910/1011 | 81 | 0.569 | 0.422 |
0910/1011 | 0809/0910 | 72 | 0.668 | 0.639 |
0910/1011 | 1011/1112 | 86 | 0.578 | 0.500 |
1011/1112 | 0910/1011 | 58 | 0.712 | 0.316 |
1011/1112 | 1112/1213 | 58 | 0.494 | 0.576 |
1112/1213 | 1011/1112 | 57 | 0.446 | 0.536 |
Comments
From the above, we can see that CorsiRel and THoR (all events) are primarily player-based metrics rather than team-based metrics. It’s reasonable to expect about a 10% drop in correlation for players who switch teams under CorsiRel, while for THoR that number is less than one-half of one percent. There is certainly variability in each of these estimates. We’ve used a methodology that follows the same individuals across years when they were with the same team and years when they were with different teams.
As mentioned above, this methodology has some drawbacks, but it eliminates many of the other factors that would impact a similar study done with groups of different players. In particular, it is not wholly reasonable to expect players to have the same performance from year to year, even when they are with the same team. Players are not static. Of primary concern here is that they age and mature. One thing worth noting is that when players changed teams at a chronologically later time period, the changes in correlation were roughly the same as when the change of teams occurred earlier. That is good news, but it does not completely rule out an aging effect. In addition to aging, players get hurt but play through it. Coaches use them in different ways. There are lots of sources of variability in their ratings other than changing teams. Additionally, the group of players with different teams includes both those who began the season with a different team and those who were traded during the season. A more extensive breakdown, though it would have a smaller sample size, would provide some additional information. While not perfect, this approach does give us an improved idea of how dependent these metrics are on players staying with the same team.
In the end, THoR and CorsiRel are both clearly player metrics rather than team metrics. While THoR (all events) is a metric that is almost completely unaffected by players changing teams, we can say that about 90% of the correlation in CorsiRel does not depend upon a player’s team.
Suffice it to say that Diaz is likely to retain his value (+2 wins above replacement) with the Canucks.
We focused on players whose names have been considered as possible participants in articles like this and this. [Note that since this was first written a couple of other articles have appeared: here, here and here.]
Below, we’ll go position by position. We’ll start with centers.
CENTERS
Here’s the summary table for ten players mentioned as possible centers for Team USA.
Name | THoR Rank | Corsi Rank |
Paul Stastny | 1 | 6 |
Joe Pavelski | 2 | 1 |
Ryan Kesler | 3 | 2 |
Craig Smith | 4 | 8 |
Kyle Palmieri | 5 | 9 |
Derek Stepan | 7 | 3 |
Alex Galchenyuk | 8 | 4 |
Trevor Lewis | 9 | 10 |
David Backes | 10 | 5 |
Nick Bjugstad | - | 7 |
Plenty of folks think this is not a strength for the US. They’re not as deep as Canada, for sure, but the top three are high quality. Weighting the two metrics evenly gives Pavelski, Kesler and Stastny, with Stepan and Smith as fourth and fifth choices. Corsi likes Stepan and Galchenyuk over Stastny, while THoR likes Stastny, with Smith as the fourth choice. THoR does not have enough data on Nick Bjugstad to rank him, but it seems unlikely that he is going to be part of the final consideration here.
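The "weighting the two metrics evenly" step is just an average of the two rank columns from the table above (Bjugstad omitted, since he has no THoR rank):

```python
# Average each center's THoR and Corsi ranks from the table above and
# sort; ties keep table order.

ranks = {  # name: (THoR rank, Corsi rank)
    "Stastny": (1, 6), "Pavelski": (2, 1), "Kesler": (3, 2),
    "Smith": (4, 8), "Palmieri": (5, 9), "Stepan": (7, 3),
    "Galchenyuk": (8, 4), "Lewis": (9, 10), "Backes": (10, 5),
}
combined = sorted(ranks, key=lambda p: sum(ranks[p]) / 2)
print(combined[:5])  # ['Pavelski', 'Kesler', 'Stastny', 'Stepan', 'Smith']
```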
WINGS
Here’s the summary table for wings.
Name | THoR Rank | Corsi Rank |
Zach Parise | 1 | 1 |
Max Pacioretty | 2 | 2 |
Phil Kessel | 3 | 8 |
Blake Wheeler | 4 | 5 |
Dustin Brown | 5 | 4 |
Patrick Kane | 6 | 10 |
T. J. Oshie | 7 | 13 |
Kyle Okposo | 8 | 7 |
Brandon Saad | 9 | 3 |
Bobby Ryan | 10 | 6 |
Ryan Callahan | 11 | 12 |
James Van Riemsdyk | 12 | 9 |
Justin Abdelkader | 13 | 11 |
There is pretty good agreement between THoR and Corsi in the ranks of the players considered for this list. Saad and Kane are the biggest differences between the two, and both would end up on my roster. Based upon this list, and assuming we take nine wings, Parise, Pacioretty, Wheeler, Brown, Kessel, Saad, Okposo, Ryan, and Kane should go. I know that Pacioretty is a borderline player for many folks, but both of these metrics think highly of him. Kane being as low as he is on the Corsi side is a bit of a surprise, but he still makes the roster. Bennett has a strong future ahead of him and THoR has some strong inklings about him, but there’s not enough data there to be conclusive.
DEFENSEMEN
Next, we’ll turn to defensemen. Here is the table for them with ranks for both metrics. As with Bjugstad and Bennett, we don’t have enough data on Seth Jones and Jacob Trouba for THoR to be confident, but we do have CorsiRel for Jones, so it is included. Neither player has done well enough on that metric to be considered.
Name | THoR Rank | Corsi Rank |
Dustin Byfuglien | 1 | 1 |
Keith Yandle | 2 | 2 |
Alec Martinez | 3 | 3 |
Alex Goligoski | 4 | 4 |
Cam Fowler | 5 | 8 |
Nick Leddy | 6 | 11 |
Kevin Shattenkirk | 7 | 5 |
John Carlson | 8 | 13 |
Justin Faulk | 9 | 14 |
Ryan McDonagh | 10 | 9 |
Jack Johnson | 11 | 18 |
Erik Johnson | 12 | 6 |
Paul Martin | 13 | 12 |
Ryan Suter | 14 | 10 |
Zach Bogosian | 15 | 16 |
Jake Gardiner | 16 | 7 |
Brooks Orpik | 17 | 17 |
Seth Jones | - | 15 |
Jacob Trouba | - | - |
There’s less agreement between the two measures here than there was for centers and wings, though it is not too bad. There is enough agreement that some choices seem obvious. I’ve no doubt that Ryan Suter is going to Sochi despite where he is ranked by THoR and by Corsi. And I hope, hope, hope that Dustin Byfuglien is going, along with Yandle, Fowler, Shattenkirk, Leddy, and Erik Johnson. Byfuglien has the added versatility of being able to play forward if needed. I understand there is some concern about him on ‘the big ice’ but I think there’s enough analysis to suggest that he helps this team. A note about Martinez and Goligoski, since they don’t seem to make most writers’ lists: Goligoski seems on the periphery of consideration, while Martinez seems to be off the radar. Goligoski has dropped his production somewhat this past year, at least as far as CorsiRel is concerned, but he still outranks most of this group. I want them on my team. Trouba and Jones don’t have enough time in the NHL to get a good bead on their performance from THoR’s perspective, so we won’t consider them, and their CorsiRels aren’t that impressive anyway. An argument can be made for Suter on leadership alone. I get that, but here our focus is analytics and what that would suggest for this team. I know he is hurt — I hate to kick someone when they are down — but please, oh, please, in the name of Jim Craig and Mike Eruzione, don’t take Brooks Orpik. Bad things happen when he is on the ice. Paul Martin I could live with, Ryan McDonagh I could live with, but not Orpik.
There are some righty/lefty issues at the top of our list. Yandle, Fowler, Goligoski, Martinez and Leddy are all lefties; Shattenkirk and Byfuglien are righties, as is Erik Johnson. If we want righty/lefty pairings then we need to drop Leddy for John Carlson. In the NHL.com article linked above, David Poile references a conversation with Brian Rafalski suggesting that quality players are more important than lefty/righty pairings.
GOALIES
Quality analytics on goalies are few and far between. The best available results suggest that the best predictor of future performance is career save percentage, though that should be tempered by age, which is another factor here. Here’s a summary of the possibilities for the US in net in Sochi, based on data through 15 December.
Name | Career SV% | Current SV% | Age |
Cory Schneider | 0.926 | 0.917 | 27 |
Tim Thomas | 0.921 | 0.909 | 39 |
Ben Bishop | 0.920 | 0.933 | 27 |
Jimmy Howard | 0.917 | 0.910 | 29 |
Ryan Miller | 0.915 | 0.921 | 33 |
Jonathan Quick | 0.914 | 0.905 | 28 |
Craig Anderson | 0.914 | 0.898 | 32 |
Okay, so Schneider is a no-brainer at this point to my mind. I think he has to be the number 1. The sample size is large enough for his career SV% to be very impressive. After Schneider, the surest bet here is Miller, so I would take him; his experience is also worth considering. Then, probably, Jimmy Howard. The combination of Howard being fourth on this list in career SV%, with a large sample size, and having the fourth-best SV% at the moment is enough to bump Quick and Bishop off the list. Tim Thomas’ age is too big a factor for me; old goalies don’t regress to the mean. I can see a good argument for Ben Bishop, especially given how he is playing, as well as one for Jonathan Quick, but Quick’s groin injury is concern enough for me to take Howard in his place.
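One way to formalize "career SV% tempered by age" is a weighted blend with a crude age penalty. The 80/20 weights, the age cutoff, and the penalty size are my own invented choices for illustration, not an established model.

```python
# Blend career and current SV%, then downgrade goalies past a cutoff
# age. Weights and penalty are invented for illustration.

def goalie_score(career_sv, current_sv, age,
                 w_career=0.8, age_cutoff=35, penalty=0.005):
    score = w_career * career_sv + (1 - w_career) * current_sv
    if age > age_cutoff:
        score -= penalty  # crude penalty for old goalies
    return score

goalies = {  # name: (career SV%, current SV%, age), from the table
    "Schneider": (0.926, 0.917, 27), "Thomas":   (0.921, 0.909, 39),
    "Bishop":    (0.920, 0.933, 27), "Howard":   (0.917, 0.910, 29),
    "Miller":    (0.915, 0.921, 33), "Quick":    (0.914, 0.905, 28),
    "Anderson":  (0.914, 0.898, 32),
}
ranked = sorted(goalies, key=lambda g: goalie_score(*goalies[g]),
                reverse=True)
print(ranked[0])  # Schneider tops the list, as in the text
```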
CHOICES
Okay, so we’ve looked at the US roster from a statistical vantage point. Using data from the last three NHL seasons, we approached the problem of player selection using two metrics: CorsiRel and THoR. This version of THoR is the one that uses data from special teams as well as even strength. Based upon this, I would take the following 23 players: 12 forwards, 7 defensemen, Byfuglien (who can play either), and 3 goalies. From the point of view of these analytics, these are the selections.
Centers (3) | Pavelski, Kesler, Stastny |
Wings (9) | Parise, Pacioretty, Wheeler, Brown, Kessel, Saad, Okposo, Ryan, Kane |
Byfuglien(1) | Dustin |
Defensemen (7) | Yandle, Martinez, Goligoski, Shattenkirk, Fowler, Leddy, Erik Johnson |
Goalies (3) | Schneider, Miller, Howard |
(NOTE: An earlier version of this post had David Backes as a wing and T. J. Oshie as a center.)
FORWARDS
Name | Team | Position | Count | WAR |
JORDAN EBERLE | EDMONTON OILERS | C | 4405 | 6.60 |
ALEX OVECHKIN | WASHINGTON CAPITALS | L | 4606 | 6.38 |
PATRICE BERGERON | BOSTON BRUINS | C | 4914 | 3.95 |
CHRIS KUNITZ | PITTSBURGH PENGUINS | L | 3844 | 3.71 |
PATRIK ELIAS | NEW JERSEY DEVILS | L | 3451 | 3.62 |
HENRIK ZETTERBERG | DETROIT RED WINGS | L | 4380 | 3.58 |
MAX PACIORETTY | MONTREAL CANADIENS | L | 3403 | 3.52 |
LOGAN COUTURE | SAN JOSE SHARKS | C | 5010 | 3.48 |
BRIAN GIONTA | MONTREAL CANADIENS | R | 3841 | 3.34 |
JAKUB VORACEK | PHILADELPHIA FLYERS | R | 3680 | 3.17 |
The first comment is that there are a lot of really good players on this list. To be on it, a player had to have both a high per-play rating and be on the ice for a large number of plays; we looked only at players with at least 2000 plays. Kunitz and Voracek seem high to me: Kunitz because he plays so often with Crosby, and Voracek because I don’t think of him as a dominant player, though his even strength CorsiRel numbers are quite high for both of the seasons on which this analysis is based. However, there is no doubt that Kunitz is a really good player. Because they play so much together, THoR has trouble distinguishing Crosby and Kunitz. Ovechkin is interested in hockey again, so that is nice to see. Bergeron, Elias, Zetterberg, Pacioretty, Couture: no surprises. Gionta helped his team by drawing lots of penalties in 12-13, so that helps him here. Just off the list: D. Sedin, Brad Richards, and Anze Kopitar.
Next we’ll look at defensemen.
DEFENSEMEN
Name | Team | Position | Count | WAR |
DUSTIN BYFUGLIEN | WINNIPEG JETS | D | 5249 | 5.24 |
P.K. SUBBAN | MONTREAL CANADIENS | D | 4886 | 4.75 |
ANDREI MARKOV | MONTREAL CANADIENS | D | 5417 | 4.28 |
DUNCAN KEITH | CHICAGO BLACKHAWKS | D | 4979 | 4.16 |
CHRISTIAN EHRHOFF | BUFFALO SABRES | D | 4637 | 3.96 |
DAN HAMHUIS | VANCOUVER CANUCKS | D | 4611 | 3.85 |
KIMMO TIMONEN | PHILADELPHIA FLYERS | D | 4174 | 3.83 |
DION PHANEUF | TORONTO MAPLE LEAFS | D | 5850 | 3.81 |
KRIS LETANG | PITTSBURGH PENGUINS | D | 3743 | 3.69 |
MARK STREIT | PHILADELPHIA FLYERS | D | 4612 | 3.58 |
Overall, this list is strong. With more minutes, Erik Karlsson is in the top 3 of this list; Letang also is higher with more minutes. The model likes both Subban and Markov, though per play it favors Subban. This year they are playing together a good bit, but last year, which makes up about 60% of the current data, they were together less than 20% of the time. Byfuglien is high, but primarily on the strength of what he does on the power play. Phaneuf makes the list based upon the volume of events he is on the ice for; he is good, just not top 10 on a per-play basis. Just missing the top 10 are Oduya, Kindl, and Visnovsky.
Very High Predictability
One of the great strengths of this new THoR is its very high predictability. The table below summarizes this aspect of our metric. The average one-year-to-the-next correlation in player ratings is about 0.81, compared to 0.65 for even strength THoR and about 0.5 for CorsiRel. There is little drop in the correlation as we move out to two- and three-year gaps, which suggests that we are capturing a consistent value for a given player. Similarly, the slopes, which are also indicative of regression to the mean, maintain their proximity to 1. GVT, another metric that uses data from all events, has year to year correlations of approximately 0.6.
Seasons | 10-11 | 11-12 | 12-13 |
09-10 | r=0.834 n=415 b=0.811 | 0.779 376 0.769 | 0.797 225 0.869 |
10-11 | 0.792 439 0.794 | 0.788 261 0.884 | |
11-12 | 0.808 285 0.897 |
To summarize, we have a new model based upon the probability that each on-ice event from the NHL play-by-play system leads to a goal. Because the model accounts for zone starts, quality of competition, quality of teammates and score effects, and includes factors for even strength versus special teams, we can isolate the value of an individual player. Those effects are highly consistent from year to year.
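The ridge regression behind this kind of model can be sketched in miniature. Everything below is illustrative: the design-matrix layout, the toy dimensions, the simulated response and the penalty value are my assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative design matrix: one row per play. Player columns are +1 if the
# player is on the ice for the home team, -1 for the away team, 0 otherwise;
# context columns (home ice, zone start, score effect, quality of
# teammates/opponents) are appended. Dimensions here are toy-sized.
n_plays, n_players, n_context = 5000, 120, 5
X_players = rng.choice([-1.0, 0.0, 1.0], size=(n_plays, n_players), p=[0.05, 0.90, 0.05])
X_context = rng.normal(size=(n_plays, n_context))
X = np.hstack([X_players, X_context])

# NP20-style response: net probability the play leads to a home goal
# (simulated noise here, just to make the sketch runnable).
y = rng.normal(scale=0.01, size=n_plays)

# Closed-form ridge regression; the penalty shrinks the ratings of players
# who appear on few plays toward zero.
alpha = 100.0
coef = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

player_ratings = coef[:n_players]    # per-play value of each player
context_effects = coef[n_players:]   # home ice, zone start, etc.
```

The shrinkage parameter is the knob mentioned later in this series: more shrinkage pulls small-sample players harder toward zero and trades a little fit for more year to year stability.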
Part I (LINK) of that case shows that there is high year to year correlation between THoR values, meaning that a player's current THoR value is a good predictor of his future THoR values. THoR has a year to year correlation of about 0.65, which is considerably higher than CorsiRel's.
Part II (LINK) furthers the exploration of THoR by looking at how well THoR is associated with winning a game in the NHL. The response that THoR uses is in the neighborhood of Corsi and Fenwick in this regard, though slightly below both.
Part III (LINK) looks at the yearly THoR values for some specific players to give a sense of how THoR performs. There were certainly some surprises, and they have received a good deal of attention, but in reality the top THoR players (Bergeron, Parise, Toews, Kopitar, etc.) are generally not surprises.
We’ll start with Tyler Kennedy. The original THoR paper had him as a top player for 2010-12. He had generally been a third liner with Pittsburgh until this year. He was traded to San Jose prior to the NHL draft for a 2nd round pick. The line for Kennedy over the four years for which we have THoR is 0.8, 2.2, 4.0, 2.1 wins above replacement per year, where we’re using half a win as the difference between average and replacement. Those four numbers are the even strength wins above replacement for the 2009-10, 2010-11, 2011-12 and 2012-13 NHL seasons. Obviously the 2012-13 season is based upon less data than the others. Note that even strength is not the whole picture of player value, but it is one that gives us some consistency. In calculating these values we assume the same number of plays, 4000, for each player each year. Kennedy's performance here has been relatively consistent and top end since the 2009-10 season. A high-end THoR player can account for 3 or 4 wins above replacement at even strength using this methodology. He showed up highly on our radar for the middle two years. The most he played was in 2010-11, when he had just under 4000 plays. It's no surprise that his play in 2012-13 regressed; we expect some year to year regression even though THoR has high year to year correlation. The Penguins were obviously disappointed with him, making him a healthy scratch for good portions of the NHL playoffs. As with most metrics, we should expect that Kennedy going forward will produce something like the average of his THoR values to date.
Next up is Alexander Steen. We had him very high, and recently a small-sample survey of NHL players by ESPN found him to be the most underrated player in the NHL. THoR has him at 1.3, 3.1, 4.5, 0.9 wins above replacement for each of the last four seasons. Steen has had some injuries and so, like Kennedy, has never reached 4000 plays in a single season. One might suggest that the numbers for these two players are a result of smaller sample sizes, but the ridge regression we use applies shrinkage that deflates the ratings of players with smaller sample sizes. So that's not it. Steen has gotten a good deal of attention of late since he has started the season at a torrid scoring rate. THoR is not affected by this, as we give a shot the same value, the probability that it goes in, whether it is a goal or not. There is definitely finishing ability at the NHL level, but it is hard to determine from a single year's worth of data, so we'll continue to look at finishing; for now the model does not include it.
So far we have looked at players who were surprising in the original THoR paper, and no doubt those need to be assessed. But to paraphrase something I've seen attributed to Bill James, any new metric that is worthwhile should be mostly things we already know plus some surprises. Steen is no longer a surprise; Kennedy does well at getting pucks to the net. If we look at some of the other top players from the original THoR, we can see the year to year consistency of this metric. From A. Kopitar (1.6, 2.3, 3.2, 3.5) to H. Zetterberg (2.8, 2.3, 2.5, 2.3) to Z. Parise (5.1, NA, 4.8, 4.0) to D. Sedin (3.2, 4.1, 3.7, 4.3) to J. Toews (2.5, 2.7, 3.3, 3.8), there is good consistency in who some of the elite players in the NHL are. The NA for Parise in the '10-'11 season is because he did not appear in enough games (due to injury) to have a rating.
The next player we’ll consider is Rob Scuderi. (Sorry to kick a guy when he is injured.) Some Penguins folks weren't happy when I said in an interview with a Boston newspaper this past fall that his signing stood out. Scuderi will make an AAV of $3.375MM per year for four years, which means he ought to be worth a couple of wins per year. Scuderi's THoR values come in at 0.5, -0.8, -0.1, 0.0. Those negative values mean he is performing below replacement level according to THoR. As some critics have pointed out, Scuderi had positive Corsi values while with the Kings, but if you account for who he was playing with (a very high possession Kings team) as well as who he played against, Scuderi was not adding to the bottom line. Hence his THoR numbers. Of course, he had better numbers than Douglas Murray, whom the Pens also acquired. Murray's numbers were -0.1, -0.5, -3.4, -2.6. Maybe pairing Scuderi with Orpik (-0.7, -0.7, -1.7, -2.8) will improve them both. I guess regression to the mean beckons. Meanwhile Pittsburgh seems to have demoted Kris Letang (3.4, 3.0, 1.7, 3.9).
As I was preparing this series on THoR, Scott Cullen had a twitter discussion about the value of Robyn Regehr.
@MacSapintosh That’s evident in his Corsi alone, but they play him against toughest opponents and he gets most D-zone starts of Kings’ D.
— Scott Cullen (@tsnscottcullen) November 17, 2013
That discussion caught my eye. Essentially Cullen was saying that it is hard to judge Regehr since he is playing the toughest ice time and often starting in his own zone. Now we know that both of those things (zone starts and quality of opponents) matter for evaluating a player. One of the great things about THoR is that we can use our methodology to account for just those factors (and the others in the model). If we do that we get the following for Regehr: -1.2, 0.9, 0.4, -1.2. Not exactly lighting the world on fire.
So we’ve looked at some individual players using even strength THoR. THoR ratings are going to vary from year to year but there is relatively high correlation in these player ratings. For more details see Part I of this series. In mid-December we’ll have enough data from the current season and the previous one to release the latest THoR. Meanwhile we’ll keep hammering away.
Before we get to the validity part, I wanted to share a couple of other results that we have now run. I had initially resisted doing an even-odd, out-of-sample correlation because our code for calculating THoR is in Python (and it still takes about 10+ hours to run a year's worth of data), and it would have been a nightmare to recode all of that for a separate analysis. My co-author Jim Curro wrote the Python code from some original R code. Recently, however, I thought of a brilliant shortcut using some of the output we get from the Python code. It only took a couple of months to figure that one out, but so be it. Anyway, I was able to get it running. I took the 2010-11 and 2011-12 seasons and split out every other even strength play. Given the collinearity among linemates who play together often, we prefer to use one year of data as a minimum. For players who were on the ice for a minimum of 1000 plays, the correlation was 0.686; for players with a minimum of 2000 plays, the correlation was 0.724. For 2009-10 and 2010-11, the same alternating-play analysis gave 0.681 and 0.708, respectively. Combining these statistics with our year to year correlations in Part I, it seems that what THoR measures is a repeatable skill.
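The even-odd idea can be illustrated with a simplified stand-in. Rather than refitting the full ridge model on alternating plays, as described above, this sketch just correlates each player's mean play value on even-numbered plays with the mean on odd-numbered plays; the player pool, skill scale and noise scale are all synthetic.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic per-play values: each player has a true per-play skill plus
# play-to-play noise (scales are invented for illustration).
true_skill = rng.normal(scale=0.002, size=300)
play_values = [skill + rng.normal(scale=0.02, size=2000) for skill in true_skill]

def split_half_reliability(play_values):
    # Mean value on even-indexed plays vs odd-indexed plays, per player,
    # then correlate the two halves across players.
    even = np.array([v[0::2].mean() for v in play_values])
    odd = np.array([v[1::2].mean() for v in play_values])
    return np.corrcoef(even, odd)[0, 1]

r = split_half_reliability(play_values)
```

With enough plays per player, the split-half correlation approaches the share of variance that is true skill rather than noise, which is the sense in which the 0.686 and 0.724 figures above indicate a repeatable skill.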
Speaking of repeatability, here are the coefficients for the NP20 score effect, home ice and zone starts by season, using just the data from each individual year. The values are tiny because they are per-play values, but they should give you a sense of the consistency of some of the effects in our model. Over these four seasons the NHL has been pretty consistent.
Table 1: Year to Year estimates of effects
Season | Score Effect | Home Ice | Zone Start |
09-10 | -0.0008 | 0.0012 | 0.0056 |
10-11 | -0.0004 | 0.0012 | 0.0057 |
11-12 | -0.0012 | 0.0012 | 0.0051 |
12-13 | -0.0006 | 0.0008 | 0.0055 |
Now we still have the question of whether or not THoR measures something related to winning. Recall that the THoR model is based upon a ridge regression that includes players as well as the factors home ice, rink, quality of teammates (QoT), quality of opponents (QoC), zone starts (ZS) and score effects. All of those are information we have about each play at even strength. Our outcome (or response) metric is NP20: the probability that a given event leads to a goal for the home team minus the probability that it leads to a goal for the away team. We treat goals as shots and calculate the probability that a shot was a goal plus the probability that it resulted in a goal in the subsequent 20 seconds; hence the 20 in NP20. (We used 10 seconds when we started this process in 2008 but now use 20 to be conservative about the value of a play.) There are some other nice features to THoR; have a look at the paper here for more details. We've also added some aspects to the model since the original paper, and the THoR page has the latest on these. Player values for the home and away teams are entered as positive and negative, respectively, to agree with NP20. Thus the THoR player rating tries to extract the impact of an individual player over the course of the events recorded while they are on the ice.
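As a toy illustration of how an NP20-style response might be computed per event: the event types and probabilities below are invented placeholders, not the historical estimates THoR actually uses.

```python
# Assumed (illustrative) probabilities: not THoR's historical estimates.
P_SHOT_IS_GOAL = 0.08                                       # P(shot goes in)
P_GOAL_NEXT_20 = {"SHOT": 0.02, "GOAL": 0.02, "FAC": 0.01, "HIT": 0.003}

def np20_value(event):
    """event: dict with 'type' and 'team' ('HOME' or 'AWAY'). Goals are
    treated as shots: value = P(shot is goal) + P(goal in next 20 seconds).
    The sign makes home events positive and away events negative."""
    p = P_GOAL_NEXT_20.get(event["type"], 0.0)
    if event["type"] in ("SHOT", "GOAL"):
        p += P_SHOT_IS_GOAL
    return p if event["team"] == "HOME" else -p

# A game's NP20 total is just the sum over its events.
game_np20 = sum(np20_value(e) for e in [
    {"type": "SHOT", "team": "HOME"},
    {"type": "FAC", "team": "AWAY"},
    {"type": "GOAL", "team": "AWAY"},
])
```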
We should note that we are using a proxy for validity here. We don't have a ground truth of player value; no one does. What we have instead is whether or not a given measure can predict the winner of a hockey game. That's good but not perfect. Recall that THoR is based on probabilities of events leading to goals, and we know that goal differential is highly correlated with points at the team level, so there is some inherent validity built in. To look at the validity of THoR through the lens of winning individual games, I summed NP20, the response for THoR, over the first three periods of each game from 2009 to 2013 and tracked whether or not NP20 predicted the winner. I threw out games that went to OT or a shootout, so what we have here is only games decided in regulation. From the NP20 perspective, a team with a positive NP20 should be probabilistically outperforming the other team. I also did this for the same seasons for Corsi and Fenwick; the data for Corsi and Fenwick were pulled using the invaluable nhlscrapr package in R from Andrew Thomas and Sam Ventura. The results are below in Table 2. The proportions in the table are the percentage of times that winning the XX battle correctly predicted who won the game, where XX is the given metric. That is, teams that out-Corsi'd their opponents 5v5 within 1 goal won 59.1% of games.
Table 2: Comparing the Probability of Winning of Shots, Corsi, Fenwick and NP20
Conditions | NP20 (THoR) | Corsi | Fenwick | Shots |
5v5 | 0.519 | 0.530 | 0.461 | 0.406 |
5v5 within 2 | 0.537 | 0.568 | 0.520 | 0.452 |
5v5 within 1 | 0.573 | 0.591 | 0.580 | 0.493 |
5v5 tied | 0.589 | 0.607 | 0.620 | 0.538 |
So each of these methods improves its predictability as we limit the conditions; all do their best at 5v5 tied. Score effects are clear from this. Fenwick at 5v5 tied is the best at 62%. Interestingly enough, this tweet came along recently from Josh Weissbock.
Holy crap… I’ve managed to increase my single game accuracy predictions using #machinelearning up to 61.5%
— GritHeartTruculence (@joshweissbock) November 21, 2013
So none of these methods is too far from that value, though shots, even at 5v5 tied, is well below the others. THoR's NP20 is below Corsi and Fenwick, but not by much, and probably only statistically significantly below Fenwick. We have to recognize, of course, that there is some noise among these estimates that is not sampling. But at 5v5 tied, Corsi and Fenwick beat NP20, if not significantly in the case of the former. The point of this exercise was to look at the validity of THoR, and pretty clearly THoR is a valid metric, that is, one related to winning. Additionally, the things in THoR that are not in Corsi (besides the ones in Table 1) are for the most part repeatable, including faceoff wins and penalties, so there are good reasons to have them there for assessing player value. THoR is not the be-all and end-all, and we'll keep tweaking it. But in light of these analyses it is a metric that is useful for assessing NHL player value. Ultimately THoR gives about 20% more reliability while being a little lower on validity.
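The game-level validity check, summing a metric over a game and asking whether the team that won the metric battle also won the game, can be sketched as follows (function name and toy data are mine):

```python
def fraction_predicted(games):
    """games: list of (metric_diff, home_won) pairs for regulation-decided
    games, where metric_diff is the home-minus-away total of the metric
    (e.g. summed NP20, or Corsi differential) over the first three periods.
    Returns the share of games where winning the metric battle predicted
    the winner; games tied on the metric are dropped."""
    decided = [(d, won) for d, won in games if d != 0]
    correct = sum((d > 0) == won for d, won in decided)
    return correct / len(decided)

# Toy example: four decided games plus one metric tie.
games = [(1.2, True), (-0.5, False), (0.3, False), (-0.1, True), (0.0, True)]
share = fraction_predicted(games)
```

Run over a few seasons of games, this is the computation behind each cell of Table 2, restricted to the stated game state (5v5, within 1, tied, etc.).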
One logical next step is to consider a THoR-like model with a Fenwick or Corsi response, or some other similar response metric. We have begun looking into this and our preliminary findings are underwhelming: in order to get the year to year correlations even to around 0.4, we have to shrink a very large amount, and none of the models we have looked at so far yields year to year correlations close to the approximately 0.65 values for THoR. This highlights another advantage of THoR. Using every play gives large sample sizes after fewer games than Corsi or Fenwick. That is only an advantage if the data are useful, and in terms of evaluating player reliability using THoR they clearly are.
In the next part of this series, we’ll look at some specific players through the lens of THoR.
Each of these websites uses different input data for their calculations and determines amount of play differently. For HART, I used the 5-on-5 Close data for players who had more than 200 minutes in a given season. For CorsiRel, I used 5v5 data for players who played in more than 30 games. For THoR, I used players who were on the ice for more than 1000 plays (either 5v5 or 4v4) as extracted from the NHL.com play-by-play files. (By the way, there is now a great way to extract NHL play-by-play data in R via the nhlscrapr package provided by Andrew Thomas and Sam Ventura.) These particular cutoffs were used to ensure that all of the methods had similar numbers of players when making the year to year comparisons.
A quick note about this proposed analysis: the methodology here is prone to yield regression to the mean, and that is to be expected. The correlation, r, between 09-10 ratings and 10-11 ratings should be larger than the correlation between 09-10 and 11-12 ratings. There are lots of reasons for this, but regression to the mean is foremost among them; other effects such as aging will also factor in. The other metric I looked at is the estimated slope, b, of the least squares regression between the years. This also gives some idea of how much regression to the mean is occurring: slopes that are lower (further from one) indicate more regression to the mean. Ideally our metrics will have slope close to one if we are getting the same level of performance from one season to the next. This is unlikely, since we know that, whether it is goals going in or Corsi events, there is an element of luck, so we would expect that those who are lucky one year will be less lucky the next.
In this analysis we start by looking at the results for HART, given in the table below. In each cell the first number is the correlation in ratings between the two seasons given. The second entry in each cell is n, the number of players that went into the correlation; to be considered, players had to meet the criteria for a given metric in both seasons. The third entry in each cell is the slope, b. This layout is the same for all of the tables that follow. For ratings in the 2009-10 and 2010-11 seasons there was a correlation of r=0.457 based upon n=489 players, and the slope of the regression line was b=0.478. That last value suggests that a player with a HART value of X in 2009-10 would be predicted to have a value of 0.478X in 2010-11.
Table 1: Hockey Analysis Ratings Total (HART)
Seasons | 10-11 | 11-12 | 12-13 |
09-10 | r=0.457 n=489 b=0.478 | 0.363 448 0.368 | 0.244 347 0.349 |
10-11 | 0.489 519 0.452 | 0.394 404 0.527 | |
11-12 | 0.544 442 0.756 |
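The r, n and b entries in these tables can be computed as in the following sketch (the function and variable names are mine):

```python
import numpy as np

def year_to_year(ratings_a, ratings_b):
    """ratings_a, ratings_b: dicts mapping player name -> rating in two
    seasons. Returns (r, n, b): the correlation, the number of players
    rated in both seasons, and the least-squares slope predicting the
    later season from the earlier one."""
    common = sorted(set(ratings_a) & set(ratings_b))
    x = np.array([ratings_a[p] for p in common])
    y = np.array([ratings_b[p] for p in common])
    r = np.corrcoef(x, y)[0, 1]
    b = np.polyfit(x, y, 1)[0]   # slope of y ~ x
    return r, len(common), b
```

Only players who appear in both seasons enter the calculation, which is why n shrinks as the gap between seasons grows.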
The next set of results is for CorsiRel. As before, we looked at ratings for one year and correlated them with ratings in other years. The further out we go from a given year, the fewer players appear in both seasons due to retirements, injuries, etc. Clearly, CorsiRel has higher correlations than HART, and they do not deteriorate as much over time: the drop from a one-year correlation of ~0.59 to a two-year correlation of ~0.49 to a three-year correlation of ~0.48 is not as steep as what we saw for HART. Additionally, the slopes for HART are smaller than they are for CorsiRel.
Table 2: Results for Corsi Rel
Seasons | 10-11 | 11-12 | 12-13 |
09-10 | r=0.561 n=497 b=0.616 | 0.494 447 0.584 | 0.479 333 0.578 |
10-11 | 0.571 515 0.609 | 0.490 391 0.531 | |
11-12 | 0.616 434 0.656 |
Lastly, we come to THoR. The year to year correlations, given in the table below, are higher for THoR than they are for CorsiRel with the exception of the correlation from 2011-12 to 2012-13. THoR does particularly well at maintaining value past one year. Additionally, the slope for THoR for differences beyond one year is larger than for CorsiRel or for HART. THoR estimates regress to the mean less than Corsi Rel estimates which regress less than HART estimates. This is especially useful for prediction of future player performance.
I should note that this version of THoR is slightly different from the one we presented at the MIT Sloan Sports Analytics Conference last year, which took 2nd place in the research paper competition. We have adjusted the amount of shrinkage here to improve these correlations. The MIT SSAC paper also included an analysis of the correlation for players who switched teams, though none of the ten people who have read the whole paper noticed it.
Table 3: Results for Total Hockey Ratings (THoR)
Seasons | 10-11 | 11-12 | 12-13 |
09-10 | r=0.663 n= 505 b=0.623 | 0.687 452 0.664 | 0.623 355 0.597 |
10-11 | 0.705 528 0.737 | 0.595 412 0.619 | |
11-12 | 0.620 457 0.616 |
To compare CorsiRel and THoR directly I made the following table (Table 4). Frankly both performed well. But it is pretty clear that THoR, on average, has a higher year to year correlation than CorsiRel and that THoR is more consistent over time. THoR pretty clearly beats Corsi by at least 0.1 in all but one of the correlations we considered. And the differences are larger for two and three year correlations.
Table 4: Comparing CorsiRel and THoR
Seasons | 10-11 | 11-12 | 12-13 |
09-10 | CorsiRel= 0.561 THoR= 0.663 | 0.494 0.687 | 0.479 0.623 |
10-11 | 0.571 0.705 | 0.490 0.595 | |
11-12 | 0.616 0.620 |
Before I finish, I just want to note that THoR is a different kind of model from these others. THoR accounts for a variety of factors that are not explicitly modeled in the other two ratings. The outcome metric for THoR is a value assigned to each play in the play-by-play files based upon the historical chance that it leads to a goal, and THoR has built-in components for home ice, score effects, rink effects, quality of teammates, quality of competition and zone starts. The details are on the THoR website and in the original MIT Sloan paper. Second, we still have some work to do to argue the utility of THoR, but it does appear to be a more internally consistent metric over time; in a statistical sense it seems reliable. We still need to evaluate its validity. We'll do that in Part II.