Last night I did a webinar on Statistical Methods for the Analysis of Hockey for St. Lawrence University Alumni and friends. I had a blast. There were lots of good questions and lots of interest. Exactly what I would expect from a bunch of Laurentians.

Below is a link to the slides for the talk.

Additional details can be found on the workshop website below:

Workshop website: HERE

Our aim is to build a model which accounts for the relative differences between rinks on the events that are recorded. The rest of this post is a summarized version of our paper on this topic which is linked below. **The focus of this work is in making data recorded for the following events comparable from rink to rink: Blocks, Giveaways, Hits, Missed Shots, Shots, and Takeaways.** We also look at the recording of aggregated events that count as Corsi events, Fenwick events and Turnovers. The last of these was created by Schuckers and Curro to account for the home bias of Takeaways and Giveaways as part of the THoR paper.

The data that we use for this analysis comes from the nhlscrapr R package created by Thomas and Ventura and includes 6858 games from six regular seasons.

We used a statistical regression to model counts of events per game with several predictors **including team factors, average score differential, and rink.** The estimated rink effects that we derive can be used to reweight recorded events so that we can have comparable counts of events across rinks. Applying our methodology to data from six regular seasons (2007-08 through 2012-13), we find that for the most part NHL rinks (and the individuals therein) do a reasonably consistent job of recording events. This is especially true of the recording of SHOTs, which has the fewest rinks with significant recording issues and the smallest rink effects that we found. Only St. Louis and Florida have shot-recording rates that differ significantly from other rinks: Florida counts shots at a rate that is about 3% higher than other rinks, while St. Louis counts shots at a rate that is 4.5% lower than the rest of the league.
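To make the idea of a multiplicative rink effect concrete, here is a small sketch in Python. Our paper estimates these effects with a full regression that includes team and score-differential terms; this toy version just compares a rink's recorded rate to a league-wide rate, and all of the per-game counts below are made up for illustration.

```python
# A simplified illustration of a multiplicative rink effect. The paper fits
# a full regression; here we just take a ratio of rates. Numbers are made up.

def rink_effect(home_rink_rate, league_rate):
    """Ratio of events recorded in one rink to the league-wide rate."""
    return home_rink_rate / league_rate

florida = rink_effect(30.9, 30.0)    # about 3% over-counting of shots
st_louis = rink_effect(28.65, 30.0)  # about 4.5% under-counting

print(round(florida, 3))   # 1.03
print(round(st_louis, 3))  # 0.955
```

An effect above 1 means a rink records more events than comparable play elsewhere would suggest; below 1, fewer.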

However, there are some rinks with rink effects that are significant and consistent across these seasons for other events. Zeroing in on blocks, hits and misses, there were four rinks that consistently inflated or deflated the counts of those events: Edmonton, Los Angeles, New Jersey and Toronto. There were also six rinks that were not significantly different from the rest of the league on those events. These were: Buffalo, Nashville, Pittsburgh, St. Louis, Tampa Bay and Vancouver.

While event counts are impacted by rink effects, ratios of events such as Corsi For Percentage remain relatively unaffected. The table below demonstrates how small the impact of rink effects is on Corsi For %. This is despite our estimation that Boston, Columbus, Edmonton, Los Angeles, New Jersey, and Toronto all have significant rink effects for the counting of Corsi events. New Jersey is particularly egregious, undercounting Corsi events by about 16%. The reason that ratios are not impacted as much as counts is that the impact of a given rink is felt in both the numerator and the denominator for the home team.
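A quick sketch of why the cancellation happens, with hypothetical counts:

```python
# Why rink effects largely cancel in Corsi For %: the recording bias in a
# given rink scales the event counts of both teams in that rink, so it
# shows up in numerator and denominator alike. Counts are hypothetical.

def corsi_for_pct(cf, ca):
    return cf / (cf + ca)

true_cf, true_ca = 55.0, 45.0
effect = 0.84  # a rink that under-counts Corsi events by about 16%

raw = corsi_for_pct(true_cf, true_ca)
recorded = corsi_for_pct(true_cf * effect, true_ca * effect)

# Within a single rink the cancellation is exact; over a season only the
# home games carry that rink's bias, which is why small residual shifts
# still show up in season-level numbers.
print(abs(raw - recorded) < 1e-12)  # True
```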

**Table 1: Comparison of Corsi For % and Adjusted Corsi For % for 2012-13 NHL Regular Season**

Top 5 and Bottom 5 teams on Corsi For Pct.

| Team | Corsi For Pct. | Adjusted Corsi For Pct. |
|------|----------------|-------------------------|
| L.A. | 0.5630 | 0.5628 |
| N.J. | 0.5592 | 0.5592 |
| BOS | 0.5433 | 0.5430 |
| CHI | 0.5414 | 0.5420 |
| DET | 0.5366 | 0.5365 |
| … | … | … |
| CBJ | 0.4711 | 0.4706 |
| NSH | 0.4668 | 0.4662 |
| BUF | 0.4512 | 0.4513 |
| EDM | 0.4458 | 0.4445 |
| TOR | 0.4408 | 0.4398 |

While ratios are not affected in a major way by rink effects, the counts of individual events recorded as part of the NHL’s RTSS system are. To illustrate this we looked at the blocked shots recorded by players during the 2012-13 NHL regular season. For those rinks where there are rink effects for blocked shots, we take each block and weight it by 1 divided by the rink effect. So for Nassau Coliseum, home of the New York Islanders (NYI), each block counts as 1/1.208 or 0.828 of a block, since that rink overcounts blocks by about 20.8%. Similarly, since the rink effect for Anaheim is 0.721, each block in that rink counts as 1/0.721 or 1.387 of a block.
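The adjustment step itself is simple enough to show directly, using the NYI and Anaheim effects quoted above. This is a sketch of the reweighting only, not of the model that produces the effects.

```python
# Reweighting recorded blocks by one over the estimated rink effect,
# using the NYI and ANA effects quoted in the text.

RINK_EFFECT = {"NYI": 1.208, "ANA": 0.721}

def adjusted_block_weight(rink):
    """How much one recorded block in this rink counts after adjustment."""
    return 1.0 / RINK_EFFECT[rink]

print(round(adjusted_block_weight("NYI"), 3))  # 0.828: Nassau over-counts
print(round(adjusted_block_weight("ANA"), 3))  # 1.387: Anaheim under-counts
```

A player's adjusted block total is then just the sum of these weights over his recorded blocks, which is what produces the differentials in Table 2.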

We adjusted every block during the 2012-13 regular season for the rink in which it occurred, and the top 10 players on adjusted blocks are given in the table below. Unlike the case for Corsi For %, there are substantial changes in the counts of block events. Here the top player on the revised list, Francois Beauchemin, was originally ranked 10th on the list of players with the most blocks. There are other large changes for players from the Islanders (Andrew MacDonald) and the Capitals (John Carlson) due to the relative counting of BLOCKs in those rinks. MacDonald and Carlson move from tied for third in raw BLOCKs to eighth and ninth, respectively, after our adjustment.

**Table 2: Comparison of Raw Block Counts and Adjusted Block Counts for 2012-13 NHL Regular Season**

Top 10 players based on Adjusted Block Counts

| Player | Team | Adjusted Blocks | Raw Blocks | Differential |
|--------|------|-----------------|------------|--------------|
| F. Beauchemin | ANA | 133.3 | 111 | 22.3 |
| G. Zanon | COL | 127.2 | 124 | 3.2 |
| D. Girardi | NYR | 120.7 | 125 | -4.3 |
| R. Hainsey | WPG | 120.5 | 123 | -2.5 |
| D. Seidenberg | BOS | 119.8 | 115 | 4.8 |
| L. Smid | EDM | 114.7 | 119 | -4.3 |
| B. Orpik | PIT | 110.3 | 114 | -3.7 |
| A. MacDonald | NYI | 109.8 | 123 | -13.2 |
| J. Carlson | WSH | 109.4 | 123 | -13.6 |

The full paper can be found HERE.

Note: Brian’s contributions to this project, with the exception of minor edits of the paper, were made while he was an Associate Professor in the Department of Mathematical Sciences at the United States Military Academy, West Point, NY, prior to joining the Florida Panthers.

*Recently there has been a good deal of interest in the use of statistical methods and statistical thinking in sports. Work of this kind has been going on for many, many years, though often behind the scenes. With the publication of Michael Lewis’ Moneyball and the subsequent movie of the same name, the interest in this kind of work has grown extensively. Further evidence of this can be found in the increased number of conferences devoted to the topic and the increased use of these topics in the sports media. In this talk, I will begin with an overview of sports analytics including a discussion of the general approaches and methods. Having done that, I will discuss two examples from my own recent work: rink effects in the NHL and NHL player ratings.*

Thanks to Shirley Mills at Carleton University for the invitation to speak and for lunch. Thanks to her students for some enlightening conversations. Was also nice to see some folks I know from the Canadian Border Services Agency in the audience.

Here are the slides from that talk.

A week ago (on August 6th), I appeared on a panel at the Joint Statistical Meetings in Boston along with Brian Macdonald, Andrew Thomas, Sam Ventura and Kevin Mongeon. The panel was a good one on some of the advanced statistical methods that are being used in hockey. Fluto Shinzawa of the Boston Globe attended the panel and wrote up a piece on it that can be found at this link. A rough audio recording can be found at Andrew Thomas’ website: www.acthomas.ca/?p=62. Andrew was the organizer of the session.

I spoke for about 5 minutes on a model for rink effects that Brian Macdonald and I developed. Rink effects are something that has been known about in hockey analytics for some time. The issue is that there has not been a way to correct for the differences in the recording of RTSS events at various rinks. This issue has been cited as one of the hurdles to the adoption of hockey analytics. Here is an article with a quote from Chuck Fletcher of the Wild on data inconsistencies: Sportsnet.ca article. Fletcher’s wrong about the recording of events in Detroit relative to Minnesota, though Minnesota does undercount shots relative to the rest of the league. The bigger picture is that we need a way to adjust event counts. Thus, the goal of our paper is to develop tools that allow analysts to adjust RTSS events (Shots, Hits, Misses, Blocks) so that they are consistent from rink to rink. We’re not aiming for absolute truth (i.e. that what is counted as a hit is genuinely a hit) but rather for relative truth, meaning that events are counted, on average, the same from rink to rink.

My slides from a short presentation at the JSM Panel are found below. We, Brian Macdonald and I, are working on finishing the final paper before I present the full set of results at the Royal Statistical Society’s International Conference in Sheffield, UK at the beginning of September.

(This article was edited 8/28/14 to add the link to the Sportsnet.ca article.)


I’ve updated the Total Hockey Ratings (THoR) for the 2013-14 season. Both the Even Strength and the All Events ratings can be found on the THoR Page. Recall that THoR evaluates each play from the NHL’s RTSS system for its likelihood to lead to a goal in the subsequent 20 seconds. We then get a rating for players by accounting for their impact after adjusting for quality of teammates, quality of competition, zone starts, score effects, etc. The outcome metric here is wins above replacement (WAR) relative to position. Details on the methodology can be found on the THoR Page. Below are some of the highlights from these results from the posted files, which include the top 50 players.

**Even Strength**

Some interesting results here. I’ll start with EV first. The top ten here are A. Kopitar, M. Niskanen, M-E Vlasic, R. Suter, A. Sekera, J. Pavelski, J. Jagr, L. Couture, J. Hudler, T. Vanek. Sekera and Hudler might be a bit of a surprise but none of the others would seem so. Suter has not always been so highly ranked by THoR but this year he is evaluated well. A couple of other notes on this. At even strength, THoR is a big fan of Gustav Nyquist of the Red Wings. And THoR also likes Tampa Bay’s Ondrej Palat as the best candidate for the Calder Trophy, but he is closely followed by Hampus Lindholm of the Ducks. The total impact of Palat and Lindholm is probably very similar but Palat was on the ice for fewer events. MacKinnon is not rated as highly. (Note that THoR does not take account of PDO as part of its evaluation of players, so Nyquist’s high shooting percentage is not relevant to THoR, but his high rate of shots per time on ice is.)

In this analysis, we also find that the bottom five include: A. Ovechkin, A. Edler, A. MacDonald, N. Grossman, N. Yakupov. Again, these are not the worst rated players in the league but those that have cost their teams the most due to the amount of ice time they are getting in the THoR methodology.

**All Events**

The top ten players based upon all events are: R. Suter, C. Kunitz, J. Jagr, M. Niskanen, M. Giordano, A. Markov, T. Brouwer, J. Carlson, D. Doughty and O. Ekman-Larsson.

Seven defensemen are on that list and that is due to the fact that we are doing two things. One, we are normalizing average player value to the position and, two, we are taking that average relative value and multiplying by the number of plays for which the player was on the ice. Since top D tend to play more minutes, they are given more value under this assessment. This, in particular, helps Suter, who was on the ice for just about 25% more plays than Giordano. On a per-play basis Giordano had a greater average impact but Suter had a bigger overall impact on the season. As with any statistical metric, we can estimate the variability in these ratings.

Hampus Lindholm, a defenseman for the Anaheim Ducks, is the highest-rated rookie on the list, at 11th. He is not a Calder Trophy finalist. The finalists are Ondrej Palat, Tyler Johnson and Nathan MacKinnon, all forwards, who are rated 38th, 75th and 94th, respectively.

Again, we see some of the usual names at the top of the THoR list. Doughty, Steen, Hossa, Hornqvist, Couture, Karlsson, Kopitar and Subban. THoR is a two-way player metric and so this is not a surprise given the high year to year correlation for THoR.

Of note is that Ovechkin (not in the Top 250) compensates somewhat for his really poor even strength play with PP play that makes him a replacement-level player for this year.

At the bottom of the ratings (not in the file for download), are A. Edler, R. Regehr, A. Ference, N. Nystrom and J. Cowen. Again, they were not the worst players but by virtue of their amount of ice time, they hurt their teams the most.

Also of note here is that Kunitz gets very high marks, and higher marks than Crosby. This is a result of multicollinearity in these data, with Crosby and Kunitz playing nearly four-fifths of their 5v5 ice time together (Source: www.behindthenet.ca). While THoR uses ridge regression to deal with this, it is currently optimized to provide high predictive reliability.

Alfredsson, Jagr, Markov, Moulson, Niskanen, Timonen, Vanek and Vrbata are all in the THoR Top 100 for 2013-14 and they are all UFAs this summer. They should have plenty of good offers this offseason (except for Jagr, who has re-signed with NJD). That, along with a higher salary cap, should make for an interesting summer.

One interesting note for long-time hockey analytics followers is the appearance of Sean Couturier and Mark Scheifele in the top 250 players, though Couturier is worth over one win more than Scheifele this year based upon THoR. Also, David Perron of the Oilers, who was obtained in a trade from the Blues, shows up in the top 100.

Note that we use a different THoR model for Even Strength than for All Events, one that accounts for differences between 5v5, 5v4 and 5v3 situations, among others.


I love this idea that more NHL teams aren’t investing in analytics because the price tag reaches into the hundreds of thousands of dollars. This is a league in which a team still employs Colton Orr for close to a million dollars per season.

Maybe teams would rather not spend $250,000 to find out that they shouldn’t pay guys like him $925,000.

So this got me thinking about whether we could estimate the amount of return that spending $250k on analytics would yield. I decided to do a few back-of-the-envelope computations to estimate the benefit of having an analytics staff. No new analyses will be presented here, just some crude approximations. Almost all of this is low-hanging fruit and no doubt there are improvements that can be made. The list is not comprehensive. I simply want to make the point that to say that hockey analytics have not proven their worth is ridiculous, and I don’t need to be too thorough to do that.

I’ve broken down what follows into three categories: drafting, strategy and player acquisition. Within each category, I’ve estimated the return via goals, points or wins. Following some previous work by Gabe Desjardins, I’m assuming that a win is worth 2 points and roughly $2MM. Consequently, a point is worth $1MM. (Note the new CBA prompted me to round down slightly but that rounding should likely go up given what is expected for the salary cap next year.) We’re also going to look at things on a yearly basis, so if you only get to sign a free agent goalie every fourth year, we’ll take 1/4 of the value from that in this assessment.

**Drafting**

The first area we’ll check is drafting. There are two aspects to this: evaluating players to draft and determining the value of draft picks. The former is a good bit more difficult to evaluate since it is harder to get a sense of how much can be gained in this regard. I also think analytics can help in this area both in predicting the future and in evaluating mistakes of the past.

Some recent work by Eric Tulsky is suggestive of the data that might be mined from leagues outside the NHL and might be useful for projecting player performance in the NHL. This work helps to determine how a player is being used by their current team and how difficult the quality of opponents is with whom they share the ice. If a player is generating 25 goals in 75 games but is playing sheltered minutes, then we may want to draft others who are producing against tougher competition. This work is relatively new and so it is harder to evaluate how well these methods will perform. However, it is likely safe to say that such methods are worth a couple of goals a year. Let’s say 2. That’s about 1/3 of a win or about $670k.

One area of drafting that has seen a good deal of work is the idea of league equivalencies. The idea of these approaches is to look at the performance of a player who spent year 1 in a non-NHL league (e.g. SM-liiga in Finland) and year 2 in the NHL. By looking at the relationship between points in these two years, we can create a measure of the quality of the non-NHL league and also a method for estimating production once a player has joined the NHL. The most famous example of the application of this sort of work comes from the decision by the Winnipeg Jets in 2011 to draft Mark Scheifele over Sean Couturier. The projections for Couturier were clearly higher than those for Scheifele, as was noted at the time. Couturier has gone on to perform very well for the Flyers and is probably worth a couple of wins a year to them. Such circumstances are not going to crop up every year, but when they do they are valuable. I’ll assume that such circumstances occur once every 4 years. Using Hockey-Reference.com’s Point Shares metric, the value of Couturier has been about 2.7 points over Scheifele over three years. That’s about 0.9 points per year, or about 0.9/4 = 0.225 points or $225k per year.
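The translation step in a league-equivalency calculation can be sketched in a few lines. The factors below are hypothetical placeholders (Desjardins-style NHLe factors exist, but these are not his numbers), and the player is made up.

```python
# League-equivalency sketch: project NHL scoring as feeder-league points
# per game times a translation factor estimated from players who made the
# jump. The factors here are hypothetical placeholders, not real estimates.

TRANSLATION = {"SM-liiga": 0.54, "AHL": 0.45, "OHL": 0.30}

def nhl_equivalent_points(points, games, league, nhl_games=82):
    return (points / games) * TRANSLATION[league] * nhl_games

# A hypothetical junior player with 60 points in 50 OHL games:
print(round(nhl_equivalent_points(60, 50, "OHL"), 1))  # 29.5 projected points
```

The real work, of course, is in estimating the translation factors from the year-1/year-2 pairs, but the projection itself is just this multiplication.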

So conservatively the total for drafting is about $895k per year from drafting. This excludes something like the creation and use of a value pick chart, for example this one or these.

**Strategy**

There is a good deal of evidence that pulling the goalie more often will yield some results. David Beaudoin and Tim Swartz have done a nice study that suggests that teams should pull their goalies more often. This mathematical approach uses goal-scoring rates under a variety of circumstances to estimate the average number of points gained under each. They conclude by suggesting that on average a point per season can be gained with this approach. As with any strategic innovation, its success will lead to (being copied and eventually) its demise. Pulling the goalie faces many of the same difficulties as going for it on fourth down in the NFL: it is less likely to be adopted because of the costs. Nonetheless it is clear that analytics has suggested an improved way to play the game. Assuming such an advantage would be adopted by other teams, we might lose our edge after a year, so we’ll round this down to $100k in value. I do this here not because I don’t think it will work (I’ve seen firsthand how well it works, as my collaborator Chris Wells has led the NCAA in minutes with the goalie pulled for several consecutive years) but rather because I think it is less likely to be adopted by coaches.
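The direction of the result can be illustrated with a crude competing-rates calculation. The rates below are invented for illustration, and Beaudoin and Swartz's model is considerably richer than this, but it shows why pulling earlier helps a trailing team.

```python
import math

# Competing-rates illustration of the gain from pulling the goalie earlier.
# Rates are per 60 minutes and made up for illustration only.
LAMBDA_FOR = 1.2      # trailing team's 6-on-5 scoring rate
LAMBDA_AGAINST = 4.0  # empty-net goals-against rate

def p_tie(seconds_with_goalie_pulled):
    """P(trailing team scores before conceding during the pulled window)."""
    hours = seconds_with_goalie_pulled / 3600.0
    total = LAMBDA_FOR + LAMBDA_AGAINST
    return (LAMBDA_FOR / total) * (1.0 - math.exp(-total * hours))

print(p_tie(120) > p_tie(60))  # True: pulling at 2:00 beats pulling at 1:00
```

Conceding an empty-net goal costs nothing extra when you were going to lose anyway, which is why the longer window is worth the higher goals-against rate.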

Another move that has been shown to have a strategic impact is carrying the puck more and dumping and chasing less. Eric Tulsky has gotten a good deal of attention as the lead researcher in this area. His paper on this was a poster at last year’s MIT Sloan Sports Analytics Conference. The idea here is that carrying the puck into the offensive zone leads to better possessions, which lead to more goals. It is hard to get a sense of the value here but let’s try. We can estimate about 50 zone entries at 5v5 per game (source here and here). If we can increase the number of shots using this method by 0.5 per game, that would roughly be 3 goals per year or about $1MM. As above, we spread this gain over several years to get $250k per year.

So the numbers here suggest that it is possible to get $350k per year in value from strategy. Value from the analyses given here is the most likely to be transient, but it is also possible that analysts will find new areas to exploit.

**Trades and Free Agency**

Next we’ll look at player acquisition through trades or free agency. One of the most useful aspects of analytics has been noted by Phil Birnbaum which is that they prevent stupid. As the article argues, there is more to be gained by not being stupid than there is by being smart. That is useful and a big part of analytics.

The litany of large NHL contracts that might have appropriately received another look if analytics were involved is long and includes: Bryzgalov’s signing by Philadelphia, Douglas Murray’s trade acquisition by Pittsburgh from San Jose, or his signing with Montreal, or trading for Robyn Regehr. Not every team signs or tries to sign a big free agent every year. Let’s say one every four years that overpays by $1MM per year. That’s a savings of $250k per year. And we haven’t even looked at the contracts of Pekka Rinne, or David Clarkson, or Brad Richards, or the aforementioned Colton Orr. As the salary cap grows this is likely to be an even bigger value.
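Totaling the per-category estimates above, with the win = 2 points = $2MM conversion used throughout:

```python
# Back-of-the-envelope totals from the categories above, in dollars per year.

WIN_VALUE = 2_000_000  # a win is ~2 points, a point ~$1MM

estimates = {
    "drafting": 670_000 + 225_000,   # prospect usage + league equivalencies
    "strategy": 100_000 + 250_000,   # pulling the goalie + zone entries
    "acquisition": 250_000,          # avoiding one bad contract
}

total = sum(estimates.values())
print(total)                        # 1495000, well above a $250k budget
print(round(total / WIN_VALUE, 2))  # 0.75 wins per year, roughly
```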

**Summary**

There are lots of ways that hockey analytics can impact a team. I’m an analytics guy, so I’ve tried to be conservative here about the value that analytics can bring. Overselling analytics won’t help its adoption. The cost of a reasonable analytics department might be around $300k, and you could get a good deal of consulting input for $100k alone. I think it’s pretty easy to see that analytics could yield benefits for an NHL team of (at a minimum) over $1MM from the areas listed above. As I stated previously, these estimates are low and they are not exhaustive. Hopefully from this, it should be pretty clear that cost is not one of the factors holding back analytics in the NHL.

*Addendum*

*My friend Steve Argeris suggested the following: I’d think about it a different way. What are you spending your money on? Presumably you can just have your cap guy or some $50k per year guy just stay up on what’s on the blogs and use the public websites reasonably well, so that’s really your baseline, not “nothing at all.” So say to hire a PhD in statistics (ahem) and move him from his tenure-track position to a more expensive city, subscribe to various data sets, and pay freelancers (analysts, coders, database guys) $200k per year. For that amount of money, what would it take every year to just break even?*

*It would take:*

*1) not blowing one draft pick in the first three rounds every three years (merely moving one pick in three years from “never playing an NHL game” to playing, say, 60 games at a minimum salary), OR*

*2) finding a league minimum guy that can do what a $900k guy can do once per year OR*

*3) finding a $1M player who can do what a $2M player can do every three years OR*

*4) recognizing a tactical or strategic decision that improves a team by three marginal goals over three years ONCE *

*In other words, if ANY of those four things happen, you’ve paid for yourself. The likelihood of ANY of those things happening is pretty close to 100%. The likelihood of more substantial benefit is far, far higher.*


So last week during the Hockey Analytics panel at the MIT Sloan Sports Analytics Conference, Eric Tulsky referenced a study that Michael Schuckers and his student Lauren Brozowski did on referees in the NHL. While this work has been available publicly on the conference website, due to some unknown oversight we did not post it here. So here it is. The paper is based upon two NHL seasons’ worth of data. The data don’t let us know who among the referees made the call, just who was on the ice for the call. Most of the results are pretty obvious. The later you are in a tight game, the less likely it is that a penalty is called. Home teams are less likely to be called for penalties than visitors. There also seems to be a great deal of consistency among the referees in their rates of penalties after adjusting for a variety of factors including score, period, the teams involved, etc.

Link to slides from 2011 JSM Talk

Photo by Mark Canter, http://en.wikipedia.org/wiki/File:Dmitry_Kulikov_Panthers_Shane_Heyer.jpg

**Player Usage Charts**

Rob Vollman created Player Usage Charts and they are helpful for getting a sense of how players are used and how they are performing. These charts do a nice job of getting multiple dimensions onto a single graphic and providing context for evaluation of players. The original work can be found here: Original Player Usage Charts and an updated interactive version with help from Robb Tufts is available at this link: Interactive Player Usage Charts.

**Extra Skater**

Darryl Metcalf at Extra Skater has put together an impressive set of visual tools for analysis. Many of the tools come from other places and other people (Gabe Desjardins, Ben Wendorf, Rob Vollman), as Metcalf notes in his about link; however, having them presented well on a single site is very useful. I particularly like the individual game summaries such as this Flames-Flyers summary and this Rangers-Pens summary.

**nhlscrapr**

Most folks doing heavy #fancystats have at one time or another had to scrape play-by-play data from nhl.com. The nhlscrapr package for the statistical software R, from Andrew Thomas and Sam Ventura, makes this process much, much easier by essentially downloading and parsing the data for you. While R has a steep learning curve, it is one of the most commonly used tools in analytics. One of the great advantages of R is that it is free to download and it has a community of people who are constantly making new packages and new code available. The nhlscrapr package can be downloaded at the nhlscrapr site with the accompanying nhlscrapr reference manual.

**Zone Entries**

Eric Tulsky’s work with Geoffrey Detweiler, Robert Spencer, and Corey Sznajder was introduced to a wider audience at last year’s MIT Sloan Conference. In that paper, which can be found at this link and subsequent work they (and others) have shown the importance of carrying the puck through the neutral zone and into the offensive zone. The original work has now expanded to data collection on a good number of teams. There is now also a Zone Exit Project.

**Prospect Usage**

Eric Tulsky wrote an influential piece leading up to the 2013 NHL draft on evaluating how players in various leagues outside the NHL are used. This analysis, which builds on previous work by Jonathan Willis and Scott Reynolds, estimates the relative strength of opposing forwards and opposing defensemen. It does this by estimating the TOI for these players using the number of goals for which they were on the ice for their team and adjusting for the team scoring rate.
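The TOI estimation step can be sketched very simply: a player's share of the goals his team scores approximates his share of the minutes. The numbers below are hypothetical.

```python
# Backing out ice time from on-ice goals, as in the usage analysis above:
# share of team goals on the ice for ~= share of team minutes played.
# All numbers are hypothetical.

def estimated_toi_per_game(on_ice_goals_for, team_goals_for, minutes=60.0):
    return (on_ice_goals_for / team_goals_for) * minutes

# A defenseman on the ice for 80 of his team's 240 goals:
print(round(estimated_toi_per_game(80, 240), 1))  # 20.0 minutes per game
```

In leagues without published TOI, this kind of proxy is often the only way to get at usage at all.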

**Macdonald’s Expected Goals Model**

Prediction of future hockey performance is not limited to Corsi and Fenwick. Brian addresses other metrics and ties those to the players on the ice during a given shift as well as other contextual information about that shift. Thus, this approach takes an über form of WOWY that accounts simultaneously for all of the other factors present during a given shift to isolate the impact of a given player. This work was originally presented at the 2012 MIT Sloan conference: link to the paper. A further explanation of this methodology was given by Macdonald in this post.

**Total Hockey Ratings**

My Total Hockey Ratings (THoR), which were also originally presented at MIT Sloan, are similar to the work of Macdonald above in that both models account for who is on the ice, for and against, for each action along with other context-dependent factors. Under THoR, each event (hit, shot, miss, etc.) in the NHL’s RTSS system is given a value based upon the net probability that it leads to a goal. The original model was purely based upon even strength events and introduced a methodology for adjusting shot (x,y) location to account for rink biases. The latest results and updates from THoR, both even strength and the newer all events, have shown a high level of reliability.
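The event-valuation idea at the heart of THoR can be sketched in a few lines. The probabilities below are illustrative placeholders, not fitted values from the model.

```python
# The core THoR idea: value each RTSS event by the net probability that it
# leads to a goal within the next 20 seconds. Probabilities below are
# illustrative placeholders, not fitted values.

EVENT_PROBS = {
    # event: (P(goal for within 20s), P(goal against within 20s))
    "shot":     (0.050, 0.004),
    "hit":      (0.008, 0.010),
    "giveaway": (0.003, 0.020),
}

def net_event_value(event):
    p_for, p_against = EVENT_PROBS[event]
    return p_for - p_against

print(round(net_event_value("shot"), 3))      # 0.046
print(round(net_event_value("giveaway"), 3))  # -0.017
```

Player ratings then come from attributing these event values to everyone on the ice, with the regression adjusting for teammates, competition, and the other context factors.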

**dCorsi**

Steve Burtch’s dCorsi (or Delta Corsi) is similar to the previous two innovations in that it uses a statistical regression to account for contextual factors. As the name implies, it is based upon Corsi, adjusted for average TOI, Zone Starts, QoT and QoC. The germ for dCorsi was a study on a metric called Shut Down Index, which ‘morphed’ into dCorsi, discussed in this blog post looking at defencemen through the first quarter of a season.
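The residual idea behind dCorsi is easy to demonstrate. The toy data and usage factors below are constructed for illustration and are not Burtch's actual model or inputs; the point is only that dCorsi is observed-minus-expected from a regression on usage.

```python
import numpy as np

# A dCorsi-style calculation: regress observed Corsi on usage factors and
# take observed minus expected as the residual. Toy data for illustration.

rng = np.random.default_rng(0)
n = 200
toi = rng.uniform(10, 22, n)       # minutes per game
ozs = rng.uniform(0.3, 0.7, n)     # offensive zone start share
corsi = 2.0 * toi + 15.0 * ozs + rng.normal(0, 1.5, n)

X = np.column_stack([np.ones(n), toi, ozs])
beta, *_ = np.linalg.lstsq(X, corsi, rcond=None)
expected = X @ beta
dcorsi = corsi - expected          # positive = better than usage implies

print(abs(dcorsi.mean()) < 1e-8)   # True: residuals center on zero
```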

**Parkatti’s Expected Goals**

So now not only do we have metrics in hockey with uninformative names — Corsi, Fenwick, PDO — but we now have metrics with the same name: expected goals. Michael Parkatti’s expected goals weights each shot by the probability of being a goal given distance and shot type. Michael was the winner of the Edmonton Oilers hackathon and is now involved with the team. Parkatti gave some idea of the predictive ability of this method in this post.
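A minimal expected-goals tally works by weighting each shot with its estimated scoring probability. The probability table below is an illustrative stand-in, not Parkatti's fitted values.

```python
# Minimal expected-goals tally: weight each shot by P(goal | location band,
# shot type). The table is an illustrative stand-in, not fitted values.

XG_TABLE = {
    ("slot", "wrist"): 0.14,
    ("slot", "tip"):   0.18,
    ("point", "slap"): 0.03,
}

def expected_goals(shots):
    return sum(XG_TABLE[(loc, kind)] for loc, kind in shots)

game_shots = [("slot", "wrist"), ("point", "slap"), ("slot", "tip")]
print(round(expected_goals(game_shots), 2))  # 0.35 expected goals
```

A team or player beating their expected goals over a large sample is then evidence of either finishing skill or luck, which is exactly the question these models are built to separate.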

There have also been some other promising methods that have appeared and that have a good deal of potential. I want to highlight three of these. The first is by Gramacy et al. and received some interest from the hockey blogosphere. The methodology they have created is one with a good deal of potential. Similarly, the Mean Even Strength Hazard (MESH) rating approach of Andrew Thomas et al. is a method that is worth reading. Finally, Josh Weissbock has developed some machine learning algorithms for the analysis of hockey. Here’s his academic conference paper. Weissbock also wrote this blog post on the topic.

As I mentioned above, hockey analytics has expanded a great deal in the past two years. I’m excited for this year’s MIT Sloan Sports Analytics Conference (#SSAC2014) Hockey Panel and look forward to the conference.

*Thanks to Brian Macdonald @greaterthanPM for discussion on this topic.*

Certainly a player’s Corsi or Corsi For % can be a reflection of their team and their teammates (see Rob Scuderi with LA or Dennis Seidenberg with Boston). However, the point of CorsiRel is to differentiate between how the team does when the player is on and off the ice. This is a form of with-or-without-you analysis, or WOWY as it is commonly known.

In the case of THoR, we have explicit terms in our model to account for the other players on the ice with a given player as well as rink effects. Thus, we have a stronger form of WOWY built in, one that simultaneously adjusts for the impact of the other players on the ice for both teams during a given event. THoR also accounts for a variety of other factors including score effects, home ice, zone starts, etc. Details can be found here.

To investigate this particular question, I’m going to look at how year-to-year correlations change for players who are on the same team in both years compared to players who are on different teams in the two years. If the correlation, or the ability to predict how a player does in the future, is substantially decreased by changing teams, then that metric is not isolating individual player talent. Thus, our cohort of players will be those that finished consecutive seasons with different teams but finished the preceding or subsequent season with the same team. We won’t limit ourselves to players that spent an entire season with a different club, since that would drive down some already smaller sample sizes.

In statistical data collection, there is an idea that an individual/subject is their own best control. This comes into play when testing, for example, a drug vs. a placebo. No matter how well you set up your study there is always a worry that the groups were different at the start, in which case conclusions at the end are suspect. To deal with that here, I’m going to look at the same group of individuals over multiple years, both when they have switched teams and when they have not. The drawback to this sort of analysis is that there could be changes in the players themselves from year to year. This will be particularly acute for players at the end of their careers, when their abilities are likely declining (Steve Sullivan, Jarome Iginla, Jaromir Jagr).

**THoR**

For the data here I am using the latest version of THoR (all events) and taking players with a minimum of 2000 plays in each of the seasons considered. In total, we used the following seasons: 0910, 1011, 1112, and 1213. (Unfortunately, we don’t have THoR further back than the 0910 season.) Each row of the table represents a cohort of players. The first row, for example, deals with players who finished the season with the same team in 0910 and 1011 but finished 1112 on a different team; the reverse holds for the 56 players in the second row. Our sample sizes here are not large, so a decent amount of variation between the correlations is to be expected. Given that, it is not unexpected that we would sometimes see a higher correlation for players when they change teams. Certainly some of those differences are less than 0.01, so it is safe to say that they are likely just noise; they also mean, however, that there is very little change in the performance of THoR when players change teams. On average here that difference is less than half of one percent. Adding one standard error to that gives about 4% as a reasonable bound on the drop in correlation due to players changing teams.

| Seasons with same team | Seasons with different teams | Sample size | Correlation across seasons with same team | Correlation across seasons with different teams |
|---|---|---|---|---|
| 0910/1011 | 1011/1112 | 61 | 0.816 | 0.726 |
| 1011/1112 | 0910/1011 | 56 | 0.821 | 0.833 |
| 1011/1112 | 1112/1213 | 36 | 0.852 | 0.856 |
| 1112/1213 | 1011/1112 | 31 | 0.732 | 0.786 |

**CorsiRel**

We next turn to results for CorsiRel. The players here had to have at least 30 games played in each of the three seasons considered. We did the same analysis as we did for THoR using data for CorsiRel (downloaded from behindthenet.ca). Originally, we had the same years involved for CorsiRel as we did for THoR. However, those results were highly variable, and so I added some additional years to give a bigger picture of what was going on. I pulled the CorsiRel for the 0708 and 0809 seasons with the same inclusion criteria. The average change in correlations by cohort is 10%, with about an 8% standard error. Adding and subtracting one standard error from that mean, it’s reasonable to say that the impact of changing teams is a difference of between 0 and 18%.

| Seasons with same team | Seasons with different teams | Sample size | Correlation across seasons with same team | Correlation across seasons with different teams |
|---|---|---|---|---|
| 0708/0809 | 0809/0910 | 85 | 0.686 | 0.647 |
| 0809/0910 | 0708/0809 | 70 | 0.577 | 0.529 |
| 0809/0910 | 0910/1011 | 81 | 0.569 | 0.422 |
| 0910/1011 | 0809/0910 | 72 | 0.668 | 0.639 |
| 0910/1011 | 1011/1112 | 86 | 0.578 | 0.500 |
| 1011/1112 | 0910/1011 | 58 | 0.712 | 0.316 |
| 1011/1112 | 1112/1213 | 58 | 0.494 | 0.576 |
| 1112/1213 | 1011/1112 | 57 | 0.446 | 0.536 |
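The summary figures quoted for both metrics can be reproduced directly from the two tables. Here is a minimal Python sketch, under the assumption (consistent with the numbers in the text) that each cohort’s drop is measured relative to its same-team correlation:

```python
import math

def drop_summary(rows):
    """Mean relative drop in correlation across cohorts, plus its standard error.

    Each row is (same-team correlation, different-team correlation); the drop
    is taken relative to the same-team correlation.
    """
    drops = [(same - diff) / same for same, diff in rows]
    n = len(drops)
    mean = sum(drops) / n
    sd = math.sqrt(sum((d - mean) ** 2 for d in drops) / (n - 1))
    return mean, sd / math.sqrt(n)

# Correlations copied from the two tables above.
thor = [(0.816, 0.726), (0.821, 0.833), (0.852, 0.856), (0.732, 0.786)]
corsirel = [(0.686, 0.647), (0.577, 0.529), (0.569, 0.422), (0.668, 0.639),
            (0.578, 0.500), (0.712, 0.316), (0.494, 0.576), (0.446, 0.536)]

for name, rows in (("THoR", thor), ("CorsiRel", corsirel)):
    mean, se = drop_summary(rows)
    print(f"{name}: mean drop {mean:.1%}, standard error {se:.1%}")
```

THoR comes out to a mean drop under half a percent (mean plus one standard error is about 4%), while CorsiRel comes out near 10% with a standard error near 8%, matching the figures discussed in the text.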

**Comments**

From the above, we can see that for the most part CorsiRel and THoR (all events) are primarily player-based metrics rather than team-based metrics. It’s reasonable to expect about a 10% drop in correlation for players that switch teams for CorsiRel, while for THoR that number is less than one-half of one percent. There is certainly variability in each of these estimates. We’ve used a methodology here that follows the same individuals across years when they were with the same team and years when they were with different teams.

As mentioned above, this methodology has some drawbacks, but it eliminates many of the other factors that would impact a similar study done with groups of different players. In particular, it is not wholly reasonable to expect players to have the same performance from year to year (even when they are with the same team). Players are not static. Of primary concern here is that they age and mature. One thing worth noting is that when players changed teams at a chronologically later time period, the changes in correlation were roughly the same as when the change of teams occurred earlier. That is good news, but it does not completely rule out aging as a factor. In addition to aging, players get hurt but play through it. Coaches use them in different ways. There are lots of sources of variability in their ratings other than changing teams. Additionally, the group of players with different teams includes both those who began the season with a different team and those who were traded during the season. A more extensive breakdown, though it would have smaller sample sizes, would provide some additional information. While not perfect, this approach does give us an improved idea of how dependent these metrics are on players staying with the same team.

In the end, THoR and CorsiRel are both clearly player metrics rather than team metrics. THoR (all events) is almost completely unaffected by players changing teams, and we can say that about 90% of the correlation in CorsiRel does not depend upon a player’s team.

Suffice it to say that Diaz is likely to retain his value (+2 wins above replacement) with the Canucks.
