Below is a paper that we (Timo Seppa, Michael Schuckers and Mike Rovito) recently wrote on NHL Draft Analytics entitled **Text Mining of Scouting Reports as a Novel Data Source for Improving NHL Draft Analytics.**

Here’s a link to the paper: TextMiningScoutingNHLDraftAnalyticsFeb2017

Timo will be presenting this work at 2017 Ottawa Hockey Analytics Conference #OTTHAC17 on May 6th.

Info:

Based upon a smoothed monotonic regression (the monreg package in R) using the first seven years (post draft) of time on ice (TOI) for players drafted at each pick in the NHL draft (through the 210th pick). The data used for this draft pick value chart (DPVC) are the 2003 to 2008 NHL Draft classes. The chart is scaled (and rounded) so that the first pick has value 1000. A graph of the chart is given below.

I’ve previously done two charts for assigning value to draft picks. The first and still most popular was done in 2011 and used games played as the outcome measure for players drafted from 1988 to 1997. That is here: 2011 DPVC (As always in these sorts of analyses we have to use historical data.) More recently in 2014 I put together a DPVC for an early version of a paper I published with Steve Argeris. While the focus of that paper was evaluating how teams draft relative to the NHL’s Central Scouting Service, we put together a DPVC based upon the first seven years of TOI for players drafted from 1998 to 2002 based upon the idea that teams generally have rights to a player for seven years after they are drafted. The DPVC did not make the final version of the paper which appeared in the Journal of Sports Analytics in 2015. Here’s the link to the original version of that paper with the DPVC: Schuckers and Argeris.

The 2016 version above uses the same methodology as the Schuckers and Argeris paper but uses the first seven years of TOI from the 2003 to 2008 NHL draft classes. The basic idea of a DPVC is to assign a value to a particular draft pick so that there is a starting point for possible trades. We have used a statistical method that forces the value of draft picks to be decreasing because that is what I believe inherently should be the structure of a DPVC. (Even though, for example, in these data the average TOI for the 9th pick is higher than the average for the 8th pick.) I just don't think that a pick which gives you fewer choices should be valued more highly than one that gives you more choices. As I've said whenever I've talked about this chart, it is a guide and you have to also consider other information you have about the draft and the players available.

David Wilson, a student at Carleton University whom I co-advised, has noted that there is some evidence that players selected in the 5th round outperform those taken in the 4th round, at least for the drafts from 1998 to 2008. David presented some of this work at the Vancouver Hockey Analytics Conference. David is the second speaker in Session 5.

**Methodology Details:** I've used a small bandwidth, 0.04, for both the density and kernel estimation. I went to small values of these parameters while also trying to maintain smoothness. This was done in part to try to avoid edge effects at lower, i.e. earlier, draft picks. To avoid some edge effects at higher draft picks, I fit all of the rounds for 2003 to 2008. That is, I used, for example, all 292 draft picks from the 2003 draft and all 230 picks from the 2005 draft. Here is the command I've used in R: `monreg(DraftPick,First7TOI,t=210,hr=0.04,hd=0.04,degree=1)`
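
monreg's fit combines a monotonicity constraint with kernel smoothing. As a rough illustration of just the monotonicity half, here is a pure-Python Pool Adjacent Violators fit with the DPVC-style scaling to 1000; the TOI numbers are hypothetical, and no smoothing is applied, so this is a sketch of the idea rather than the monreg fit itself:

```python
def monotone_decreasing_fit(y):
    """Pool Adjacent Violators: least-squares fit constrained to be non-increasing."""
    # Fit a non-decreasing sequence to the negated series, then negate back.
    vals = [-v for v in y]
    blocks = []  # each block holds (sum, count); its fitted value is sum/count
    for v in vals:
        blocks.append((v, 1))
        # Merge adjacent blocks while their means violate the ordering constraint.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s2, c2 = blocks.pop()
            s1, c1 = blocks.pop()
            blocks.append((s1 + s2, c1 + c2))
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return [-v for v in fitted]

# Hypothetical first-seven-year TOI totals for picks 1-4; note pick 3 outperforms
# pick 2, so the fit pools those two picks to a common value.
toi = [5200, 4100, 4300, 3000]
fit = monotone_decreasing_fit(toi)               # [5200.0, 4200.0, 4200.0, 3000.0]
chart = [round(1000 * v / fit[0]) for v in fit]  # scaled so the first pick is 1000
```

The pooling step is exactly how a monotone fit handles cases like the 9th pick out-averaging the 8th: both picks get a common value rather than an increase.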

**POSTSCRIPT (June 8, 2016): Note about differences from the 2011 DPVC to the 2016 DPVC**

I've gotten some questions about the above chart and in particular the differences between this chart and the 2011 version (here). The 2016 version is steeper than the 2011 version. There are several differences that could account for this. First, in the 2011 version I used career games played (GP) as the response while in the 2016 version I've used TOI in the first 7 seasons. So it is possible that the different responses would result in different weightings for draft picks and different curves. Second, while obvious, we are using different data for the two time periods (players taken in the 1998 to 2002 and the 2003 to 2008 drafts, respectively), so it is possible that there have been some changes in how players' careers have gone during these time frames. Third, the statistical methodology used to fit the two curves differs.

Figure 2 below addresses the first two issues from the paragraph above. For this graph, I've used the methodology described in the post above to fit the first seven years of GP and TOI data from '98 to '02 and from '03 to '08. The TOI curves are virtually identical; the blue is almost completely obscured by the red. (Note that both are scaled to have 1000 as the maximum value.) Similarly, the two GP curves are very much alike. There seem to be some slight differences in the first 20 picks and in the last 30 or so picks, but given the likely variation in the curves they seem quite similar. Further, while there are differences between the GP and the TOI curves, their basic shape is pretty close. (This is not a surprise if we note that the correlation between first seven GP and first seven TOI is 0.932.) So the differences in the curves are not due to the different time frames and not due to the different response variables.

The difference seems to be due to the differences in the methodology that I used to create the two curves. Figure 3 below illustrates this issue, which is partly a function of restricting myself to a monotonically decreasing curve and partly a function of the smoothing parameters. The response data in Figure 3 are the TOI data from '03 to '08. The blue curve is the version given in the 2016 DPVC; the other versions come from taking different values of the smoothing parameters within the monreg function. The lightest grey one, the topmost curve over the first 50 picks, is the one that seems closest to the 2011 version. (Technical note: in 2011 I used the loess function to make the DPVC while I am now using monreg.) This time I've gone with a curve that is a bit less smooth but, hopefully, better reflects the data. Ultimately I am making a choice here between overall smoothness and fidelity to the data. Inherently I believe that a DPVC should be relatively smooth, so I'm not inclined to have something with steps like the black line of Figure 3 (even less smooth than the blue line), but I want to stay true to the data, so I've moved away from the light grey line, which is very smooth.


The conference will be held January 16th, 2016 at Porter Hall/231 University Centre, Carleton University, in Ottawa, Ontario. The conference webpage (including registration) can be found here:

http://statsportsconsulting.com/ottanalytics16/

Michael Schuckers, professor of statistics at St. Lawrence, who will speak at the conference, says hockey analytics has become a way to better understand the factors that impact the outcome of hockey games and hockey seasons. “The conference is an opportunity to meet and hear from some of the top minds in hockey analytics,” he said. “This year, we are especially excited to have IBM Analytics as a sponsor and presenter and to have a session on player tracking, which will cover work on capturing player location data during games. For the casual fan, this is an opportunity to learn more about the sorts of analytics that are being used to evaluate teams and players.”

One of the highlights of the conference will be a session on player tracking data and its role in the future of analytics.

This year the lead sponsors for the conference will be Carleton University and the Ottawa Senators. The other sponsors for the conference are IBM Analytics, St. Lawrence University (Canton, NY), the American Statistical Association Section on Statistics in Sports, Carleton University Athletics and the Statistical Society of Ottawa.

There will be a social event for conference attendees at Sens House the night before the conference.

This 2nd annual Ottawa Hockey Analytics Conference will consist of talks on the analysis of hockey along with time for socializing and networking. The event will be similar to hockey analytics events previously held in Calgary and Pittsburgh.

The webpage for last year’s version of this conference can be found at: Ottawa Hockey Analytics Conference 2015.

So there has been a bunch of discussion lately about tanking in the NHL. It seems pretty clear that this has happened before and is happening again this year. Puck Daddy doesn't mind it. Grantland's Sean McIndoe calls it a 'disastrous situation' and has some suggestions, including one (Option 11) for a postseason tournament. The idea of a postseason tournament is nice in theory, but I wanted to sit down and see if it was possible to build a tournament which gave worse teams better odds of winning. So I set out to put together a concrete proposal for such a tournament. Though it would clearly need NHLPA approval, that tournament could run concurrently with the Stanley Cup playoffs. It might even be good to have it start after a small layoff, when there are fewer games per night. Sponsorship might be fun: The Second Cup, anyone? Or the Last Chance Cup sponsored by the New York Lottery.

Below is the full proposal for a single-elimination tournament to determine the first-round NHL entry draft order. The idea here is to award the first pick in the NHL draft to the team that wins the tournament. I've laid out the full proposal in the pdf below. The key to the process is to design a tournament in which the worst team has the highest probability of winning. That takes some rethinking of the usual tournament design. This proposed tournament is represented in the graph at right, where the worst team is listed as 14, the second worst as 13, and the best team (among those that did not make the playoffs) as 1.

- Draft order is determined by how far a team progresses in the tournament.
- There are guarantees that the worst team will get no worse than the 3rd pick, the 2nd worst team no worse than the 4th pick.
- Games are played at the rink of the lower seed, again to provide worst teams with better odds of progressing through the tournament.
- Draft ordering could be different for rounds beyond the first.
- It is possible to extend the model to tournaments for 15 or 16 teams when/if expansion occurs.
- There would be additional revenue from ticket sales and TV rights.
- Playoff OT rules apply, though I would be open to the possibility of using the shootout to determine a winner if it would benefit worse teams, and I suspect that it does (unless you're NJD circa 2014).
- Player eligibility for the tournament is the same as for the Stanley Cup playoffs.
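
Since the bracket itself is only laid out in the linked pdf, here is a hypothetical sketch of how one could check that a design gives worse teams better odds: simulate a single-elimination bracket under an assumed per-game win model. The bracket, seeds and win probabilities below are all made up for illustration, not taken from the proposal:

```python
import random

def play(bracket, win_prob, rng):
    """Recursively play out a single-elimination bracket.
    A bracket is either a team id (int) or a pair of sub-brackets."""
    if isinstance(bracket, int):
        return bracket
    a = play(bracket[0], win_prob, rng)
    b = play(bracket[1], win_prob, rng)
    return a if rng.random() < win_prob(a, b) else b

# Hypothetical win model: ids are reverse-standings seeds (larger = worse team),
# and design features like home ice for the lower seed are assumed to give the
# worse team a small per-game edge.
def win_prob(a, b):
    return 0.5 + 0.02 * (a - b)   # probability that a beats b

# A made-up 4-team sub-bracket purely for illustration; the actual 14-team
# structure is in the pdf, not reproduced here.
bracket = ((14, 11), (13, 12))
rng = random.Random(42)
wins = {t: 0 for t in (11, 12, 13, 14)}
for _ in range(50_000):
    wins[play(bracket, win_prob, rng)] += 1
# The worst team (14) should win this sub-tournament most often.
```

Any candidate bracket and win model can be dropped into the same simulation to verify that championship probability decreases as team quality increases.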

As McIndoe noted in his Grantland piece, there are some potential issues, including the questionable integrity of games. One remedy might be monetary incentives (new Hondas for every member of the winning team, maybe) created by rewarding players and teams financially for continuing to win. There are some tiers here for teams, and so it is possible that a team in the 8th-worst spot might 'tank' for the 9th spot.

Here is a pdf of the proposal with some additional verbiage.

NHLDraftSingleEliminationTournament


Last night I did a webinar on Statistical Methods for the Analysis of Hockey for St. Lawrence University Alumni and friends. I had a blast. There were lots of good questions and lots of interest. Exactly what I would expect from a bunch of Laurentians.

Below is a link to the slides for the talk.

Additional details can be found on the workshop website below:

Workshop website: HERE


Our aim is to build a model which accounts for the relative differences between rinks in the events that are recorded. The rest of this post is a summarized version of our paper on this topic, which is linked below. **The focus of this work is on making data recorded for the following events comparable from rink to rink: Blocks, Giveaways, Hits, Missed Shots, Shots, and Takeaways.** We also look at the recording of aggregated events that count as Corsi events, Fenwick events and Turnovers. The last of these was created by Schuckers and Curro to account for the home bias of Takeaways and Giveaways as part of the THoR paper.

The data that we use for this analysis comes from the nhlscrapr R package created by Thomas and Ventura and includes 6858 games from six regular seasons.

We used a statistical regression to model counts of events per game with several predictors **including team factors, average score differential, and rink.** The estimated rink effects that we derive can be used to reweight recorded events so that we have comparable counts of events across rinks. Applying our methodology to data from six regular seasons (2007-08 through 2012-13), we find that for the most part NHL rinks (and the individuals therein) do a reasonably consistent job of recording events. This is especially true of the recording of SHOTs, which has the fewest rinks with significant recording issues and the smallest rink effects that we found. We only found that St. Louis and Florida have shot-recording rates that differ significantly from other rinks. Florida counts shots at a rate that is about 3% higher than other rinks while St. Louis counts shots at a rate that is 4.5% lower than the rest of the league.
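
The paper's actual estimates come from that regression with team and score-differential factors; as a deliberately crude sketch of the underlying idea, one can compare the rate at which an event is recorded at one rink with the rate recorded for the same teams elsewhere. All numbers below are made up:

```python
# Simplified stand-in for the paper's regression-based rink effects.
def naive_rink_effect(home_counts, road_counts):
    """Ratio of the per-game event rate recorded at one rink to the same teams'
    rate recorded elsewhere; 1.0 means the rink records like the rest of the league."""
    home_rate = sum(home_counts) / len(home_counts)
    road_rate = sum(road_counts) / len(road_counts)
    return home_rate / road_rate

# Hypothetical per-game block counts for a set of teams at one rink, and those
# same teams' per-game counts in their other games.
effect = naive_rink_effect([17, 15, 16, 18], [13, 14, 12, 15])
# effect > 1 suggests this rink over-counts the event; an adjustment would
# then scale each recorded count by 1/effect.
```

The regression version does the same comparison while controlling for who is playing and the score state, which is what makes the estimated effects attributable to the rink rather than the teams.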

However, there are some rinks with rink effects that are significant and consistent across these seasons for other events. Zeroing in on blocks, hits and misses, there were four rinks that consistently inflated or deflated the counts of those events: Edmonton, Los Angeles, New Jersey and Toronto. There were also six rinks that were not significantly different from the rest of the league on those events. These were: Buffalo, Nashville, Pittsburgh, St. Louis, Tampa Bay and Vancouver.

While event counts are impacted by rink effects, ratios of events such as Corsi For Percentage remain relatively unaffected. The table below demonstrates how small the impact of rink effects is on the Corsi For %. This is despite our estimation that Boston, Columbus, Edmonton, Los Angeles, New Jersey, and Toronto all have significant rink effects for the counting of Corsi events. New Jersey is particularly egregious, undercounting Corsi events by about 16%. The reason that ratios are not impacted as much as counts is that the impact of a given rink is felt in both the numerator and the denominator for the home team.
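
The cancellation can be seen with a small worked example (all counts below are made up for illustration):

```python
# Why a rink effect largely cancels in Corsi For %: at home, the same
# multiplier distorts both the For and the Against counts.
r = 0.84                            # a rink under-counting Corsi events by ~16%
cf_home, ca_home = 500.0, 400.0     # true home counts, for and against
cf_road, ca_road = 520.0, 410.0     # road counts, recorded accurately

# Recorded home counts are both scaled by the rink effect.
cf_home_rec, ca_home_rec = r * cf_home, r * ca_home

raw_pct = (cf_home_rec + cf_road) / (cf_home_rec + ca_home_rec + cf_road + ca_road)
adj_pct = (cf_home_rec / r + cf_road) / (
    (cf_home_rec + ca_home_rec) / r + cf_road + ca_road
)
# raw_pct and adj_pct differ by well under one percentage point.
```

Even with a 16% distortion in the home counts, the raw and adjusted percentages here differ by less than 0.02 percentage points, which is the pattern Table 1 shows for the real data.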

**Table 1: Comparison of Corsi For % and Adjusted Corsi For % for 2012-13 NHL Regular Season**

Top 5 and Bottom 5 teams on Corsi For Pct.

| Team | Corsi For Pct. | Adjusted Corsi For Pct. |
|------|----------------|-------------------------|
| L.A. | 0.5630 | 0.5628 |
| N.J. | 0.5592 | 0.5592 |
| BOS | 0.5433 | 0.5430 |
| CHI | 0.5414 | 0.5420 |
| DET | 0.5366 | 0.5365 |
| … | … | … |
| CBJ | 0.4711 | 0.4706 |
| NSH | 0.4668 | 0.4662 |
| BUF | 0.4512 | 0.4513 |
| EDM | 0.4458 | 0.4445 |
| TOR | 0.4408 | 0.4398 |

While ratios are not affected in a major way by rink effects, the counts of individual events recorded as part of the NHL's RTSS system are. To illustrate this we looked at the blocked shots recorded by players during the 2012-13 NHL regular season. For those rinks where there are rink effects for blocked shots, we take each block and weight it by 1 divided by the rink effect. So for Nassau Coliseum, home of the New York Islanders (NYI), each block counts as 1/1.208 or 0.828 of a block, since that rink overcounts blocks by about 20.8%. Similarly, since the rink effect for Anaheim is 0.721, each block in that rink counts as 1/0.721 or 1.387 blocks.
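
The reweighting itself is simple. Here is a sketch using the two rink effects quoted above; the player's counts and the second rink are made up:

```python
# The two rink effects quoted in the text; any other rink is treated as 1.0.
rink_effect = {"NYI": 1.208, "ANA": 0.721}

def adjusted_blocks(blocks_by_rink, effects):
    """Each raw block counts as 1/effect of a block for the rink it occurred in."""
    return sum(n / effects.get(rink, 1.0) for rink, n in blocks_by_rink.items())

# A hypothetical player with 100 recorded blocks at Nassau Coliseum and 23 at
# rinks with no significant effect:
total = adjusted_blocks({"NYI": 100, "OTT": 23}, rink_effect)
# 100 / 1.208 + 23, i.e. roughly 105.8 adjusted blocks
```

This is exactly the deflation that costs Islanders players raw blocks in Table 2 and credits Anaheim players with extra ones.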

We adjusted every block during the 2012-13 regular season for the rink in which it occurred, and the top 10 players on adjusted blocks are given in the table below. Unlike the case for Corsi For %, there are substantial changes in the counts of block events. Here the top player on the revised list, Francois Beauchemin, was originally ranked 10th on the list of players with the most blocks. There are other large changes for players from the Islanders (Andrew MacDonald) and the Capitals (John Carlson) due to the relative counting of BLOCKs in those rinks. MacDonald and Carlson move from tied for third in raw BLOCKs to eighth and ninth, respectively, after our adjustment.

**Table 2: Comparison of Raw Block Counts and Adjusted Block Counts for 2012-13 NHL Regular Season**

Top 10 players based on Adjusted Block Counts

| Player | Team | Adjusted Blocks | Raw Blocks | Differential |
|--------|------|-----------------|------------|--------------|
| F. Beauchemin | ANA | 133.3 | 111 | 22.3 |
| G. Zanon | COL | 127.2 | 124 | 3.2 |
| D. Girardi | NYR | 120.7 | 125 | -4.3 |
| R. Hainsey | WPG | 120.5 | 123 | -2.5 |
| D. Seidenberg | BOS | 119.8 | 115 | 4.8 |
| L. Smid | EDM | 114.7 | 119 | -4.3 |
| B. Orpik | PIT | 110.3 | 114 | -3.7 |
| A. MacDonald | NYI | 109.8 | 123 | -13.2 |
| J. Carlson | WSH | 109.4 | 123 | -13.6 |

The full paper can be found HERE.

Note: Brian’s contributions to this project, with the exception of minor edits of the paper, were made while he was an Associate Professor in the Department of Mathematical Sciences at the United States Military Academy, West Point, NY, prior to joining the Florida Panthers.

*Recently there has been a good deal of interest in the use of statistical methods and statistical thinking in sports. Work of this kind has been going on for many, many years though often behind the scenes. With the publication of Michael Lewis’ Moneyball and the subsequent movie of the same name, the interest in this kind of work has grown extensively. Further evidence of this can be found in the increased number of conferences devoted to the topic and the increased use of these topics in the sports media. In this talk, I will begin with an overview of sports analytics including a discussion of the general approaches and methods. Having done that, I will discuss two examples from my own recent work: Rink effects in the NHL and NHL player ratings.*

Thanks to Shirley Mills at Carleton University for the invitation to speak and for lunch. Thanks to her students for some enlightening conversations. Was also nice to see some folks I know from the Canadian Border Services Agency in the audience.

Here are the slides from that talk.

A week ago (on August 6th), I appeared on a panel at the Joint Statistical Meetings in Boston along with Brian Macdonald, Andrew Thomas, Sam Ventura and Kevin Mongeon. The panel was a good one on some of the advanced statistical methods that are being used in hockey. Fluto Shinzawa of the Boston Globe attended the panel and wrote up a piece on it that can be found at this link. A rough audio of the session can be found at Andrew Thomas' website: www.acthomas.ca/?p=62. Andrew was the organizer of the session.

I spoke for about 5 minutes on a model for rink effects that Brian Macdonald and I developed. Rink effects are something that has been known about in hockey analytics for some time. The issue is that there has not been a way to correct for the differences in the recording of RTSS events at various rinks. This issue has been cited as one of the hurdles to the adoption of hockey analytics; here is a Sportsnet.ca article with a quote from Chuck Fletcher of the Wild on data inconsistencies. Fletcher's wrong about the recording of events in Detroit relative to Minnesota, though Minnesota does undercount shots relative to the rest of the league. The bigger picture is that we need a way to adjust event counts. Thus, the goal of our paper is to develop tools to allow analysts to adjust RTSS events (Shots, Hits, Misses, Blocks) so that they are consistent from rink to rink. We're not aiming for absolute truth (i.e. that what is counted as a hit is genuinely a hit) but rather for relative truth, meaning that events are counted, on average, the same from rink to rink.

My slides from a short presentation at the JSM Panel are found below. We, Brian Macdonald and I, are working on finishing the final paper before I present the full set of results at the Royal Statistical Society’s International Conference in Sheffield, UK at the beginning of September.

(This article was edited 8/28/14 to add the link to the Sportsnet.ca article.)


I've updated the Total Hockey Ratings (THoR) for the 2013-14 season. Both the Even Strength and the All Events ratings can be found on the THoR Page. Recall that THoR evaluates each play from the NHL's RTSS system for its likelihood to lead to a goal in the subsequent 20 seconds. We then get a rating for players by accounting for their impact after adjusting for quality of teammates, quality of competition, zone starts, score effects, etc. The outcome metric here is wins above replacement (WAR) relative to position. Details on the methodology can be found on the THoR Page. Below are some of the highlights from these results from the posted files, which include the top 50 players.
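
The 20-second labeling step described above can be sketched as follows. The tuple layout and event names here are assumptions for illustration, not the actual RTSS or THoR schema:

```python
# Mark each play by whether the acting team scores within the next 20 seconds.
def label_events(events, window=20):
    """events: list of (time_in_seconds, event_type, team), in game order.
    Returns a 0/1 label per event: did the same team score within `window` seconds?"""
    labels = []
    for i, (t, _, team) in enumerate(events):
        led_to_goal = any(
            etype == "GOAL" and eteam == team and 0 < et - t <= window
            for et, etype, eteam in events[i + 1:]
        )
        labels.append(int(led_to_goal))
    return labels

# Tiny made-up sequence: a shot by team A at 100s is followed by an A goal at 115s.
events = [(100, "SHOT", "A"), (110, "HIT", "B"), (115, "GOAL", "A")]
labels = label_events(events)   # [1, 0, 0]
```

These labels are what a model like THoR would then regress on the players on the ice, after the adjustments for teammates, competition, zone starts and score effects.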

**Even Strength**

Some interesting results here. I'll start with EV first. The top ten here are A. Kopitar, M. Niskanen, M-E Vlasic, R. Suter, A. Sekera, J. Pavelski, J. Jagr, L. Couture, J. Hudler and T. Vanek. Sekera and Hudler might be a bit of a surprise but none of the others would seem so. Suter has not always been so highly ranked by THoR but this year he is evaluated well. A couple of other notes on this. At even strength, THoR is a big fan of Gustav Nyquist of the Red Wings. And THoR also likes Tampa Bay's Ondrej Palat as the best candidate for the Calder Trophy, but he is closely followed by Hampus Lindholm of the Ducks. The total impact of Palat and Lindholm is probably very similar, but Palat was on the ice for fewer events. MacKinnon is not rated as highly. (Note that THoR does not take account of PDO as part of its evaluation of players, so Nyquist's high shooting percentage is not relevant to THoR but his high rate of shots per time on ice is.)

In this analysis, we also find that the bottom five include A. Ovechkin, A. Edler, A. MacDonald, N. Grossman and N. Yakupov. Again, these are not the worst-rated players in the league but those that have cost their teams the most, given the amount of ice time they are getting, under the THoR methodology.

**All Events**

The top ten players based upon all events are: R. Suter, C. Kunitz, J. Jagr, M. Niskanen, M. Giordano, A. Markov, T. Brouwer, J. Carlson, D. Doughty and O. Ekman-Larsson.

Seven defensemen are on that list, and that is due to the fact that we are doing two things. One, we are normalizing average player value to the position and, two, we are taking that average relative value and multiplying by the number of plays for which the player was on the ice. Since top defensemen tend to play more minutes, they are given more value under this assessment. This, in particular, helps Suter, who was on the ice for just about 25% more plays than Giordano. On a per play basis Giordano had a greater average impact but Suter had a bigger overall impact on the season. As with any statistical metric, we can derive the variability in these estimates.

Hampus Lindholm, a defenseman for the Anaheim Ducks, is the highest-rated rookie on the list, at 11th. He is not a Calder Trophy finalist. The finalists are Ondrej Palat, Tyler Johnson and Nathan MacKinnon, all forwards, who are rated 38th, 75th and 94th, respectively.

Again, we see some of the usual names at the top of the THoR list: Doughty, Steen, Hossa, Hornqvist, Couture, Karlsson, Kopitar and Subban. THoR is a two-way player metric, so this is not a surprise given its high year-to-year correlation.

Of note is that Ovechkin (not in the Top 250) compensates somewhat for his really poor even strength play with PP play that makes him a replacement-level player for this year.

At the bottom of the ratings (not in the file for download) are A. Edler, R. Regehr, A. Ference, N. Nystrom and J. Cowen. Again, they were not the worst players, but by virtue of their amount of ice time, they hurt their teams the most.

Also of note here is that Kunitz gets very high marks, higher even than Crosby. This is a result of multicollinearity in these data, with Crosby and Kunitz playing nearly 4/5ths of their 5v5 ice time together (Source: www.behindthenet.ca). While THoR uses ridge regression to deal with this, it is currently optimized to provide high predictive reliability.
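
Here is a toy illustration of that collinearity issue and how a ridge penalty stabilizes it. The data are made up and tiny; THoR's actual regression is far larger, but the effect is the same in kind:

```python
# Solves (X'X + lam*I) b = X'y for two predictors directly.
def ridge_2d(x1, x2, y, lam):
    s11 = sum(a * a for a in x1) + lam
    s22 = sum(a * a for a in x2) + lam
    s12 = sum(a * b for a, b in zip(x1, x2))
    g1 = sum(a * b for a, b in zip(x1, y))
    g2 = sum(a * b for a, b in zip(x2, y))
    det = s11 * s22 - s12 * s12
    return ((s22 * g1 - s12 * g2) / det, (s11 * g2 - s12 * g1) / det)

# Two on-ice indicators for linemates who share 3 of 5 shifts (near-collinear):
x1 = [1, 1, 1, 1, 0]
x2 = [1, 1, 1, 0, 1]
y = [1, 1, 0, 1, 0]                     # outcome credited to each shift
b_ols = ridge_2d(x1, x2, y, lam=0.0)    # unpenalized: unstable, opposite signs
b_ridge = ridge_2d(x1, x2, y, lam=1.0)  # penalized: shrunk, both positive
```

With no penalty, the shared shifts force the fit to split credit arbitrarily between the two players (here one coefficient even goes negative); the ridge penalty pulls both toward a stabler shared estimate, which is why highly overlapping pairs like Crosby and Kunitz can still end up with counterintuitive relative ratings.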

Alfredsson, Jagr, Markov, Moulson, Niskanen, Timonen, Vanek and Vrbata are all in the THoR Top 100 for 2013-14 and they are all UFAs this summer. They should have plenty of good offers this offseason (except for Jagr, who has re-signed with NJD). That, along with a higher salary cap, should make for an interesting summer.

One interesting note for long-time hockey analytics followers is the appearance of Sean Couturier and Mark Scheifele in the top 250 players, though Couturier is worth over one win more than Scheifele this year based upon THoR. Also, David Perron of the Oilers, who was obtained in a trade from the Blues, shows up in the top 100.

Note that we use a different THoR model for Even Strength than for All Events, one that accounts for differences between 5v5, 5v4 and 5v3 situations, among others.
