Back in March, our Total Hockey Ratings (THoR) finished 2nd in the MIT Sloan Sports Analytics Conference Research Paper Competition. That drew plenty of attention to the outcomes but little to the underlying process and methods. In this post and a couple more, we’ll evaluate how well THoR does as a two-way hockey metric. I’ll start by comparing some of the omnibus (all-inclusive) player ratings. My goal is to consider how well these ratings predict the future performance of players and to see how THoR stacks up against some of the other ratings. The original Total Hockey Ratings paper is here.

I looked at three metrics: David Johnson’s Hockey Analysis Ratings Total (HART, and they said you can’t measure heart), Gabe Desjardins’ Corsi Rel, and THoR. I took each of these metrics for the four most recent NHL regular seasons: 2009-10, 2010-11, 2011-12 and 2012-13. For each metric, I took the ratings of each player and looked at how they were correlated from year to year. I also looked at the correlation between every pair of seasons, not just adjacent ones. Note that we considered looking at GVT, but since it includes data outside of even strength we will save our comparison with GVT for our model with all data, including power plays.

The sites behind these ratings each use different input data for their calculations and determine amount of play differently. For HART, I used the 5 on 5 Close data for players who had more than 200 minutes in a given season. For CorsiRel, I used 5v5 data for players who played in more than 30 games. For THoR, I used players who were on the ice for more than 1000 plays (either 5v5 or 4v4), as extracted from the NHL.com play-by-play files. (By the way, there is now a great way to extract NHL play-by-play data in R via the nhlscrapr package provided by Andrew Thomas and Sam Ventura.) These particular cutoffs were chosen so that all of the methods had similar numbers of players when making the year-to-year comparisons.

A quick note about this analysis. The methodology is prone to show regression to the mean, and that is to be expected. The correlation, r, between 09-10 ratings and 10-11 ratings should be higher than the correlation between 09-10 and 11-12. There are lots of reasons for this, but regression to the mean is foremost among them; other effects, such as aging, also factor in. The other quantity that I looked at was the estimated slope, b, of the least squares regression of the later season’s ratings on the earlier season’s. This also gives some idea of how much ‘regression to the mean’ is occurring: lower slopes (further below one) indicate more of it. Ideally our metrics would have a slope close to one, meaning we are getting the same level of performance from one season to the next. That is unlikely: whether it is goals going in or Corsi events, there is an element of luck, so we would expect those who are lucky one year to be less lucky the next.
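
For reference, the link between r and b can be written out. This is the standard least squares identity rather than anything specific to these ratings. The symbols x (a player’s rating in the earlier season), y (the rating in the later season), and s_x, s_y (their standard deviations across players) are notation introduced here for illustration.

```latex
b = r \,\frac{s_y}{s_x}, \qquad \hat{y} - \bar{y} = b\,(x - \bar{x})
```

So if the spread of ratings is about the same in both seasons, b is close to r, and a player who was one unit above average in the earlier season is predicted to be only b units above average in the later one.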

We start with the results for HART, given in Table 1 below. In each cell, the first entry is the correlation, r, in ratings between the two seasons given. The second entry is n, the number of players that went into the correlation; to be counted, a player had to meet the criteria for the given metric in both seasons. The third entry is the slope, b. The same layout is used for Tables 2 and 3. For example, between the 2009-10 and 2010-11 seasons there was a correlation of r=0.457 based upon n=489 players, and the slope of the regression line was b=0.478. That last value suggests that a player with a HART value of X in 2009-10 would be predicted to have a value of roughly 0.478X for 2010-11.
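
To make the three cell entries concrete, here is a minimal Python sketch of how r, n and b could be computed for one pair of seasons. It is not the code used for this post. The function names `eligible` and `compare_seasons`, the dictionary layout, and the toy ratings are all assumptions made for illustration.

```python
from math import sqrt


def eligible(ratings, playing_time, cutoff):
    """Keep players whose playing time (minutes, games or plays) exceeds the cutoff."""
    return {p: r for p, r in ratings.items() if playing_time[p] > cutoff}


def compare_seasons(earlier, later):
    """Return (r, n, b) for the players rated in both seasons.

    earlier, later: dicts mapping player -> rating, already filtered by cutoff.
    b is the least squares slope for predicting the later rating from the earlier one.
    """
    players = sorted(set(earlier) & set(later))  # must qualify in both seasons
    n = len(players)
    x = [earlier[p] for p in players]
    y = [later[p] for p in players]
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    r = sxy / sqrt(sxx * syy)  # Pearson correlation
    b = sxy / sxx              # regression slope of later on earlier
    return r, n, b


# Invented toy ratings, not real player data.
season_1 = {"A": 1.0, "B": 2.0, "C": 3.0, "D": 4.0}
season_2 = {"A": 0.5, "B": 1.0, "C": 1.5, "E": 9.0}
r, n, b = compare_seasons(season_1, season_2)
print(f"r={r:.3f}, n={n}, b={b:.3f}")
```

Players D and E drop out because each appears in only one season, which is why n shrinks as the seasons being compared get further apart.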

Table 1: Hockey Analysis Ratings Total (HART)

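| Seasons | 10-11 | 11-12 | 12-13 |
|---|---|---|---|
| 09-10 | r=0.457, n=489, b=0.478 | r=0.363, n=448, b=0.368 | r=0.244, n=347, b=0.349 |
| 10-11 | | r=0.489, n=519, b=0.452 | r=0.394, n=404, b=0.527 |
| 11-12 | | | r=0.544, n=442, b=0.756 |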

The next set of results is for Corsi Rel. As before, we took ratings for one year and correlated them with ratings in other years. The further out we go from a given year, the fewer players appear in both, due to retirements, injuries, etc. We can see that Corsi Rel has higher correlations than HART and that they do not deteriorate much over time. The drop from one-year correlations (~0.58) to two-year correlations (~0.49) to the three-year correlation (~0.48) is smaller than what we saw for HART. Additionally, the slopes for HART are lower than those for CorsiRel, except for 2011-12 to 2012-13.

Table 2: Results for Corsi Rel

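| Seasons | 10-11 | 11-12 | 12-13 |
|---|---|---|---|
| 09-10 | r=0.561, n=497, b=0.616 | r=0.494, n=447, b=0.584 | r=0.479, n=333, b=0.578 |
| 10-11 | | r=0.571, n=515, b=0.609 | r=0.490, n=391, b=0.531 |
| 11-12 | | | r=0.616, n=434, b=0.656 |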

Lastly, we come to THoR. The year-to-year correlations, given in Table 3 below, are higher for THoR than for CorsiRel in every case, although from 2011-12 to 2012-13 the two are essentially equal (0.620 versus 0.616). THoR does particularly well at maintaining value past one year. Additionally, the slopes for THoR at gaps of more than one year are larger than those for CorsiRel or HART. That is, beyond one year THoR estimates regress to the mean less than Corsi Rel estimates, which in turn regress less than HART estimates. This is especially useful for predicting future player performance.

I should note that this version of THoR is slightly different from the one that we presented at the MIT Sloan Sports Analytics Conference last year and which took 2nd place in the research paper competition. We have adjusted the amount of shrinkage here to improve these correlations. In the MIT SSAC paper we had an analysis of the correlation for players that switched teams and their performance, but none of the ten people who have read the whole paper noticed it.

Table 3: Results for Total Hockey Ratings (THoR)

| Seasons | 10-11 | 11-12 | 12-13 |
|---|---|---|---|
| 09-10 | r=0.663, n=505, b=0.623 | r=0.687, n=452, b=0.664 | r=0.623, n=355, b=0.597 |
| 10-11 | | r=0.705, n=528, b=0.737 | r=0.595, n=412, b=0.619 |
| 11-12 | | | r=0.620, n=457, b=0.616 |
 

To compare CorsiRel and THoR directly I made the following table (Table 4).  Frankly both performed well.  But it is pretty clear that THoR, on average, has a higher year to year correlation than CorsiRel and that THoR is more consistent over time.  THoR pretty clearly beats Corsi by at least 0.1 in all but one of the correlations we considered.  And the differences are larger for two and three year correlations.

Table 4: Comparing CorsiRel and THoR

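| Seasons | 10-11 | 11-12 | 12-13 |
|---|---|---|---|
| 09-10 | CorsiRel=0.561, THoR=0.663 | CorsiRel=0.494, THoR=0.687 | CorsiRel=0.479, THoR=0.623 |
| 10-11 | | CorsiRel=0.571, THoR=0.705 | CorsiRel=0.490, THoR=0.595 |
| 11-12 | | | CorsiRel=0.616, THoR=0.620 |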
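
One rough way to summarize Table 4 is to average the correlations by how many seasons apart the two ratings are. The Python sketch below does that with the values copied from the table. The helper `mean_by_lag` and the variable names are assumptions for illustration, and a simple unweighted mean ignores the differing numbers of players behind each correlation.

```python
# Correlations copied from Table 4, keyed by (earlier season, later season).
SEASONS = ["09-10", "10-11", "11-12", "12-13"]

corsi_rel = {
    ("09-10", "10-11"): 0.561, ("09-10", "11-12"): 0.494, ("09-10", "12-13"): 0.479,
    ("10-11", "11-12"): 0.571, ("10-11", "12-13"): 0.490,
    ("11-12", "12-13"): 0.616,
}
thor = {
    ("09-10", "10-11"): 0.663, ("09-10", "11-12"): 0.687, ("09-10", "12-13"): 0.623,
    ("10-11", "11-12"): 0.705, ("10-11", "12-13"): 0.595,
    ("11-12", "12-13"): 0.620,
}


def mean_by_lag(corrs):
    """Average the correlations grouped by the number of seasons between the pair."""
    by_lag = {}
    for (earlier, later), r in corrs.items():
        lag = SEASONS.index(later) - SEASONS.index(earlier)
        by_lag.setdefault(lag, []).append(r)
    return {lag: sum(vals) / len(vals) for lag, vals in sorted(by_lag.items())}


corsi_by_lag = mean_by_lag(corsi_rel)
thor_by_lag = mean_by_lag(thor)
# Positive values mean THoR has the higher average correlation at that lag.
gap = {lag: thor_by_lag[lag] - corsi_by_lag[lag] for lag in corsi_by_lag}

for lag in gap:
    print(f"{lag}-year: CorsiRel={corsi_by_lag[lag]:.3f}, "
          f"THoR={thor_by_lag[lag]:.3f}, gap={gap[lag]:.3f}")
```

Grouping by lag keeps the one-year comparisons, which rest on the most players, separate from the two- and three-year comparisons.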
 


Before I finish, I want to note two things. First, THoR is a different kind of model from the others: it accounts for a variety of factors that are not explicitly modeled in the other two ratings. The outcome metric for THoR is a value assigned to each play in the play-by-play files, based upon the historical chance that it leads to a goal. THoR has built-in model components for home ice, score effects, rink effects, Quality of Teammates, Quality of Competition and Zone Starts. The details are on the THoR website and in the original MIT Sloan paper. Second, we still have some work to do to argue the utility of THoR, but it does appear to be a more internally consistent metric over time. In a statistical sense it seems reliable. We still need to evaluate its validity. We’ll do that in Part II.