This is the second instalment of my 5-part WAR series. Here are the links to the other sections:
- Part 1: Introduction
- Part 2: Shot Impact
- Part 3: Expected Point Production
- Part 4: Extras
- Part 5: Data Release
Today we’re going to talk about everybody’s favourite stat:
The word “corsi” can be divisive in the hockey community, so I tend to just call them shots. We know that players can repeatably impact shot differentials, and we know that shot differentials are a good predictor of future goal differentials. Ergo, we’re going to look at players’ impact on shot differentials.
The biggest difficulty in hockey analytics is trying to adjust for context. There are so many factors in play: quality of linemates, quality of competition, team strength – how do we account for it all? I’ll show you how I went about it:
Accounting for Quality of Linemates
Corsica has a cool stat in its Context section called “CF.QoT”, which represents the CF% (shot share) of your teammates. If you subtract a player’s CF.QoT from their CF%, you get their CF% relative to their teammates (or CF%RelTM). This is the metric I used to determine players’ impact on shot differentials.
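As a quick sketch of that arithmetic (the function name is mine, not Corsica’s):

```python
def cf_rel_tm(cf_pct: float, cf_qot: float) -> float:
    """CF% relative to teammates: the player's on-ice shot share
    minus the shot share of the teammates they played with (CF.QoT)."""
    return cf_pct - cf_qot

# e.g. a player posting a 52.0 CF% alongside teammates at a 49.5 CF.QoT
print(cf_rel_tm(52.0, 49.5))  # → 2.5
```

A positive CF%RelTM means the player outperformed the linemates he actually skated with, not just the league average.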
Now, this method is far from perfect. I’ll go on to explain some of its major flaws in the Limitations section on Friday, but just know that this doesn’t magically solve the conundrum of context. It’s simply my best attempt at accounting for linemate quality.
Accounting for Quality of Competition
Oh boy. This is everyone’s favourite area of hockey analysis – how much does quality of competition matter? Some say a lot, some say not much at all, and most are left scratching their heads wondering why we haven’t come to a general consensus yet. Here’s my take: players typically play similar competition from year to year, so we’ll rarely see a player go from extremely sheltered usage one year to extremely difficult usage the next year. This limits our ability to know precisely how much it matters, since it’s never really isolated as an independent variable.
What we need is a controlled environment where everything else is equal except for the quality of competition – where a player goes from a 50 CF% team that uses him as a #6 defenceman to a 50 CF% team that uses him as a #1 defenceman. What we need is someone like Matt Hunwick.
I think this is probably the best case study we have that demonstrates what happens when you drastically change a player’s usage. Although Hunwick played on a different team, the Leafs drove play just as well as the Rangers did in the previous season. What changed was that Hunwick went from facing some of the easiest competition in the league to being hard-matched against opposing top lines.
Now, this wasn’t the only factor impacting results. Hunwick playing on his left side in Toronto forced his partner, Morgan Rielly, to play on his wrong side. This wasn’t the case in New York, where his most common partners, Dan Boyle and Kevin Klein, were playing quite literally on their right side. Dominic Galamini did some excellent research proving that handedness significantly impacts shot differentials, so that’s definitely something we have to take into account.
After doing some quick math, I found that the impact of Rielly playing on his wrong side was worth a CF%Rel of about 4.0%. This tells me that Hunwick’s jump from the 13th percentile in Quality of Competition to the 98th percentile was worth a difference in CF%Rel of about 7.0% (4.1 − (−6.9) − 4.0 = 7.0).
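Here’s that quick math in code form (which figure belongs to which season is my reading of the numbers above):

```python
# Hunwick's CF%Rel in his sheltered Rangers season vs. his
# hard-matched Leafs season, per the figures in the text
cf_rel_sheltered = 4.1
cf_rel_hard_matched = -6.9
handedness_cost = 4.0  # estimated hit from Rielly playing his off side

# swing attributable to Quality of Competition alone
qoc_effect = cf_rel_sheltered - cf_rel_hard_matched - handedness_cost
print(round(qoc_effect, 1))  # → 7.0
```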
I’m not one to put too much stock in a small sample, but I do think Hunwick is an excellent example of the impact that Quality of Competition can have on extreme ends of the spectrum. Taking his results into account, I came up with the following weights for #MyModel:
(The metric used for Quality of Competition was TOI, or “QoC.TOI” in Corsica)
As you can see, I didn’t make QoC impact CF%Rel as strongly as it did for Hunwick (a jump from the 5th percentile to the 95th percentile is a 6.0 difference, not north of 7.0), but this still represents a much stronger weight for QoC than you’ve probably seen in other models. This is because, intuitively, I believe that QoC matters more than we’ve been able to mathematically prove at this point in time. I think when we get more case studies like Hunwick who go through a drastic change in usage, we’ll be able to get a better idea of its impact. I look forward to seeing how defencemen like Nate Schmidt, Colin Miller, and Brayden McNabb perform this season in Vegas, considering they thrived in sheltered roles last season. Until we see those results, though, these are the weights I’ve chosen for QoC.
Accounting for Team Strength
A team's results are driven primarily by the contributions of its top-end players. It's fair to question value when those results are poor.
— manny, but spooky (@MannyElk) May 16, 2017
The logic here is that it’s easier to put up good relative numbers on a bad team, and more difficult to do so on a good team. If your teammates suck, your relative numbers get inflated, since your team is putting up such poor results when you’re off the ice. The opposite is true on dominant teams, where it’s much more difficult to set yourself apart from the pack.
Coming up with this adjustment was, by far, the least scientific part of developing my WAR metric. I was literally playing around with the formula for hours until I found something that gave me the best results. What’s hilarious is that after all of that effort, what I found worked best actually ended up being pretty straightforward:
(CF.QoT – 50) / 2
It’s simple, but I found that it did a solid job of boosting the shot metrics of players who played on dominant Corsi teams (ie. Boston) and lowering those of players who played on poor Corsi teams (ie. Arizona). This is my best attempt at accounting for that difference, but if anyone can find a better adjustment, I’m very open to changing this.
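Here’s the adjustment as a function (a sketch; the function name is mine):

```python
def team_adjustment(cf_qot: float) -> float:
    """Team-strength correction added to a player's relative numbers.
    CF.QoT (teammates' shot share) stands in for team strength, so
    players on dominant Corsi teams get a boost and players on poor
    Corsi teams get docked."""
    return (cf_qot - 50) / 2

print(team_adjustment(53.0))  # dominant Corsi team → 1.5
print(team_adjustment(46.0))  # poor Corsi team → -2.0
```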
We already talked about shot differentials. The reason we like them is because they provide us with lots of data in a short period of time. There are far fewer meaningful events over the course of a hockey game than in sports like basketball, so Corsi is great for getting us as much information as possible. The problem with using CF% is that all shot attempts are treated as equal. Intuitively, we know that a shot from the slot is much more valuable than a shot from the blueline, so how do we account for this?
There’s an awesome metric available on Corsica called Expected Goals, which weights shots based on a number of factors that influence their likelihood of becoming goals. You can read about the model here (and “Part 1.5” here), but for those of you looking for a quick recap, here’s a list of the factors it takes into account:
- Shot Type
- Shot Distance
- Shot Angle
- Rebounds (whether or not the shot was a rebound)
- Rush Shot (whether or not the shot was a rush shot)
Based on the last 10 years of data, the model is able to determine each shot’s probability of going in (ie. a shot with an 8% chance of going in is worth 0.08 Expected Goals). It then adds up the total number of Expected Goals For and Against while a player is on the ice, which gives you awesome stats like xGF%. This simply represents your team’s share of Expected Goals when you’re on the ice. By subtracting a player’s teammates’ xGF% (“xGF.QoT” on Corsica) from their own xGF%, I was able to determine their xGF%RelTM, which I used as my measure of their impact on shot quality.
Expected Goals are incredibly descriptive of what’s happened on the ice, but like I said yesterday, that doesn’t interest me very much – I’m more interested in predictive power. Although Expected Goals are the new craze in hockey analytics, there’s some controversy concerning whether or not they’re more predictive than Corsi (they are when you take shooting talent into account, but when you don’t, Corsi is actually more predictive).
Since I’m going to be taking shooting talent into account (which I’ll explain in greater detail tomorrow), I will be including Expected Goals in my formula. I’m going to blend it with Corsi though, since Expected Goals take longer to reach their peak in predictive value. By blending a player’s impact on Corsi and Expected Goals, I feel like I’m getting the best of both worlds – a larger sample with all of the Corsi events, while also trying to take shot quality into account by looking at Expected Goals.
Essentially, I’m taking a player’s impact on shot differentials (CF%RelTM + QoC Adjustment + Team Adjustment), adding it to their impact on shot quality (xGF%RelTM + QoC Adjustment + Team Adjustment), then dividing that total by two. This gives us an adjusted expected goal share above average.
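The whole 5v5 shot component can be sketched like this (the names are mine):

```python
def shot_impact(cf_rel_tm: float, xgf_rel_tm: float,
                qoc_adj: float, team_adj: float) -> float:
    """Adjusted expected goal share above average: the average of the
    adjusted shot-quantity impact and the adjusted shot-quality impact."""
    quantity = cf_rel_tm + qoc_adj + team_adj   # Corsi side
    quality = xgf_rel_tm + qoc_adj + team_adj   # Expected Goals side
    return (quantity + quality) / 2
```

Since both adjustments appear in both halves, this works out to the same thing as averaging the two RelTM numbers first and then adding the adjustments once.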
Blending Shot Quantity & Shot Quality
To help explain how this works, we’ll use the example of John Doe, who plays on a perfectly average team and faces league average competition. Let’s say he has a great season driving play, finishing with a CF%RelTM of +3.0% and a xGF%RelTM of +5.0%. When we blend those two together [(3+5)/2=4], we get +4.0%, meaning that John Doe would be expected to provide an average team with a 4.0% higher goal share when he’s on the ice.
To find out how many Goals that’s worth over the course of a full season, you take that percentage (0.04) and multiply it by the total number of Expected Goals (For and Against) that the player was on the ice for that season. So in this case, let’s say John Doe had an xGF of 70 when he was on the ice and an xGA of 60 throughout the course of the season (70+60=130). We multiply 0.04 by 130, and we get 5.2 – that’s how many Goals Above Average his 5v5 Shot Impact was worth.
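The John Doe example, worked in code:

```python
# John Doe: average team and average competition, so no adjustments
cf_rel_tm, xgf_rel_tm = 3.0, 5.0
blended = (cf_rel_tm + xgf_rel_tm) / 2           # +4.0% goal share

xgf_on_ice, xga_on_ice = 70, 60                  # Expected Goals For/Against
total_xg = xgf_on_ice + xga_on_ice               # 130 on-ice Expected Goals

goals_above_average = (blended / 100) * total_xg
print(round(goals_above_average, 1))  # → 5.2
```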
To determine how many Goals Above Replacement that is, you simply define replacement level and add that to each player’s 5v5 Impact. My logic is that 31 NHL teams each dress 12 forwards and 6 defencemen, meaning that there are 372 NHL forwards and 186 NHL defencemen. In theory, anything below that is replacement level. So I found the 373rd-ranked forward and 187th-ranked defenceman in 5v5 Shot Impact/60, and added that number to everyone’s 5v5 Shot Impact/60. This method was used to determine replacement level for every component of the GAR formula.
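A sketch of that replacement-level step (the function and the toy numbers are mine):

```python
def above_replacement(rates, n_regulars):
    """Convert per-60 impacts above average into impacts above
    replacement. The (n_regulars + 1)-th best rate is treated as
    replacement level; since replacement level sits below average
    (it's typically negative), subtracting it raises everyone's number."""
    replacement = sorted(rates, reverse=True)[n_regulars]
    return [r - replacement for r in rates]

# toy league: four forwards, two regular roster spots, so the
# third-best rate (-0.1) becomes replacement level
print(above_replacement([1.2, 0.5, -0.1, -0.4], 2))
```

For the real thing, `rates` would be every forward’s 5v5 Shot Impact/60 and `n_regulars` would be 372 (186 for defencemen).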
To reward you for putting up with my nerdy nonsense for so long, here are the Top 30 Forwards & Defencemen from 2016-2017 in Shot Impact per 82 games (minimum TOI of 500).
Yes, I extended the list to 31 defencemen so Jake Gardiner’s name would appear. No, I will not apologize.
And just because I love you so much, here are the Top 30 Forwards & Defencemen in Shot Impact from 2014-2017 (minimum TOI of 1500).
It’s worth noting that most of a defenceman’s 5v5 value comes from driving play – since defencemen can’t significantly impact shooting percentage – so naturally their Shot Impact will be higher than forwards’ on average. If that sounds a bit confusing, don’t worry; I go over this in Part 3 – Expected Point Production.