Hitting the ball hard frequently while keeping a good, consistent launch angle sounds like a good recipe for successful hitting. A lot of research has been done on the importance of maximizing the Exit Velocity (EV) of the batted ball while doing so at the appropriate Launch Angle (LA).
On this topic, it is just as important to understand that not every ball batted over 100 mph is Hard Hit, and that not every ball under that threshold falls outside the category: it depends on the Launch Angle, as described by Connor Kurcon in this great article about Dynamic Hard Hit rate, which you should read.
In it, Connor explains that not every hard-hit Batted Ball Event (BBE) is created equal: there is a dynamic threshold for the EV (hence the name) that varies with the LA, and for more extreme angles the hard-hit lower limit is lower than for the central ones. Using this principle, a DHH% is calculated for every player (for the purpose of this article, I’m using slightly different DHH% numbers than those calculated by Connor, but following the same relationship).
The second thing, in no way less important, is the actual distribution of the Launch Angle, defined by Alex Chamberlain in this other great piece (which you should read, too) as the variance in the distribution of the batter’s launch angles: “the narrower the distribution of his launch angles, the tighter. The wider, the looser”. I’ve also calculated this and noted it as Sd(LA), the standard deviation of the launch angle.
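A minimal sketch of the tightness idea in Python, assuming Sd(LA) is the plain sample standard deviation of one batter's launch angles. The function name and the launch angles below are made up for illustration, and the choice of the sample (n-1) formula over the population one is an assumption.

```python
from statistics import stdev


def launch_angle_sd(launch_angles):
    """Sample standard deviation of a batter's launch angles, in degrees."""
    if len(launch_angles) < 2:
        raise ValueError("need at least two batted balls")
    return stdev(launch_angles)


# Made-up launch angles (degrees) for two hypothetical batters
# with the same average launch angle but different spreads.
tight_hitter = [10, 12, 14, 16, 18]
loose_hitter = [-20, 0, 14, 30, 46]

tight_sd = launch_angle_sd(tight_hitter)
loose_sd = launch_angle_sd(loose_hitter)
print(f"tight: {tight_sd:.1f} degrees, loose: {loose_sd:.1f} degrees")
```

Both hypothetical batters average 14 degrees, so only the spread separates them.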
Looked at independently, DHH% and Sd(LA) both show a good correlation with same-year wOBA and wRC+. For players with more than 155 BBE in the 2020 season, the best correlations were:
DHH% vs wOBA=47%
Sd(LA) vs wRC+=47.3%.
Now, given how important both are for successful hitting, I wondered whether there was a way to combine them and test for a better correlation.
We know that we want DHH% as high as possible and Sd(LA) as low as possible, or “tighter”.
For this type of relationship, a simple division might do the trick:
DHH%/Sd(LA). Let’s call it Q (for Quotient).
Whenever Q rises, it means that DHH% went up, Sd(LA) went down or, even better, both things happened. Calculating R for 2020 Q vs 2020 wOBA and wRC+, for the same players with more than 155 BBE, the correlation is 51.7% and 50.3% respectively, up to about ten percent better (in relative terms) than either metric individually. (Q was multiplied by 100 for better readability and graphing.)
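The quotient and its correlation with an outcome stat can be sketched as follows. This is an illustration under stated assumptions, not the exact calculation behind the numbers above: DHH% is assumed to be stored as a fraction (0.25 for 25%), the player rows are made up, and `quotient` and `pearson_r` are hypothetical helper names.

```python
from math import sqrt


def quotient(dhh_rate, sd_la):
    """Q = DHH% / Sd(LA) * 100, with DHH% given as a fraction (assumption)."""
    return dhh_rate / sd_la * 100


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up rows: (DHH% as a fraction, Sd(LA) in degrees, wOBA).
players = [
    (0.30, 24.0, 0.380),
    (0.25, 26.0, 0.340),
    (0.20, 28.0, 0.310),
    (0.15, 30.0, 0.300),
]

q_values = [quotient(dhh, sd) for dhh, sd, _ in players]
wobas = [woba for _, _, woba in players]
r = pearson_r(q_values, wobas)
print(f"R between Q and wOBA on the toy data: {r:.3f}")
```

A higher DHH% or a lower Sd(LA) both push Q up, which is the behavior described above.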
Is this repeatable? How many BBEs are truly needed? How fast does it stabilize? And, even more important, is it predictive? The next step should be to try to answer those questions.
BTW, attached are the players with the highest Q this season. Interesting players to note: Devers and Peralta.
Regarding the predictive capabilities, some initial testing is encouraging. I used the 2018 season as the first data set so that I could have a “proper” season, 2019, to correlate with. I got DHH% and Sd(LA) for every batter and then calculated Q=(DHH%/Sd(LA))*100 for each of them, with 155 BBE as the cutoff.
First findings were less than impressive. R² for:
Sd(LA) 2018 vs wRC+ & wOBA 2019=0.0143 & 0.0145
DHH% 2018 vs wRC+ & wOBA 2019=0.0706 & 0.0698
Max EV 2018 vs wRC+ & wOBA 2019=0.0738 & 0.0708
Q 2018 vs wRC+ 2019=0.0840
Q 2018 vs wOBA 2019=0.0831
A very light correlation.
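A year-over-year R² of this kind can be sketched by pairing each player's 2018 metric with his 2019 outcome and squaring the Pearson correlation. The player IDs and values here are made up, and `r_squared` is a hypothetical helper, not the tool used for the figures above.

```python
def r_squared(xs, ys):
    """Square of the Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov ** 2 / (var_x * var_y)


# Made-up data keyed by a hypothetical player ID.
q_2018 = {"A": 1.10, "B": 0.90, "C": 0.70, "D": 0.60}
wrc_plus_2019 = {"A": 120, "B": 95, "C": 105, "E": 130}

# Only players present in both seasons can be paired.
common = sorted(q_2018.keys() & wrc_plus_2019.keys())
xs = [q_2018[p] for p in common]
ys = [wrc_plus_2019[p] for p in common]

r2 = r_squared(xs, ys)
print(f"{len(common)} paired players, R^2 = {r2:.4f}")
```

Players who appear in only one of the two seasons ("D" and "E" in the toy data) drop out of the pairing.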
Initially, I thought that was due to a not-enough-BBE problem, so I checked with higher BBE cutoffs. Correlation went up, but not enough; frustrating indeed. Then it hit me that I was kind of comparing large apples with big, medium, and small apples: even while cutting at 155 BBE for the 2018 player pool, I was using the same players for the 2019 data set regardless of their PA that year, which in a lot of cases had dropped, leaving too few BBE in 2019.
I decided then to use a quick’n’dirty rule of keeping only those players whose 2019 PA were at most 15% lower than in 2018 and, voilà, correlation doubled. I tinkered with the BBE cutoff and found (for this dataset) that 215 BBE was a good compromise:
Sd(LA) 2018 vs wRC+ & wOBA 2019=0.0235 & 0.0257
DHH% 2018 vs wRC+ & wOBA 2019=0.1932 & 0.1866
Max EV 2018 vs wRC+ & wOBA 2019=0.1637 & 0.1582
Q 2018 vs wRC+ 2019=0.2179, i.e. R ≈ 47%
Q 2018 vs wOBA 2019=0.2099, i.e. R ≈ 46%
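The quick’n’dirty filter can be sketched as a single predicate. The field names and rows are made up, and treating the 215 BBE cutoff as inclusive is an assumption.

```python
def keep_player(bbe_2018, pa_2018, pa_2019, min_bbe=215, max_pa_drop=0.15):
    """True if the player clears the 2018 BBE cutoff and his 2019 PA
    are at most `max_pa_drop` (as a fraction) lower than in 2018."""
    if bbe_2018 < min_bbe:  # inclusive cutoff is an assumption
        return False
    return pa_2019 >= (1 - max_pa_drop) * pa_2018


# Made-up rows for four hypothetical players.
players = [
    {"name": "A", "bbe_2018": 300, "pa_2018": 600, "pa_2019": 620},
    {"name": "B", "bbe_2018": 250, "pa_2018": 500, "pa_2019": 300},  # PA fell 40%
    {"name": "C", "bbe_2018": 180, "pa_2018": 400, "pa_2019": 410},  # too few BBE
    {"name": "D", "bbe_2018": 220, "pa_2018": 450, "pa_2019": 400},  # PA fell ~11%
]

kept = [
    p["name"]
    for p in players
    if keep_player(p["bbe_2018"], p["pa_2018"], p["pa_2019"])
]
print(kept)
```

Only the players who pass both conditions would go into the 2018 vs 2019 comparison.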
Good & bad news:
DHH% by itself correlates nicely here; Max EV does too, to a lesser degree. Sd(LA) doesn’t on its own, BUT Q does better than any of them, so Sd(LA) has an impact. Lots of assumptions were made: a fair number of BBEs is needed both before and after, and just one pair of seasons was reviewed. Still, it looks like Q could be another good tool for the bag. It may be a little overkill, as Max EV and DHH% do a good job by themselves, but it is a possible next step in estimating future performance.
As an example, among players with a 2018 Q higher than 0.80 (an arbitrary cutoff), only 6 out of 30 had a below-average wRC+ in 2019.
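That kind of check (6 of 30 is 20%) can be sketched as a simple count. The rows below are made up, `below_average_share` is a hypothetical helper, and league-average wRC+ is taken as 100.

```python
def below_average_share(players, q_threshold=0.80, league_average=100):
    """Count players above the Q threshold, and how many of them
    posted a below-average wRC+ the following season."""
    selected = [p for p in players if p["q_2018"] > q_threshold]
    below = [p for p in selected if p["wrc_plus_2019"] < league_average]
    return len(below), len(selected)


# Made-up rows for four hypothetical players.
players = [
    {"name": "A", "q_2018": 1.10, "wrc_plus_2019": 125},
    {"name": "B", "q_2018": 0.95, "wrc_plus_2019": 92},
    {"name": "C", "q_2018": 0.85, "wrc_plus_2019": 108},
    {"name": "D", "q_2018": 0.70, "wrc_plus_2019": 88},
]

below, total = below_average_share(players)
print(f"{below} of {total} high-Q players were below average")
```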
More analysis and calculations should be done, but as a conversation starter this could be a good lead.
EE, Data geek, Baseball fan. Twitter: @camarcano