Statistics are, as the Merriam-Webster dictionary points out, “a collection of quantitative data”. Pure and simple. What is not that simple is how we use this data or how we make sense of it.
Fortunately, we live in an age where incredible sources of valuable information about baseball (in both quantity and quality), including all kinds of stats, are at our fingertips. We can check how good the spin rate of Justin Verlander’s fastball is (fairly good) or see that Matthew Boyd’s 2019 ERA is deceptive according to advanced stats like xFIP and SIERA, so he should probably be better this year.
I love spin rates and SIERA and a ton of other advanced stats, and most of the time the information we get from them surpasses what we can learn from velocity (mph) or ERA, to name a couple of traditional stats. Nevertheless, I also like to simplify things while still obtaining insights powerful enough to make educated decisions.
Nothing gets simpler than Balls and Strikes.
I mean, we know them so well that even before the ubiquitous virtual strike zones we see nowadays, we could instantly start shouting at the umpire when we thought he was missing the calls. And we’ll do it forever because we KNOW balls and strikes. We know that throwing more strikes than balls is almost always good, and pitchers who can do that are usually bound to have more success.
K%-BB% and (k-bb)/ip (let’s call them the K-Bs stats) are a couple of stats that exist just because of balls and strikes. They summarize in a straightforward way how a pitcher performs on the two outcomes he can most directly influence during an outing: strikeouts and walks.
Strikeout rate (K%) and walk rate (BB%) measure how often a pitcher strikes out or walks batters per plate appearance (PA). You calculate them by dividing the total number of strikeouts a pitcher recorded, or walks he issued, by the plate appearances batters had against him during a period of time (a week, month, season, etc.). Then, to get K%-BB%, you just subtract BB% from K% and that’s it.
(k-bb)/ip works similarly, but you first subtract walks (bases on balls) from strikeouts and then divide the result by innings pitched. The reason for this (and likewise for dividing by plate appearances in K%-BB%) is to obtain ratios that let us compare pitchers who have faced vastly different numbers of batters.
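The two calculations just described can be sketched in a few lines of Python. This is a minimal illustration rather than anyone's official implementation: the function names are my own, the stat line in the example is made up, and innings pitched are assumed to be given as decimal innings (200.333 for 200 and one third) rather than the box-score notation 200.1.

```python
def k_minus_bb_pct(strikeouts, walks, plate_appearances):
    """Return K%-BB% as a fraction (0.25 means 25 percentage points)."""
    if plate_appearances <= 0:
        raise ValueError("plate_appearances must be positive")
    k_pct = strikeouts / plate_appearances   # strikeout rate, K%
    bb_pct = walks / plate_appearances       # walk rate, BB%
    return k_pct - bb_pct


def k_minus_bb_per_ip(strikeouts, walks, innings_pitched):
    """Return (K-BB)/IP; innings_pitched is assumed to be decimal innings."""
    if innings_pitched <= 0:
        raise ValueError("innings_pitched must be positive")
    return (strikeouts - walks) / innings_pitched


# Hypothetical season line, not a real pitcher: 250 K, 50 BB, 800 PA, 200 IP.
print(k_minus_bb_pct(250, 50, 800))       # 0.25
print(k_minus_bb_per_ip(250, 50, 200.0))  # 1.0
```

Because both stats are ratios, the same functions work for a week, a month, or a full season of data.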
If you are interested, you can find some more info on these stats here. By the way, Bill James was not a fan of them about 10 years ago, but Tom Tango was. What intrigues me the most about these stats is their predictive power: whether they can be used to anticipate how a pitcher will fare afterward. Some folks have written about this, but their conclusions are not that clear to me, so I did some research of my own.
First, I pulled data for pitchers during the 2019 season. The main interest is K%-BB% and (k-bb)/ip, of course, along with ERA and ERA estimators like xFIP and SIERA. Note that, as a consistency check, I am also including another stat called CSW, which by 2019 had already been shown to correlate strongly with SIERA and could be especially useful for prediction.
I restricted the sample to pitchers with at least 23 games started, as I wanted starters with a lot of innings pitched so that each pitcher’s numbers rest on a bigger sample.
2019 Pitching data.
SIERA is almost universally acclaimed as one of the best ERA estimators, so I first want to check how well the K-Bs stats correlate with it. I created graphs of ERA, xFIP, and SIERA against each of the K-Bs and plotted the trend line, its equation, and R² for each:
As expected, given the weight of Ks and BBs in its formula, SIERA has the highest R², which is great: it means the K-Bs explain a larger share of the variance in SIERA than in ERA or xFIP, and that suggests these simpler stats can give results close to SIERA’s. But if I really want to take it a step further, I should compare 2018 K-Bs with 2019 SIERA and check the correlation to look for any real predictive power.
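For readers who want to reproduce a trend line and its R² without a spreadsheet, here is a small sketch of an ordinary least-squares fit in plain Python. The function name is hypothetical and the numbers in the example are invented for illustration only; they are not the 2019 data used in this article.

```python
def linear_fit(xs, ys):
    """Least-squares line y = slope * x + intercept, plus R^2 of the fit."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products around the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("both samples must vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For a one-variable linear fit, R^2 is the squared correlation.
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Invented values: K%-BB% (as fractions) against an ERA estimator.
k_bb = [0.10, 0.15, 0.20, 0.25]
estimator = [4.8, 4.3, 3.9, 3.2]
slope, intercept, r_squared = linear_fit(k_bb, estimator)
print(f"y = {slope:.2f}x + {intercept:.2f}, R^2 = {r_squared:.2f}")
```

R² here is the share of the variance in the y values that the line accounts for, which is why a higher value means the simple stat tracks the estimator more closely.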
Pulling the data and graphing it we get:
At first glance, the R² values only tell us that 2018 (k-bb)/ip and 2018 K%-BB% explain 44% and 48%, respectively, of the variance in 2019 SIERA, and that could seem too low to be meaningful. Thinking that way, we would be right and wrong at the same time, and the word variance is the key.
As Phil Birnbaum graciously explains in this post, R² tells a lot from a statistician’s point of view and its value is important but, in layperson’s terms, the correlation coefficient R (the square root of R², carrying the sign of the trend line’s slope) is more informative and useful. In absolute value, R is around 0.66 and 0.70 for (k-bb)/ip and K%-BB% respectively, which is good and potentially indicates that these stats can be used to predict a pitcher’s performance.
There is a lot more data checking that I should do, and using more seasons would help because the sample would be larger, but this small exercise suggests that the K-Bs are good tools for making educated estimates of pitchers’ future performance. It might feel like reinventing the wheel, but you would be surprised how much people tend to underestimate these stats, so any boost to our confidence in them is important.
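The step from R² to R is easy to check by hand or in code. The sketch below uses the rounded R² values quoted in the text and assumes, as a labeled assumption, that the trend lines slope downward (a better K-B stat going with a lower SIERA), so R comes out negative. The function name is my own. With the rounded inputs the magnitudes land at roughly 0.66 and 0.69, so small differences from the figures above are presumably due to rounding.

```python
import math


def r_from_r_squared(r_squared, slope):
    """Correlation coefficient R: sqrt(R^2) with the sign of the slope."""
    if not 0.0 <= r_squared <= 1.0:
        raise ValueError("r_squared must be between 0 and 1")
    return math.copysign(math.sqrt(r_squared), slope)


# Rounded R^2 values from the text; the slope of -1.0 is a placeholder
# used only for its sign (assumed negative), not a fitted value.
for label, r2 in [("(k-bb)/ip", 0.44), ("K%-BB%", 0.48)]:
    print(label, round(r_from_r_squared(r2, -1.0), 2))
```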
So, what can we do with K%-BB% and (k-bb)/ip that leads to useful and practical information? Well, I dedicated an article to the evaluation of the 50 best pitchers coming into the 2020 season according to these stats from 2019 and there are a few surprises that I think you should consider. Spoiler alert: you should try to get Boyd, and Kevin Gausman might be in for a very decent season.
Baseball is such a wonderful game that, besides the joys of simply watching it, it has gifted us with the joy of measuring and analyzing it ad infinitum. It can be, and it is, measured beyond silliness which can be a blessing or a curse as separating the needles from the hay seems, a lot of times, complicated. But it doesn’t have to be; the beauty of having so much information available is that if we can understand it, diverse ways of looking at our beloved game increase our appreciation of it: if knowledge is power, applied knowledge is wisdom. Let’s try to be wise or die trying.
EE, Data geek, Baseball fan. Twitter: @camarcano