Revisiting Ks and BBs.

This post is also available in Spanish (Español).

Statistics are, as the Merriam-Webster dictionary points out, “a collection of quantitative data”. Pure and simple. What is not that simple is how we use this data or how we make sense of it.

Fortunately, we live in an age where incredible sources (in both quantity and quality) of valuable baseball information, including all kinds of stats, are at our fingertips. We can check how good the spin rate on Justin Verlander's fastball is (fairly good) or how advanced stats like xFIP and SIERA suggest that Matthew Boyd's 2019 ERA was deceptive, so he should probably be better this year.

I love spin rates, SIERA, and a ton of other advanced stats, and most of the time the information we get from them surpasses what we can learn from velocity (mph) or ERA, to name a couple of traditional stats. Nevertheless, I also like to simplify things while still obtaining powerful insights that support educated decisions.

Nothing gets simpler than Balls and Strikes.

I mean, we know them so well that, even before the ubiquitous virtual strike zones we see nowadays, we would instantly start shouting at the umpire when we thought he was missing calls. And we'll do it forever because we KNOW balls and strikes. We know that throwing more strikes than balls is always good, and pitchers who can do that usually have more success.

K%-BB% and (k-bb)/ip (let's call them the K-Bs stats) are two stats that exist only because of balls and strikes. They summarize, in a straightforward way, how a pitcher fares on the two principal outcomes he can most directly influence during an outing: strikeouts and walks.

Strikeout rate (K%) and walk rate (BB%) measure how often a pitcher strikes out or walks batters per plate appearance (PA). You calculate each one by dividing the total number of strikeouts (or walks) by the number of plate appearances against the pitcher over a given period (a week, a month, a season, etc.). Then, to get K%-BB%, you simply subtract BB% from K%.
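The K%-BB% calculation just described can be sketched in a few lines. This is a minimal illustration, not code from any stats site; the function names and the season line (200 K and 50 BB over 800 plate appearances) are hypothetical. The result is a fraction, matching the format of the K%-BB% column in the 2019 table (0.340 rather than 34%).

```python
def rate(count: int, plate_appearances: int) -> float:
    """Events per plate appearance, e.g. strikeouts / PA."""
    if plate_appearances <= 0:
        raise ValueError("plate_appearances must be positive")
    return count / plate_appearances


def k_minus_bb_pct(strikeouts: int, walks: int, plate_appearances: int) -> float:
    """K%-BB% as a fraction: K% minus BB%, both measured per plate appearance."""
    k_pct = rate(strikeouts, plate_appearances)
    bb_pct = rate(walks, plate_appearances)
    return k_pct - bb_pct


# Hypothetical season line: 200 K and 50 BB over 800 plate appearances.
print(round(k_minus_bb_pct(200, 50, 800), 4))  # 0.1875
```

Because both rates share the same denominator, this is the same as (K - BB) / PA; computing K% and BB% separately just keeps the two component rates available.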

(k-bb)/ip works similarly, except that you first subtract walks (bases on balls) from strikeouts and then divide the result by innings pitched. The reason for dividing by innings (and, likewise, by plate appearances in K%-BB%) is to obtain rates that let us compare pitchers who have faced vastly different numbers of batters.
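A minimal sketch of (k-bb)/ip follows, under one stated assumption: innings arrive in the usual baseball notation, where the digit after the dot counts outs, so "212.1" means 212⅓ innings. Some data exports treat such values as plain decimals; for what it's worth, the (k-bb)/ip values in the 2019 table look consistent with the literal decimal (1.2304555 × 147.1 comes out to almost exactly 181, a whole number). Converting outs to thirds is the more precise choice, and the two versions differ slightly. The helper names and the sample line are hypothetical.

```python
def innings_from_notation(ip: str) -> float:
    """Convert baseball IP notation to true innings.

    The digit after the dot counts outs, not tenths:
    "212.1" means 212 innings plus 1 out, i.e. 212 1/3 innings.
    """
    whole, _, outs = ip.partition(".")
    outs_count = int(outs) if outs else 0
    if outs_count not in (0, 1, 2):
        raise ValueError(f"invalid IP notation: {ip!r}")
    return int(whole) + outs_count / 3


def k_minus_bb_per_ip(strikeouts: int, walks: int, ip: str) -> float:
    """(K - BB) / IP: subtract walks from strikeouts first, then divide by innings."""
    innings = innings_from_notation(ip)
    if innings <= 0:
        raise ValueError("innings must be positive")
    return (strikeouts - walks) / innings


# Hypothetical line: 180 K and 40 BB in 150.2 IP (150 2/3 innings).
print(round(k_minus_bb_per_ip(180, 40, "150.2"), 4))  # 0.9292
```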

If you're interested, you can find more info on these stats here. By the way, about 10 years ago Bill James was not a fan of them, but Tom Tango was. What intrigues me most is whether these stats can anticipate how a pitcher will fare afterward, that is, their predictive value, so I ran some calculations to shed light on that. Others have written about this, but their conclusions were not entirely clear to me, so I did some research of my own.

First, I pulled data for pitchers from the 2019 season. The main interest is, of course, K%-BB% and (k-bb)/ip, along with ERA and the ERA estimators FIP, xFIP, and SIERA. As a consistency check, I also included another metric, CSW (called strikes plus whiffs rate), which by 2019 had already been shown to correlate strongly with SIERA and could be especially useful as a predictive stat.

I restricted the sample to pitchers with at least 23 games started, since I wanted starters with plenty of innings pitched for a larger sample.
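The games-started cutoff amounts to a simple inclusive filter. In this sketch, the `PitcherSeason` record and `qualified_starters` helper are hypothetical names; the Cole and Stanek values come from the 2019 table, and the third row is made up to show a pitcher who fails the cutoff.

```python
from dataclasses import dataclass


@dataclass
class PitcherSeason:
    name: str
    games_started: int
    k_minus_bb_pct: float


MIN_GAMES_STARTED = 23  # the cutoff used in this post


def qualified_starters(
    rows: list[PitcherSeason], min_gs: int = MIN_GAMES_STARTED
) -> list[PitcherSeason]:
    """Keep only pitchers with at least `min_gs` games started (inclusive)."""
    return [r for r in rows if r.games_started >= min_gs]


rows = [
    PitcherSeason("Gerrit Cole", 33, 0.340),
    PitcherSeason("Ryne Stanek", 27, 0.153),
    PitcherSeason("Hypothetical Reliever", 2, 0.200),  # made-up row, fails the cutoff
]
print([r.name for r in qualified_starters(rows)])  # ['Gerrit Cole', 'Ryne Stanek']
```

Note that a games-started cutoff alone still admits an opener like Ryne Stanek (27 GS, 77 IP); an innings minimum would be a stricter alternative if volume is the real goal.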

2019 Pitching data.

Name | CSW | (k-bb)/ip | K%-BB% | ERA | FIP | xFIP | SIERA | GS | IP
--- | --- | --- | --- | --- | --- | --- | --- | --- | ---
Gerrit Cole | 0.3566330 | 1.3107025 | 0.3400000 | 2.50 | 2.64 | 2.48 | 2.62 | 33 | 212.1
Chris Sale | 0.3426602 | 1.2304555 | 0.2950000 | 4.40 | 3.39 | 2.93 | 3.00 | 25 | 147.1
Max Scherzer | 0.3425993 | 1.2202208 | 0.3030000 | 2.92 | 2.45 | 2.88 | 2.93 | 27 | 172.1
Justin Verlander | 0.3401972 | 1.1569507 | 0.3040000 | 2.58 | 3.27 | 3.18 | 2.95 | 34 | 223
Jacob deGrom | 0.3060358 | 1.0343137 | 0.2620000 | 2.43 | 2.67 | 3.11 | 3.29 | 32 | 204
Shane Bieber | 0.3310324 | 1.0228865 | 0.2550000 | 3.28 | 3.32 | 3.23 | 3.36 | 33 | 214.1
Matthew Boyd | 0.3171905 | 1.0156672 | 0.2380000 | 4.56 | 4.32 | 3.88 | 3.61 | 32 | 185.1
Blake Snell | 0.3395029 | 1.0000000 | 0.2420000 | 4.29 | 3.32 | 3.31 | 3.56 | 23 | 107
Walker Buehler | 0.3143058 | 0.9774849 | 0.2420000 | 3.26 | 3.01 | 3.37 | 3.50 | 30 | 182.1
Yu Darvish | 0.3205758 | 0.9708193 | 0.2360000 | 3.98 | 4.18 | 3.39 | 3.55 | 31 | 178.2
Lucas Giolito | 0.3262260 | 0.9704881 | 0.2420000 | 3.41 | 3.43 | 3.66 | 3.57 | 29 | 176.2
Charlie Morton | 0.3268557 | 0.9423275 | 0.2320000 | 3.05 | 2.81 | 3.28 | 3.54 | 33 | 194.2
Stephen Strasburg | 0.3221040 | 0.9330144 | 0.2320000 | 3.32 | 3.25 | 3.17 | 3.49 | 33 | 209
Lance Lynn | 0.2907402 | 0.8986064 | 0.2140000 | 3.67 | 3.13 | 3.85 | 3.83 | 33 | 208.1
Jack Flaherty | 0.3120478 | 0.8975013 | 0.2280000 | 2.75 | 3.46 | 3.64 | 3.68 | 33 | 196.1
James Paxton | 0.2979362 | 0.8721704 | 0.2070000 | 3.82 | 3.86 | 4.03 | 3.93 | 29 | 150.2
Chris Paddack | 0.3008772 | 0.8701854 | 0.2140000 | 3.33 | 3.95 | 4.05 | 3.83 | 26 | 140.2
Robbie Ray | 0.3208225 | 0.8673176 | 0.2030000 | 4.34 | 4.29 | 3.76 | 4.02 | 33 | 174.1
Patrick Corbin | 0.2991816 | 0.8316832 | 0.2010000 | 3.25 | 3.49 | 3.59 | 3.88 | 33 | 202
Clayton Kershaw | 0.2990269 | 0.8309938 | 0.2100000 | 3.03 | 3.86 | 3.50 | 3.77 | 28 | 178.1
German Marquez | 0.3042813 | 0.8045977 | 0.1940000 | 4.76 | 4.06 | 3.54 | 3.85 | 28 | 174
Trevor Bauer | 0.3086520 | 0.8028169 | 0.1880000 | 4.48 | 4.34 | 4.33 | 4.14 | 34 | 213
Domingo German | 0.3061860 | 0.7972028 | 0.1920000 | 4.03 | 4.72 | 4.22 | 4.06 | 24 | 143
Jake Odorizzi | 0.2809473 | 0.7861635 | 0.1900000 | 3.51 | 3.36 | 4.33 | 4.14 | 30 | 159
Sonny Gray | 0.3066346 | 0.7824101 | 0.1940000 | 2.87 | 3.42 | 3.65 | 3.97 | 31 | 175.1
Luis Castillo | 0.3100063 | 0.7728707 | 0.1880000 | 3.40 | 3.70 | 3.48 | 3.95 | 32 | 190.2
Madison Bumgarner | 0.2872996 | 0.7722008 | 0.1900000 | 3.90 | 3.90 | 4.31 | 4.15 | 34 | 207.2
Noah Syndergaard | 0.3040388 | 0.7707911 | 0.1840000 | 4.28 | 3.60 | 3.83 | 4.02 | 32 | 197.2
Kenta Maeda | 0.3251130 | 0.7702350 | 0.1890000 | 4.04 | 3.95 | 4.04 | 4.06 | 26 | 153.2
Michael Pineda | 0.2848990 | 0.7671233 | 0.1860000 | 4.01 | 4.02 | 4.30 | 4.18 | 26 | 146
Hyun-Jin Ryu | 0.2908352 | 0.7628979 | 0.1920000 | 2.32 | 3.10 | 3.32 | 3.77 | 29 | 182.2
Max Fried | 0.2940075 | 0.7627119 | 0.1790000 | 4.02 | 3.72 | 3.32 | 3.83 | 30 | 165.2
Zack Greinke | 0.2977835 | 0.7540826 | 0.1940000 | 2.93 | 3.22 | 3.74 | 3.96 | 33 | 208.2
Zack Wheeler | 0.2792023 | 0.7432086 | 0.1760000 | 3.96 | 3.48 | 4.06 | 4.20 | 31 | 195.1
Vince Velasquez | 0.2693396 | 0.7429547 | 0.1690000 | 4.91 | 5.21 | 4.75 | 4.36 | 23 | 117.1
Chris Archer | 0.3069590 | 0.7382550 | 0.1670000 | 5.19 | 5.02 | 4.36 | 4.38 | 23 | 119.2
Aaron Nola | 0.3232293 | 0.7372588 | 0.1750000 | 3.87 | 4.03 | 3.82 | 4.14 | 34 | 202.1
Tyler Mahle | 0.3052917 | 0.7352941 | 0.1710000 | 5.14 | 4.66 | 3.99 | 4.16 | 25 | 129.2
Jose Berrios | 0.2975734 | 0.7196402 | 0.1710000 | 3.68 | 3.85 | 4.32 | 4.28 | 32 | 200.1
Anthony DeSclafani | 0.2736132 | 0.7099880 | 0.1700000 | 3.89 | 4.43 | 4.30 | 4.29 | 31 | 166.2
Caleb Smith | 0.2867343 | 0.7054213 | 0.1670000 | 4.52 | 5.11 | 5.05 | 4.58 | 28 | 153.1
Joe Musgrove | 0.2871660 | 0.6937096 | 0.1650000 | 4.44 | 3.82 | 4.31 | 4.31 | 31 | 170.1
Eduardo Rodriguez | 0.2886981 | 0.6794682 | 0.1610000 | 3.81 | 3.86 | 4.10 | 4.31 | 34 | 203.1
Kyle Hendricks | 0.2972572 | 0.6666667 | 0.1620000 | 3.46 | 3.61 | 4.26 | 4.38 | 30 | 177
Jon Lester | 0.2645416 | 0.6600467 | 0.1480000 | 4.46 | 4.26 | 4.35 | 4.49 | 31 | 171.2
Chris Bassitt | 0.2769547 | 0.6527778 | 0.1530000 | 3.81 | 4.40 | 4.61 | 4.47 | 25 | 144
Kyle Gibson | 0.2850941 | 0.6500000 | 0.1480000 | 4.84 | 4.26 | 3.80 | 4.25 | 29 | 160
Ryne Stanek | 0.2961538 | 0.6493506 | 0.1530000 | 3.97 | 4.28 | 4.72 | 4.47 | 27 | 77
Tanner Roark | 0.2533422 | 0.6480921 | 0.1480000 | 4.35 | 4.67 | 4.64 | 4.57 | 31 | 165.1
Jordan Lyles | 0.2772801 | 0.6453901 | 0.1520000 | 4.15 | 4.64 | 4.61 | 4.53 | 28 | 141
Dylan Bundy | 0.2987761 | 0.6451613 | 0.1480000 | 4.79 | 4.73 | 4.58 | 4.54 | 30 | 161.2
Mike Minor | 0.2942228 | 0.6343104 | 0.1530000 | 3.59 | 4.25 | 4.60 | 4.51 | 32 | 208.1
Steven Matz | 0.2838638 | 0.6308557 | 0.1460000 | 4.21 | 4.60 | 4.33 | 4.47 | 30 | 160.1
Jon Gray | 0.2890855 | 0.6266667 | 0.1480000 | 3.84 | 4.06 | 3.89 | 4.35 | 25 | 150
Joey Lucchesi | 0.2898222 | 0.6250000 | 0.1480000 | 4.18 | 4.17 | 4.36 | 4.48 | 30 | 163.2
Jose Quintana | 0.2738854 | 0.6198830 | 0.1420000 | 4.68 | 3.80 | 4.20 | 4.50 | 31 | 171
Cole Hamels | 0.2903091 | 0.6161473 | 0.1410000 | 3.81 | 4.09 | 4.38 | 4.55 | 27 | 141.2
Miles Mikolas | 0.2772933 | 0.6086957 | 0.1470000 | 4.16 | 4.27 | 4.18 | 4.39 | 32 | 184
Jakob Junis | 0.2822967 | 0.6053684 | 0.1380000 | 5.24 | 4.82 | 4.63 | 4.63 | 31 | 175.1
Daniel Norris | 0.2820839 | 0.6037474 | 0.1430000 | 4.49 | 4.61 | 4.55 | 4.57 | 29 | 144.1
Masahiro Tanaka | 0.2859706 | 0.5989011 | 0.1430000 | 4.45 | 4.27 | 4.29 | 4.46 | 31 | 182
Homer Bailey | 0.2785614 | 0.5885960 | 0.1380000 | 4.57 | 4.11 | 4.43 | 4.60 | 31 | 163.1
Spencer Turnbull | 0.2589252 | 0.5874409 | 0.1330000 | 4.61 | 3.99 | 4.63 | 4.62 | 30 | 148.1
Eric Lauer | 0.2673307 | 0.5831099 | 0.1340000 | 4.45 | 4.23 | 4.77 | 4.72 | 29 | 149.2
Mike Soroka | 0.2768870 | 0.5797933 | 0.1440000 | 2.68 | 3.45 | 3.85 | 4.28 | 29 | 174.2
Trent Thornton | 0.2761870 | 0.5710578 | 0.1300000 | 4.84 | 4.59 | 4.94 | 4.80 | 29 | 154.1
Reynaldo Lopez | 0.2785330 | 0.5652174 | 0.1290000 | 5.38 | 5.04 | 5.27 | 4.88 | 33 | 184
J.A. Happ | 0.2596439 | 0.5648665 | 0.1350000 | 4.91 | 5.22 | 4.78 | 4.72 | 30 | 161.1
Rick Porcello | 0.2648649 | 0.5628949 | 0.1270000 | 5.52 | 4.76 | 5.14 | 4.86 | 32 | 174.1
Merrill Kelly | 0.2741391 | 0.5516111 | 0.1300000 | 4.42 | 4.51 | 4.58 | 4.73 | 32 | 183.1
Marcus Stroman | 0.2779783 | 0.5486149 | 0.1300000 | 3.22 | 3.72 | 3.99 | 4.41 | 32 | 184.1
John Means | 0.2513987 | 0.5354839 | 0.1300000 | 3.60 | 4.41 | 5.48 | 5.02 | 27 | 155
Jhoulys Chacin | 0.2735219 | 0.5334627 | 0.1170000 | 6.01 | 5.88 | 5.03 | 4.94 | 24 | 103.1
Chase Anderson | 0.2658863 | 0.5323741 | 0.1250000 | 4.21 | 4.83 | 5.26 | 4.89 | 27 | 139
Danny Duffy | 0.2757982 | 0.5299539 | 0.1240000 | 4.34 | 4.78 | 5.14 | 4.89 | 23 | 130.2
Trevor Richards | 0.2786026 | 0.5255366 | 0.1220000 | 4.06 | 4.51 | 5.09 | 4.89 | 23 | 135.1
Adam Wainwright | 0.2812935 | 0.5198598 | 0.1190000 | 4.19 | 4.36 | 4.39 | 4.70 | 31 | 171.2
Jordan Zimmermann | 0.2765273 | 0.5089286 | 0.1130000 | 6.91 | 4.79 | 4.87 | 4.93 | 23 | 112
Mike Leake | 0.2678992 | 0.5076142 | 0.1200000 | 4.29 | 5.19 | 4.76 | 4.79 | 32 | 197
Jeff Samardzija | 0.2518297 | 0.5024848 | 0.1230000 | 3.52 | 4.59 | 5.02 | 4.92 | 32 | 181.1
Zach Eflin | 0.2746013 | 0.4966278 | 0.1150000 | 4.13 | 4.85 | 4.76 | 4.86 | 28 | 163.1
Trevor Williams | 0.2599914 | 0.4752066 | 0.1090000 | 5.38 | 5.12 | 5.25 | 5.08 | 26 | 145.2
Wade Miley | 0.2621294 | 0.4727708 | 0.1090000 | 3.98 | 4.51 | 4.52 | 4.80 | 33 | 167.1
Anibal Sanchez | 0.2654275 | 0.4578313 | 0.1060000 | 3.85 | 4.44 | 5.10 | 5.07 | 30 | 166
Julio Teheran | 0.2674572 | 0.4535017 | 0.1050000 | 3.81 | 4.66 | 5.26 | 5.11 | 33 | 174.2
Marco Gonzales | 0.2728960 | 0.4482759 | 0.1050000 | 3.99 | 4.15 | 5.11 | 5.08 | 34 | 203
Jake Arrieta | 0.2714416 | 0.4363905 | 0.0990000 | 4.64 | 4.89 | 4.46 | 4.82 | 24 | 135.2
Martin Perez | 0.2637085 | 0.4118716 | 0.0920000 | 5.12 | 4.66 | 4.69 | 5.01 | 29 | 165.1
Yusei Kikuchi | 0.2583609 | 0.4094293 | 0.0920000 | 5.46 | 5.71 | 5.18 | 5.17 | 32 | 161.2
Jason Vargas | 0.2766467 | 0.4088472 | 0.0940000 | 4.51 | 4.76 | 5.44 | 5.25 | 29 | 149.2
Mike Fiers | 0.2656088 | 0.3963084 | 0.0970000 | 3.90 | 4.97 | 5.19 | 5.19 | 33 | 184.2
Michael Wacha | 0.2460209 | 0.3882726 | 0.0870000 | 4.76 | 5.61 | 4.80 | 5.08 | 24 | 126.2
Aaron Sanchez | 0.2672451 | 0.3585050 | 0.0780000 | 5.89 | 5.25 | 5.15 | 5.28 | 27 | 131.1
Ivan Nova | 0.2382696 | 0.3582888 | 0.0830000 | 4.72 | 4.98 | 4.91 | 5.16 | 34 | 187
Sandy Alcantara | 0.2665804 | 0.3551497 | 0.0830000 | 3.88 | 4.55 | 5.17 | 5.28 | 32 | 197.1
Andrew Cashner | 0.2584628 | 0.3333333 | 0.0790000 | 4.68 | 4.66 | 5.11 | 5.20 | 23 | 150
Zach Davies | 0.2485075 | 0.3203518 | 0.0760000 | 3.55 | 4.56 | 5.20 | 5.43 | 31 | 159.2
Brad Keller | 0.2508300 | 0.3149606 | 0.0730000 | 4.19 | 4.35 | 4.94 | 5.23 | 28 | 165.1
Glenn Sparkman | 0.2369231 | 0.2941176 | 0.0660000 | 6.02 | 5.93 | 5.81 | 5.59 | 23 | 136
Dakota Hudson | 0.2654494 | 0.2870264 | 0.0660000 | 3.35 | 4.93 | 4.55 | 5.08 | 32 | 174.2
Brett Anderson | 0.2459571 | 0.2329545 | 0.0550000 | 3.89 | 4.57 | 4.79 | 5.17 | 31 | 176
Antonio Senzatela | 0.2329012 | 0.1529791 | 0.0330000 | 6.71 | 5.44 | 5.12 | 5.50 | 25 | 124.2

SIERA is almost universally acclaimed as one of the best ERA estimators, so I first wanted to check how the K-Bs stats correlate with it. I created graphs of the relationship between each K-Bs stat and ERA, xFIP, and SIERA, and plotted the trend line, its equation, and R² for each:

(k-bb)/ip vs. ERA estimators' correlation (2019 season)
K%-BB% vs. ERA estimators' correlation (2019 season)
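A linear trend line like the ones in these graphs is typically an ordinary least-squares fit, and R² is the share of the y-variable's variance that the line explains. The sketch below is a minimal, dependency-free version; `linear_fit` is a hypothetical helper, and the five points are taken from the 2019 table, so the R² it prints will not match the full-sample graphs.

```python
def linear_fit(xs: list[float], ys: list[float]) -> tuple[float, float, float]:
    """Ordinary least-squares line y = slope * x + intercept, plus its R^2."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)  # total sum of squares of y
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - SS_residual / SS_total: the share of y's variance the line explains.
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    r_squared = 1 - ss_res / syy
    return slope, intercept, r_squared


# Five (K%-BB%, SIERA) pairs from the 2019 table:
# Cole, Boyd, Kershaw, Hendricks, Senzatela.
k_bb_pct = [0.340, 0.238, 0.210, 0.162, 0.033]
siera = [2.62, 3.61, 3.77, 4.38, 5.50]

slope, intercept, r2 = linear_fit(k_bb_pct, siera)
print(f"SIERA = {slope:.2f} * (K%-BB%) + {intercept:.2f}, R^2 = {r2:.3f}")
```

The slope comes out negative, as expected: better K-BB numbers go with lower (better) SIERA.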

As expected, given the weight of Ks and BBs in its formula, SIERA has the highest R². That is great, because it means the K-Bs explain a larger share of SIERA's variance than of the other two, which suggests that these simpler stats could deliver results roughly equivalent to SIERA. But to take it a step further and test for real predictive power, I should compare 2018 K-Bs with 2019 SIERA and check their correlation.
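The year-over-year comparison needs each pitcher's 2018 value paired with his own 2019 value, keeping only pitchers who appear in both seasons. A minimal sketch follows; `pair_seasons` is a hypothetical helper, the numbers are made up (not real 2018 or 2019 stats), and matching on names is a simplification, since a stable player ID would be safer in real data.

```python
def pair_seasons(
    prior: dict[str, float], following: dict[str, float]
) -> tuple[list[str], list[float], list[float]]:
    """Match each pitcher's prior-season stat with his following-season stat.

    Only pitchers present in both seasons are kept (an inner join on name).
    """
    names = sorted(prior.keys() & following.keys())
    xs = [prior[name] for name in names]
    ys = [following[name] for name in names]
    return names, xs, ys


# Hypothetical values for illustration only.
k_bb_2018 = {"Pitcher A": 0.25, "Pitcher B": 0.15, "Pitcher C": 0.10}
siera_2019 = {"Pitcher A": 3.2, "Pitcher B": 4.1, "Pitcher D": 4.6}

names, xs, ys = pair_seasons(k_bb_2018, siera_2019)
print(names)   # ['Pitcher A', 'Pitcher B']
print(xs, ys)  # [0.25, 0.15] [3.2, 4.1]
```

The paired lists can then go into any least-squares or correlation routine. Dropping pitchers who appear in only one season also means the year-over-year sample is smaller than either single-season sample.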

Pulling the data and graphing it, we get:

Promising.

At first glance, the R² values tell us that 2018 (k-bb)/ip and 2018 K%-BB% explain about 44% and 48%, respectively, of the variance in 2019 SIERA, which could seem too low to be meaningful. That reading would be both right and wrong at the same time, and the key word here is variance.

As Phil Birnbaum graciously explains in this post, R² matters a lot from a statistician's point of view, but in layperson terms the correlation coefficient R (the square root of R²) is more informative and useful. R comes out to around 0.66 and 0.70 for (k-bb)/ip and K%-BB%, respectively (in absolute value; the correlation is negative, since higher K-Bs go with lower SIERA), which is good and suggests that these stats can be used to predict a pitcher's performance.
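The step from R² to R can be checked directly. For a simple one-variable linear fit, R² is the square of Pearson's R, so |R| = √R², and R takes the sign of the fitted slope. This sketch uses the rounded R² values quoted above; `r_from_r_squared` is a hypothetical helper. With the rounded 0.48, K%-BB% gives about 0.69, in line with the "around 0.70" above.

```python
import math


def r_from_r_squared(r_squared: float, slope: float) -> float:
    """Recover Pearson's R from a simple linear fit's R^2.

    For a one-variable linear fit, R^2 is the square of R, so |R| = sqrt(R^2);
    the sign of R matches the sign of the fitted slope.
    """
    if not 0.0 <= r_squared <= 1.0:
        raise ValueError("R^2 must be between 0 and 1")
    return math.copysign(math.sqrt(r_squared), slope)


# R^2 values quoted in this post for 2018 K-Bs vs 2019 SIERA.
# Only the sign of the slope matters here: higher K-Bs go with lower SIERA.
for label, r2 in [("(k-bb)/ip", 0.44), ("K%-BB%", 0.48)]:
    print(f"{label}: R = {r_from_r_squared(r2, slope=-1.0):.2f}")
# (k-bb)/ip: R = -0.66
# K%-BB%: R = -0.69
```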

There is a lot more data checking I should do (using more seasons, for instance, would give a larger sample), but this small exercise strongly suggests that the K-Bs are good tools for making educated estimates of pitchers' future performance. It might feel like reinventing the wheel, but you would be surprised how often people underestimate these stats, so any boost to our confidence in them matters.

So, what can we do with K%-BB% and (k-bb)/ip that leads to useful, practical information? Well, I dedicated an article to evaluating the 50 best pitchers coming into the 2020 season according to their 2019 numbers in these stats, and there are a few surprises that I think you should consider. Spoiler alert: you should try to get Boyd, and Kevin Gausman might be in for a very decent season.

Baseball is such a wonderful game that, besides the joy of simply watching it, it has gifted us the joy of measuring and analyzing it ad infinitum. It can be, and is, measured to the point of silliness, which can be a blessing or a curse, since finding the needles in the haystack often seems complicated. But it doesn't have to be. The beauty of having so much information available is that, if we can understand it, diverse ways of looking at our beloved game increase our appreciation of it: if knowledge is power, applied knowledge is wisdom. Let's try to be wise or die trying.


All data used was taken from https://www.fangraphs.com/, https://baseballsavant.mlb.com/, and/or https://www.baseball-reference.com/, unless otherwise stated.
