The Kwindex


Every endeavor has its ultimate goal: some call it the finish line, the holy grail, the epitome, etc. I like to use what is called KISS: Keep It Simple, Stupid.

I love data, all kinds of it. I like to go through rows and rows of numbers in multiple tabs in multiple spreadsheets while trying to make sense of them. No problem at all. But here is the thing: most people hate numbers, algebra, and math in general. They just don’t like it, and that’s fine.

While putting these posts together, I often find myself trying to walk in the shoes of someone who doesn’t like numbers, and I have kind of gotten used to it. It’s not that I now hate numbers; rather, sometimes I would like to “show” or “see” as few of them as possible while the “magic” (in this case, the math) happens behind the scenes. Sometimes I want concise and reliable info: more meat, fewer potatoes (although I will still post a lot of numbers; it’s in my nature).

When doing pitching analysis, we are used to swiping through a myriad of stats, comparing, relating, correlating, regressing, and so on, because we have the feeling that this will lead to a better result. That is understandable and usually true, but it can also be exhausting.

That’s why, in the past, I’ve been eager to confirm that simpler indicators can work as well as or better than others that are harder to calculate, or as well as or better than looking at 10 different stats and trying to make sense of them as a whole.

I’ve had success using very simple stats such as (k-bb)/ip and K%-BB% as good estimators of a pitcher’s future performance; they have provided appropriate guidance. But we know they are not an all-around solution; there is no such thing.
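For readers who want to compute these two stats themselves, here is a minimal sketch. The function names are my own for illustration, and it assumes K%-BB% is taken over batters faced and that innings arrive in box-score notation, where ".1" and ".2" mean one and two outs.

```python
def innings_to_float(ip_notation: str) -> float:
    """Convert box-score innings such as "16.1" (16 and 1/3) to a true decimal."""
    whole, _, outs = ip_notation.partition(".")
    outs = int(outs) if outs else 0
    if outs not in (0, 1, 2):
        raise ValueError(f"unexpected innings notation: {ip_notation!r}")
    return int(whole) + outs / 3


def k_minus_bb_per_ip(strikeouts: int, walks: int, innings: float) -> float:
    """(K - BB) / IP, with innings given as a true decimal."""
    if innings <= 0:
        raise ValueError("innings must be positive")
    return (strikeouts - walks) / innings


def k_pct_minus_bb_pct(strikeouts: int, walks: int, batters_faced: int) -> float:
    """K% - BB%, returned as a fraction (0.24 means 24 percentage points)."""
    if batters_faced <= 0:
        raise ValueError("batters_faced must be positive")
    return (strikeouts - walks) / batters_faced


if __name__ == "__main__":
    # Made-up line: 30 K, 6 BB, 24.0 IP, 100 batters faced.
    ip = innings_to_float("24.0")
    print(k_minus_bb_per_ip(30, 6, ip))
    print(k_pct_minus_bb_pct(30, 6, 100))
```

Both stats reward the same thing (strikeouts net of walks); they only differ in the denominator, innings versus batters faced.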

But we can always keep trying.

So in line with that, I wanted to have an indicator, a meta number of sorts, which could be a summary of a few others and work as a quick first sign of what is lying ahead in terms of pitching performance.

Don’t we have other stats for this kind of thing? Sure, and we must continue using them; I’m just trying to simplify part of the work without relegating or distorting the info.

Also, please let’s focus on “first sign”: this is by no means a complete, catch-all solution; it is just a first step meant to open the door. You still have to keep walking to go beyond it.

The Kwindex is an aggregate index composed of other stats: (k-bb)/ip, CSW, pCRA, Zone% and F-Strike%. Why these? Well, I have been using them individually for some time, relying on my knowledge and instinct to gauge which was more important in each case. That’s not wrong per se, but we know how subjective we humans are: our subconscious biases define our decisions more often than not. I wanted to find a way to bypass that process, and the Kwindex is the result.

Also, these stats provide a balance between dominance by power (basically Ks) and dominance by control and location (Zone% and F-Strike%), with the ERA estimator (pCRA) in between.
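The exact formula and weights are not published here, so the following is only a sketch of how an aggregate index of this kind could be assembled, not the actual Kwindex calculation. It assumes min-max scaling of each stat across the group and uses placeholder weights: the Zone% and F-Strike% pair is set to 20% combined, loosely following my reply in the comments, while the split and every other weight are invented for illustration. The sample pitchers and their numbers are made up as well.

```python
# Placeholder weights for illustration only; they are NOT the real Kwindex weights.
WEIGHTS = {
    "k_bb_ip": 0.30,
    "csw": 0.25,
    "pcra": 0.25,
    "f_strike": 0.15,
    "zone": 0.05,
}
# pCRA is an ERA estimator, so a lower value is better and must be inverted.
LOWER_IS_BETTER = {"pcra"}


def min_max_scale(values, invert=False):
    """Scale values to [0, 1] across the group; optionally flip so low is good."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5 for _ in values]  # no spread, so everyone gets a neutral score
    scaled = [(v - lo) / (hi - lo) for v in values]
    return [1.0 - s for s in scaled] if invert else scaled


def composite_index(pitchers):
    """pitchers maps name -> {stat: value}; returns name -> score in [0, 1]."""
    names = list(pitchers)
    scores = {name: 0.0 for name in names}
    for stat, weight in WEIGHTS.items():
        column = [pitchers[name][stat] for name in names]
        scaled = min_max_scale(column, invert=stat in LOWER_IS_BETTER)
        for name, value in zip(names, scaled):
            scores[name] += weight * value
    return scores


# Made-up stat lines, only to exercise the function.
SAMPLE = {
    "Pitcher A": {"k_bb_ip": 1.2, "csw": 0.34, "pcra": 2.8, "f_strike": 0.66, "zone": 0.45},
    "Pitcher B": {"k_bb_ip": 0.4, "csw": 0.26, "pcra": 5.1, "f_strike": 0.55, "zone": 0.38},
    "Pitcher C": {"k_bb_ip": 0.8, "csw": 0.30, "pcra": 4.0, "f_strike": 0.60, "zone": 0.42},
}

if __name__ == "__main__":
    for name, score in sorted(composite_index(SAMPLE).items(), key=lambda kv: -kv[1]):
        print(f"{name}: {score:.0%}")
```

The point of the sketch is the design choice: once the weights are fixed in advance, the relative importance of each stat no longer depends on my mood or instinct when looking at a particular pitcher.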

This is a work in progress, but it’s a start nevertheless. Let’s jump in and see what it has to offer.

Below you will find the pitchers with at least 16 IP (20 games by 0.8) and with a Kwindex higher than the average for this group, 49%:

The data includes games through the 17th of August.
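The cut used for the list can be sketched as follows. This reads “20 games by 0.8” as 20 team games times 0.8 IP per game, and the row layout (dicts with `name`, `ip` and `kwindex` keys) is an assumption for illustration, not the format of my spreadsheet.

```python
def above_group_average(rows, team_games=20, ip_per_game=0.8):
    """Keep pitchers who meet the innings minimum and beat the group's mean index."""
    min_ip = team_games * ip_per_game  # 16 IP with the defaults
    qualified = [row for row in rows if row["ip"] >= min_ip]
    if not qualified:
        return []
    # The average is taken over the qualified group only, as described above.
    group_mean = sum(row["kwindex"] for row in qualified) / len(qualified)
    leaders = [row for row in qualified if row["kwindex"] > group_mean]
    return sorted(leaders, key=lambda row: row["kwindex"], reverse=True)


if __name__ == "__main__":
    # Made-up rows, only to show the call.
    rows = [
        {"name": "Pitcher A", "ip": 18.0, "kwindex": 0.60},
        {"name": "Pitcher B", "ip": 20.0, "kwindex": 0.40},
        {"name": "Pitcher C", "ip": 10.0, "kwindex": 0.90},
    ]
    for row in above_group_average(rows):
        print(row["name"], row["kwindex"])
```

Note that a pitcher below the innings minimum is dropped before the average is computed, so a small-sample outlier cannot move the bar for everyone else.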

Apart from several already proven pitchers, names such as Gausman, Civale, Milone, Lindblom and Toussaint stand out, as does the resurgence of Luis Castillo. It also confirms that Dylan Bundy is for real.

What do we do with this info? Well, for one thing, the higher the Kwindex, the higher the probability of continued success for that pitcher. It is reassuring to see that guys like Bundy and Gausman are not necessarily overperforming and could sustain their early success.

On the other hand, I’m also interested in the pitchers NOT appearing in the list, with Randy Dobnak being the most prominent name. Also, Zack Greinke and Max Fried seem to score too low, which calls for caution about them.

I will post a weekly leaderboard for the Kwindex so we can watch how it works (or not) from now on.


pCRA data was taken from this spreadsheet, maintained by its creator Connor Kurcon.

All other data was taken from https://www.fangraphs.com/, https://baseballsavant.mlb.com/, and/or https://www.baseball-reference.com/, unless otherwise stated.

7 thoughts on “The Kwindex”

  1. Nice work. Wondering how much you’re weighting Zone% and F-Strike%. What do you think of this research from Pitcher List? https://www.pitcherlist.com/a-beginners-guide-to-understanding-plate-discipline-metrics-for-pitchers/

    “For walk rates, none of the metrics show a high enough correlation to make such rules. Walks are unpredictable in nature, thanks to human umpires. O-Swing% and F-Strike% are certainly good, but not really guaranteed to improve walk rates. Throwing balls in the strike zone does not even appear to be a factor. For walks the only thing we can predict is their unpredictability.”

    Seems counter-intuitive, I know.

    • Hi Daniel,

      First of all, thanks for taking the time to comment and for your kind words; I really appreciate them.

      I’m glad you brought these topics up, as I was expecting them to arise at some point. Regarding the weighting, Zone% and F-Strike% jointly represent barely 20% of the index, and of that Zone% is just 5%. The reason is that, while I know they don’t have the strongest correlation (and/or the smallest variance for K%) among the Plate Discipline stats, F-Strike%, as stated in that wonderful Pitcher List article, brings to the table influence over BB%, which I find important. Chaz shows that, for the data he used, R² is 0.4098, which corresponds to a correlation coefficient of about 0.64 in absolute value; that’s nontrivial and is the best we get for walk rates among these PD stats, at least among the simpler ones.
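For anyone who wants to check that arithmetic, here is a tiny sketch. It assumes the R² comes from a simple one-variable linear fit, in which case the correlation coefficient is its square root up to sign; the sign itself cannot be recovered from R² alone. The function name is mine.

```python
import math


def abs_correlation(r_squared: float) -> float:
    """Absolute correlation implied by R² in a one-variable linear fit."""
    if not 0.0 <= r_squared <= 1.0:
        raise ValueError("R² must be between 0 and 1")
    return math.sqrt(r_squared)


if __name__ == "__main__":
    # 0.4098 is the R² quoted above for F-Strike% and walk rate.
    print(f"|r| = {abs_correlation(0.4098):.2f}")
```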

      Zone%, I have to admit, is a “container” stat; its purpose is to try to fill the gap between power and control that, for example, would keep guys like Marco Gonzales, Framber Valdez or Chris Paddack out of consideration even when they are having some kind of success. But by no means do I consider it the perfect solution; I’m still tweaking and researching, and in future iterations I don’t mind using another type of stat that yields better results.

      I have a small disagreement with Chaz about the unpredictability of walk rates, though. Yes, there is a huge subjective element to it, no question about that. But, and that’s a big but (no pun intended), the data show that walk rates stabilize, and that’s important. From some important research done by the amazing Max Freeze at https://www.freezestats.com/2019/04/looking-at-sample-sizes-for-fantasy-baseball/, we know that after around 41.5 IP BB% starts to stabilize, and that’s what makes it useful in this case.

      On a side but relevant note, I still think that (k-bb)/ip (or K%-BB%) and CSW are some of the simplest and best estimators of future pitching performance (you might want to check this out: https://baseball.iskewl.com/wp/blog/2020/07/31/the-search-for-holds-and-ks/). I love their simplicity and the quality of their results, but I also know they are far from perfect. What I’m trying to achieve with the Kwindex is to take a step forward by incorporating some elements that will include pitchers who are not necessarily elite at striking out batters but get the job done consistently, all while keeping it as easy as possible to compute without relying on some of the more complex metrics available.

      Thanks again for your comments, they bring a much-needed discussion which will for sure help me in the process of improving the metrics!

      Regards,

      Carlos.
