I am a big fan of sabermetrics. I joined SABR in the ’90s, before most people had even heard the word “sabermetrics.” I read a borrowed copy of Moneyball in Iraq in 2005 and, just like at my first Phish show, I saw everything in a whole new light. I loved the book for what it represented – the triumph of rational contrarians over arrogant jocks, even in their own realm. I also felt personally vindicated (during my playing days I strived to prove that cerebral play, and drawing walks, could make up for a lack of athleticism; alas, I never really fit in with my jock teammates or my high school coach, and I “retired” at 16). The gods of my childhood were players, but in my early 20s my idols included Billy Beane and Bill James.
However, I’ve grown disillusioned. (I’m not the only one; John Sickels recently stirred up quite the cyber-shitstorm.) In my last Moneyball-centric rumination, I compared sabermetrics in 2010 to Wall Street in 2008 – so many people so supremely confident in their approach, designing ever-more-complex prediction models, their bullishness feeding off each other’s, blinding them to their errors. Many people (myself included) can just get lost in Fangraphs or BP and not emerge for hours. I think we heartily want to be part of this movement (especially if not being part of it means siding with Joe Morgan and Murray Chass) but, as with the funds we so readily invested in, we don’t really understand what’s going on.
At first the issue was transparency – I understand the concepts of WAR, UZR, and WPA, but I have never seen the formulae. I don’t think anyone has, with the exception of a handful of oligarchs at the top of the sabermetric party. (Update: A spreadsheet for one of several discordant versions of WAR is available here.) I have read some very thorough (and enlightening) explanations of how they are calculated, but ultimately I must rely on “the computer.” Scientists in other fields are, by necessity, much more willing to share their methodology with their peers for review. Several years ago I emailed some leading bloggers, including Tango and the guys at FJM, to ask how OPS+ is calculated (particularly the ballpark effects; I wanted to compare some hitters in the Can-Am League). I figured they would be happy to evangelize another curious, like-minded fellow. But those who responded seemed haughty and offended. (Update: To clarify, it was the FJM guys in particular who responded thusly.) “You do know what park effects are, don’t you?” they mocked. Well, no. Sure, I can see that 3.5% more runs were scored in Mudville last year than in the Nine’s road games. I see that 2.2% more homers were hit there. How exactly that affects Casey’s OPS+ only Mazatec shamans and astrologers can divine.
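For what it's worth, the commonly published shape of OPS+ is 100 × (OBP / lgOBP + SLG / lgSLG − 1), with the league figures adjusted for the park. How a given site does the park adjustment is exactly the part that stays murky, so the sketch below fills it in with a loudly labeled assumption: the 3.5% Mudville run bump is blended to 1.0175 (half the games at home, half on the road), and that one factor simply scales league OBP and SLG. Casey's line and the league averages are made-up numbers, and `blended_park_factor` is a hypothetical helper, not any site's actual method.

```python
# ASSUMPTION: a crude, hypothetical park adjustment for illustration only;
# it is not Baseball-Reference's (or anyone's) published procedure.

def blended_park_factor(home_vs_road_runs: float) -> float:
    """Hypothetical: half of a hitter's games are at home, half on the road,
    so a park that inflates scoring by 3.5% (1.035) is blended to 1.0175."""
    return (home_vs_road_runs + 1.0) / 2.0


def ops_plus(obp: float, slg: float, lg_obp: float, lg_slg: float,
             park_factor: float = 1.0) -> float:
    """OPS+ = 100 * (OBP / lgOBP + SLG / lgSLG - 1), with both league
    baselines scaled by the same park factor (the simplifying assumption)."""
    adj_obp = lg_obp * park_factor
    adj_slg = lg_slg * park_factor
    return 100.0 * (obp / adj_obp + slg / adj_slg - 1.0)


# Casey's line and the league averages below are invented for the example.
pf = blended_park_factor(1.035)
raw = ops_plus(0.360, 0.480, 0.330, 0.420)
adjusted = ops_plus(0.360, 0.480, 0.330, 0.420, park_factor=pf)
print(f"park factor {pf:.4f}: OPS+ {raw:.1f} unadjusted, {adjusted:.1f} park-adjusted")
```

Under these assumptions, a hitter-friendly park raises the league baseline Casey is measured against, so his OPS+ drops by a few points. Real systems treat OBP and SLG (and homers, via that 2.2% figure) separately, which is where the unanswered details live.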
My grievance now is the departure of sabermetrics from its very soul. Bill James began, more than thirty years ago, by daring to ask why certain things were done the way they’d always been done. Nowadays, to question the tenets of sabermetrics is effectively to label oneself a Neanderthal, a half-wit, a science-hating agent of the Roman Inquisition. I say it’s time to stop this runaway train, take it back to the station, and re-examine its structural integrity. I will ride no further until I get some answers! (To be fair, not all of these questions are directed at the sabermetric community. Some are aimed at the “inside” baseball establishment. I consider both to be arms of The Church, and so I’d nail these babies to the church door in Wittenberg… or Oakland or Boston.)
1. Why are we endowing OPS with an aura of universal verity? Why are we still using it at all? OPS is something that the earliest Moneyball pioneers threw together because they recognized that on-base percentage and slugging were the two most important offensive metrics. We now know that On-Base Percentage should be weighted about 1.7 times more than Slugging (to achieve the strongest correlation with actual runs scored), yet we continue to simply add them together. Once OPS becomes entrenched in the language of the game, it will be very difficult to alter.
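A quick way to see what the 1.7 weighting changes: two hypothetical hitters with identical OPS, one built on walks and one on power. The players and their numbers are invented, and `OBP_WEIGHT` simply uses the 1.7 figure from the paragraph above; this is a sketch of the idea, not an established stat.

```python
# Compare plain OPS with an OBP-weighted version.
# OBP_WEIGHT uses the ~1.7 figure cited in the text; players are hypothetical.
OBP_WEIGHT = 1.7


def ops(obp: float, slg: float) -> float:
    """Plain OPS: on-base percentage plus slugging, weighted equally."""
    return obp + slg


def weighted_ops(obp: float, slg: float, w: float = OBP_WEIGHT) -> float:
    """OPS with on-base percentage counted w times as heavily as slugging."""
    return w * obp + slg


players = {
    "Walker (high OBP)": (0.400, 0.400),
    "Slugger (high SLG)": (0.300, 0.500),
}
for name, (obp, slg) in players.items():
    print(f"{name}: OPS {ops(obp, slg):.3f}, weighted {weighted_ops(obp, slg):.3f}")
```

Both hitters post an .800 OPS, but the weighted version ranks the on-base specialist ahead (1.080 vs. 1.010). The weighted figure no longer sits on the familiar OPS scale; only the ordering is meant to carry meaning here.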
2. Why can’t there be a left-handed shortstop? Yes, he’d have to turn his body to make the routine throw to first. But he would not have to turn as he comes across the bag on a double play, when the milliseconds saved are more likely to make a difference. When he starts a double play, the throw is short enough that he would not have to turn his body; he could feed the second baseman the same way a right-handed second baseman would feed him. In non-DP situations, he could play a step or two toward third, which would let him backhand balls hit toward the middle, plant, and make the throw (whereas a righty has to spin around and is rarely able to make such plays).
3. Why don’t we incorporate base stealing into OPS (or the weighted OPS proposed in #1)? It would be very simple to add SB to total bases (effectively turning singles into doubles) and to subtract CS from times on base (turning a walk or hit into an out). This would make the effects of stolen base attempts much more evident, and would do so more accurately than the traditional 75% break-even point. Is a player who steals 50 bases at a 75% success rate more valuable than one who swipes 20 at an 85% rate? How about a guy who swipes 2 at a 100% success rate? We could make a more educated guess, and the slash stats would be a more complete representation of a player’s offensive contributions, if they reflected stealing in such a way. (I know stats like runs created and EqA do take this into account, but they are far too convoluted to ever become mainstream.)
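Here is a minimal sketch of that proposal applied to one invented baseline hitter with three different stealing profiles. The batting line is hypothetical, and the profiles (51 SB / 17 CS, 17 SB / 3 CS, 2 SB / 0 CS) are stand-ins for the 75%, 85%, and 100% examples, chosen so the success rates come out exact. `Line` is just an illustrative container.

```python
from dataclasses import dataclass


@dataclass
class Line:
    """Hypothetical season line; field names follow standard abbreviations."""
    ab: int   # at-bats
    h: int    # hits
    bb: int   # walks
    hbp: int  # hit by pitch
    sf: int   # sacrifice flies
    tb: int   # total bases
    sb: int = 0  # stolen bases
    cs: int = 0  # caught stealing


def obp(p: Line, with_steals: bool = False) -> float:
    on_base = p.h + p.bb + p.hbp
    if with_steals:
        on_base -= p.cs  # each caught stealing erases one time on base
    return on_base / (p.ab + p.bb + p.hbp + p.sf)


def slg(p: Line, with_steals: bool = False) -> float:
    bases = p.tb + (p.sb if with_steals else 0)  # each steal adds one base
    return bases / p.ab


def ops(p: Line, with_steals: bool = False) -> float:
    return obp(p, with_steals) + slg(p, with_steals)


base = dict(ab=550, h=150, bb=60, hbp=5, sf=5, tb=220)  # invented baseline
profiles = {
    "51 SB / 17 CS (75%)": Line(**base, sb=51, cs=17),
    "17 SB / 3 CS (85%)": Line(**base, sb=17, cs=3),
    "2 SB / 0 CS (100%)": Line(**base, sb=2, cs=0),
}
for label, p in profiles.items():
    print(f"{label}: OPS {ops(p):.3f} -> {ops(p, with_steals=True):.3f}")
```

One property of the proposal as stated: a steal (one base over at-bats) and a caught stealing (one time on base over plate appearances) move OPS by nearly the same amount, so any success rate above roughly 50% raises the adjusted figure. That is why the 75% profile gains the most here, whereas run-value analyses of the kind behind the 75% break-even rule charge a caught stealing much more heavily than they credit a steal.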
4. Why aren’t there more “two-way” baseball players? Even at the very highest levels of amateur ball, many pitchers double as position players. I understand the need for specialization, the theory that pitching or hitting at the major league level requires 100% of a guy’s focus. But Brooks Kieschnick and Rick Ankiel have shown that it’s certainly not impossible to do both if you’re blessed with enough talent. Imagine if switch hitters were told in the minors to choose one side to focus on. Why does the industry discourage such potentially valuable versatility by forcing draft picks down one path or the other?
5. Is a strikeout by a batter more detrimental to his team than other types of outs, or isn’t it? For a pitcher we believe that a strikeout is the best possible outcome (short of a double play) because he has almost no control over the BABIP against him. We typically treat BABIP for hitters also as if it were largely determined by luck. If that’s the case, then why are guys like Juan Pierre, who put the ball in play, so vilified by sabermetricians, while whiffers like Jack Cust are icons? (Granted, Cust’s OBP is significantly higher than Pierre’s, but it would be higher still if he were to combine his excellent plate discipline with a little old-school choking up with two strikes.) On the other hand, Edgar Martinez and Bobby Bonilla had the exact same number of at-bats (7213) and put the ball in play almost the same number of times (Edgar struck out 1202 times, Bonilla 1204). Their ISO is also similar (.204 to .193). Yet Edgar’s career BA is 33 points higher (.312 to .279), suggesting either an astronomical amount of luck or an ability to maintain an above-average BABIP. What I’m getting at is that I sense a paradox in the current sabermetric thinking on this question. Either BABIP is random (in which case punchouts are devastating instances of lost opportunity which should be avoided like the plague) or it is determined by a particular skill like a consistent ability to hit more line drives (in which case we can no longer expect abnormalities to regress toward the mean). I don’t know which of these viewpoints is correct, but I know they cannot both be.
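The Martinez/Bonilla comparison can be put into rough numbers using only the figures quoted above. The standard BABIP formula, (H − HR) / (AB − K − HR + SF), needs home run and sacrifice fly totals that the paragraph doesn't give, so the sketch includes it for reference and then falls back on a cruder rate: estimated hits (BA × AB) per non-strikeout at-bat. That fallback, `rough_hit_rate_on_contact`, is a hypothetical helper and still counts home runs, so treat it as an approximation rather than true BABIP.

```python
def babip(h: int, hr: int, ab: int, k: int, sf: int) -> float:
    """Standard BABIP: hits on balls in play divided by balls in play."""
    return (h - hr) / (ab - k - hr + sf)


def rough_hit_rate_on_contact(ba: float, ab: int, k: int) -> float:
    """Rough stand-in using only the figures quoted in the text:
    estimated hits (BA * AB) per non-strikeout at-bat.
    Home runs and sac flies are NOT removed, so this is not true BABIP."""
    return (ba * ab) / (ab - k)


edgar = rough_hit_rate_on_contact(0.312, 7213, 1202)
bonilla = rough_hit_rate_on_contact(0.279, 7213, 1204)
print(f"Edgar {edgar:.3f}, Bonilla {bonilla:.3f}, gap {edgar - bonilla:.3f}")
```

With nearly identical non-strikeout at-bats (6011 vs. 6009), the whole 33-point batting-average gap shows up as roughly a 40-point gap in hits per ball put in play, which is the number the luck-versus-skill question has to explain.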
6. Why do we have a run expectancy matrix showing the average number of runs scored from each base-out situation, but not one (that I can find) showing the probability of scoring at least one (or two or whatever may be needed in a particular scenario)? This disconnect explains why saberites so obstinately despise the sacrifice bunt, and also, I think, explains why we’ve been criticized for failing to “actually watch a game.” The underlying sabermetric ethos has always been to examine questions in terms of runs scored or prevented, since that is what wins games. Why not carry that logic one step further and just think in terms of wins and losses? Crazy as it may sound, striving to maximize your run differential is not always the best strategy. Pennants are not won by Pythagorean records. Within the confines of an individual game, more conservative tactics (like bunting) are often the surer path to victory. (Update: Tangotiger led me to the matrix I was seeking.)
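Both matrices can come out of the same data; the only difference is whether you average the runs that followed each base-out state or count how often at least one run followed. The sketch below uses a handful of made-up toy records, deliberately constructed to show how the two measures can disagree; they are not evidence about real bunting outcomes. A real study would build the records from play-by-play data such as Retrosheet.

```python
from collections import defaultdict

# Each record: (base-out state when the situation arose, runs scored from
# that point to the end of the inning). Toy records, invented for illustration.
records = [
    ("runner on 1st, 0 out", 0),
    ("runner on 1st, 0 out", 0),
    ("runner on 1st, 0 out", 3),
    ("runner on 1st, 0 out", 1),
    ("runner on 2nd, 1 out", 1),
    ("runner on 2nd, 1 out", 0),
    ("runner on 2nd, 1 out", 1),
    ("runner on 2nd, 1 out", 1),
]


def summarize(rows, at_least=1):
    """Return {state: (expected runs, P(runs >= at_least))}."""
    runs_by_state = defaultdict(list)
    for state, runs in rows:
        runs_by_state[state].append(runs)
    table = {}
    for state, runs in runs_by_state.items():
        expected = sum(runs) / len(runs)
        p_score = sum(r >= at_least for r in runs) / len(runs)
        table[state] = (expected, p_score)
    return table


for state, (expected, p) in summarize(records).items():
    print(f"{state}: expected runs {expected:.2f}, P(at least 1 run) {p:.2f}")
```

In this toy data the post-bunt state (runner on second, one out) has fewer expected runs (0.75 vs. 1.00) but a higher chance of scoring at least once (0.75 vs. 0.50). Passing `at_least=2` gives the "two or whatever may be needed" version.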
7. Why doesn’t sabermetrics devote more study to the process, rather than just the results? An example would be the controversy surrounding maple bats. Carlos Pena, a maple-swinger, said “It feels harder to me. And if I was to put a formula on it, I’d want the hardest wood possible, the one with the least amount of give. That’s just straight physics.” His teammate Evan Longoria remains skeptical: “I think with a lot of guys it’s more of a mental thing than a physical difference.” This would be the easiest thing in the world to test; probably the answer is already out there, yet MLB has been investigating for almost two years. It’s true that maple is “harder” than ash. Like Pena, my background is in engineering, not physics, but it seems to me that the compression of the softer ash when it meets the ball, and the springing back effect that occurs immediately thereafter, would cause the ball to “jump” off the bat with more power. But does either wood confer any real, significant benefit to a hitter? To answer this question would take sabermetrics out of its comfort zone because you can’t pull this information from Retrosheet. You would probably have to ask a number of players which type they use, and whether and when they switched. Admittedly, there would be some difficulties. But any scientist should be disappointed to let such a glaring and pertinent question remain so unsettled.
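Once you know which players switched from ash to maple, one simple way to test for an effect is a paired before/after comparison. The sketch below runs an exact sign-flip (permutation) test on the change in each player's ISO after switching. The `diffs` values are placeholders, not real data, and the choice of ISO and of a one-sided test are assumptions for illustration; a real study would also have to handle aging, park changes, and small samples.

```python
from itertools import product

# Hypothetical change in ISO for players after switching from ash to maple.
# These numbers are placeholders, not real data.
diffs = [0.012, -0.004, 0.020, 0.008, -0.010, 0.015]


def sign_flip_p_value(d):
    """Exact one-sided sign-flip (permutation) test for paired differences.
    Under the null hypothesis of no bat effect, each difference is equally
    likely to have either sign, so enumerate all 2**n sign patterns and count
    how often the mean is at least as large as the one observed."""
    observed = sum(d) / len(d)
    n_extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(d)):
        mean = sum(s * x for s, x in zip(signs, d)) / len(d)
        n_extreme += mean >= observed
        total += 1
    return observed, n_extreme / total


mean_diff, p = sign_flip_p_value(diffs)
print(f"mean ISO change {mean_diff:+.4f}, one-sided p = {p:.3f}")
```

Full enumeration is fine for a few dozen players at most; beyond that, randomly sampling sign patterns gives an approximate p-value instead.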
8. What’s the deal with Win Probability Added? It’s a delicious little garnish to the traditional game summaries, but is there any substance behind it at all? Any metric that gives a team, down a run in the ninth with the bottom of its order facing a dominant closer, the same chance to win as another team, down a run in the ninth with three Silver Sluggers facing some bush-league junkballer, is obviously lacking.
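The objection follows from how WPA is built: a win expectancy table keyed only by game state (inning, score, outs, bases), with each play credited the change in win expectancy. The sketch below assumes that structure; the two table values are hypothetical placeholders. Because the lookup key never mentions who is batting or pitching, the bottom of the order facing an ace closer and three Silver Sluggers facing a junkballer start from the same number.

```python
# Win expectancy (WE) keyed only by game state. Values are hypothetical
# placeholders; nothing about the batter or pitcher enters the key.
win_expectancy = {
    ("bot 9", "down 1", "0 out", "empty"): 0.20,
    ("bot 9", "down 1", "0 out", "1st"): 0.33,
}


def wpa(before, after, table=win_expectancy):
    """WPA credited to the batter: WE after the play minus WE before it."""
    return table[after] - table[before]


leadoff_single = wpa(("bot 9", "down 1", "0 out", "empty"),
                     ("bot 9", "down 1", "0 out", "1st"))
print(f"WPA for a leadoff single: {leadoff_single:+.2f}")
```

Making WPA sensitive to the matchup would mean adding batter and pitcher quality to that key (or to a model behind it), which is the substance the question asks for.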
9. Why the mistrust and disdain of observations that contradict statistics? Numbers, like words, are attempts to illuminate the truth; they themselves are not the truth and they can never fully encapsulate it. As a surveyor, I learned that my eyes are easily fooled; a straight line can appear crooked and vice versa, but the instrument never lies. However, there were times when the instrument spit out a reading so preposterous that I knew I must have accidentally bumped it off-level. The Twins-Phillies game yesterday in Clearwater provided a nugget of truth, obvious to even the most untrained of the 20,000 eyes that saw it, but invisible to the rest of humanity because lucky circumstances kept it out of posterity’s official play-by-play. Matt Tolbert dropped a routine pop-up, which could happen to anyone on such a windy day, but got the putout anyway because the infield fly rule had been called. He dropped another one in the next inning, but again was fortunate enough to do so with a runner on first, and so was able to salvage the play as a fielder’s choice. His UZR actually improved on those two plays, but Ron Gardenhire will see right through that. When a third pop-up began its descent toward Tolbert’s glove, Alexi Casilla raced over from short and snatched it.
These are the questions that have arisen from the primordial soup that was my brain this winter. I’m sure that one or two of them emerged only half-baked and I am fully prepared for the due criticism to follow. My larger goal here is to bring the two factions (proponents of sabermetrics and of scouting, people living in basements and people like Don Zimmer) together in the hope that their mutual enmity can evolve into mutual respect. For while I still love sabermetrics and its noble pursuit of truth, I am seriously concerned that some of its building blocks may not be as strong as they were thought to be. In case they do buckle, I urge all sabermetricians to stop being so self-righteous. Even if the foundations of sabermetrics are strong, its essential objectivity can only see so much. As in all collisions between science and art, the relationship between the two factions will remain tense until both realize that neither has a monopoly on the truth, and that the other holds the complementary piece of the puzzle.