Small Sample Sizes


Apr 11, 2014; Bronx, NY, USA; New York Yankees starting pitcher CC Sabathia (52) pitches against the Boston Red Sox during the first inning at Yankee Stadium. Mandatory Credit: Adam Hunger-USA TODAY Sports

The Yankees have played 13 games, and with 92% of the season left to play, it is still extremely young. After five months without baseball, fans tend to read a lot into early player statistics and team performance. While it is easy to say Emilio Bonifacio has turned into Rod Carew or CC Sabathia is now a left-handed Sidney Ponson, the fact of the matter is that these fluky performances are products of small sample sizes.

Fangraphs summarized Russell Carleton’s work on when certain metrics start to stabilize. The article gives the number of plate appearances, at-bats, or balls in play at which certain statistics start to become “real” rather than a product of luck and random variance. These are good rules of thumb, but as always, the more data points the better. Strikeout rate might stabilize around 60 plate appearances on average, but it is important to keep gathering data: a larger sample narrows the confidence interval around a player’s true talent level for a given measure, which makes projections of future performance more reliable.
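
To make the confidence interval point concrete, here is a rough sketch of how the margin of error around a strikeout rate shrinks as plate appearances pile up. It uses the textbook normal approximation for a proportion, not Carleton’s method, and the 20% strikeout rate is a made-up figure for illustration only.

```python
import math

def interval_half_width(rate, plate_appearances, z=1.96):
    """Half-width of an approximate 95% interval for an observed rate.

    Uses the normal approximation to the binomial, which is only a
    rough guide at small sample sizes.
    """
    return z * math.sqrt(rate * (1 - rate) / plate_appearances)

# Hypothetical strikeout rate, chosen only for illustration.
ASSUMED_K_RATE = 0.20

for pa in (60, 200, 600):
    half_width = interval_half_width(ASSUMED_K_RATE, pa)
    print(f"{pa:>3} PA: {ASSUMED_K_RATE:.3f} +/- {half_width:.3f}")
```

Under these assumptions the interval at 60 plate appearances is still several times wider than it is over a full season, which is why a stabilized statistic is a starting point rather than a final answer.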

The lessons of small sample size can be applied to every team across MLB, including the Yankees. Yangervis Solarte assuredly won’t continue to hit .357/.413/.500. His .417 BABIP is unsustainable, and through 46 plate appearances it remains to be seen what Solarte really is as a major league player. There just isn’t enough of a sample to go on. The flip side of this example is Brian Roberts, whose .174 BABIP is unsustainably low. On the pitching side, CC Sabathia’s 38.5% HR/FB rate is bound to regress, and his ERA will come down with it, as his K and BB rates (which stabilize much more quickly) are currently great. The pitching counter-example is Michael Pineda and his 93.8% left-on-base percentage. That will certainly regress and negatively affect his runs-allowed ledger.
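
For a sense of how easily luck alone can produce a BABIP like Solarte’s, the sketch below computes a simple binomial tail probability. Every input is an assumption: the article does not give his balls-in-play count, so 15 hits on 36 balls in play (which works out to .417) is a hypothetical split, and .300 is an assumed true-talent BABIP. Treating each ball in play as an independent coin flip is also a simplification.

```python
from math import comb

def binomial_tail(min_hits, balls_in_play, true_rate):
    """Probability of at least min_hits hits on balls_in_play tries
    for a hitter whose true BABIP is true_rate."""
    return sum(
        comb(balls_in_play, k)
        * true_rate**k
        * (1 - true_rate) ** (balls_in_play - k)
        for k in range(min_hits, balls_in_play + 1)
    )

# Hypothetical inputs; none of these counts come from the article.
ASSUMED_BALLS_IN_PLAY = 36
ASSUMED_HITS = 15          # 15 / 36 = .417
ASSUMED_TRUE_BABIP = 0.300

chance = binomial_tail(ASSUMED_HITS, ASSUMED_BALLS_IN_PLAY, ASSUMED_TRUE_BABIP)
print(f"Chance of a .417 or better BABIP by luck alone: {chance:.1%}")
```

The same function can be pointed at the low end (for example, Roberts’ .174) by computing one minus the tail above the observed hit count.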

At this point in the season we shouldn’t read much into any statistic. In a few weeks strikeout and walk rates will hold some weight. Ground ball and fly ball rates start to stabilize around 70 balls in play for pitchers and 80 for hitters. Carleton’s calculations show that even a full season is not enough to gauge a player’s true talent for measures like batting average and extra-base hit rate for batters, or home run rate and BABIP for pitchers. That is due to the randomness involved in baseball, which is part of what makes the sport so cool.
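
One common way to put stabilization points to work is to regress an observed rate toward the league average, giving the two equal weight once the sample reaches the stabilization point. The sketch below follows that reading, which is an assumption on my part rather than something spelled out above. The 60 plate appearance figure for strikeout rate is the one quoted earlier; the observed and league rates are hypothetical.

```python
def regressed_rate(observed_rate, sample_size, league_rate, stabilization_point):
    """Blend an observed rate with the league average.

    The league average is weighted as if it were stabilization_point
    extra observations, so the two get equal weight when sample_size
    equals stabilization_point.
    """
    total_weight = sample_size + stabilization_point
    return (
        observed_rate * sample_size + league_rate * stabilization_point
    ) / total_weight

# Hypothetical rates for illustration only.
ASSUMED_OBSERVED_K_RATE = 0.30
ASSUMED_LEAGUE_K_RATE = 0.20
K_RATE_STABILIZATION_PA = 60  # figure quoted in the article

for pa in (20, 60, 300):
    estimate = regressed_rate(
        ASSUMED_OBSERVED_K_RATE, pa, ASSUMED_LEAGUE_K_RATE, K_RATE_STABILIZATION_PA
    )
    print(f"{pa:>3} PA: estimated true strikeout rate {estimate:.3f}")
```

For a statistic whose stabilization point is longer than a full season, the league average dominates this blend all year, which is another way of saying one season is not enough.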

It’s easy to overreact to early season statistics, but as the saying goes: “Weird stuff happens in a game built around hitting a round ball with a cylindrical bat onto a 2+ acre swath of grass.”