Stephen Jay Gould claims that standard deviations are shrinking as competition in baseball becomes more uniformly excellent. This graph appears to begin to refute that (if only for the 2 most critical statistics). The standard deviation of ERAs of pitchers with more than 100 IP in a year in a league, expressed as a percentage of the league ERA, looks pretty damned flat, within an expected variance over time.

Same for OPS of batters w/ more than 400 AB in a year in a league. That batters are more varied than pitchers probably relates to the existence of the defensive specialist, for which there is no analog in the pitching world. Dividing the SD by the league value normalizes it, so the year-to-year increases and decreases in the underlying stat are divorced from these data series.
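For anyone who wants to reproduce the normalization, here is a minimal sketch of the idea. The sample OPS figures and the league value are made up for illustration, and the choice of population SD (rather than sample SD) is an assumption, since the text does not say which was used.

```python
from statistics import pstdev

def normalized_sd(values, league_value):
    """Return the SD of player values as a fraction of the league value."""
    # Population SD is an assumption; swap in statistics.stdev for sample SD.
    return pstdev(values) / league_value

# Hypothetical OPS figures for a handful of regulars; not real data.
sample_ops = [0.650, 0.700, 0.750, 0.800, 0.850]
league_ops = 0.750  # hypothetical league-wide OPS

ratio = normalized_sd(sample_ops, league_ops)
print(f"SD as a share of league OPS: {ratio:.1%}")

# Doubling every value and the league figure leaves the ratio unchanged,
# which is the point of dividing by the league value.
scaled_ratio = normalized_sd([2 * v for v in sample_ops], 2 * league_ops)
```

Because the ratio is scale-free, a high-offense year and a low-offense year can be compared on the same axis.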


The SD of Winning Pct. is distinctly shrinking as teams get filled with more uniformly good players.

Parity is, and always has been, increasing. This graph shows the standard deviation of team winning %ages for each year, counting all "major" leagues. Do remember, though, that pre-1901, seasons were significantly shorter, and talent was spread around more leagues, leading to higher SDs.


SD(Lg OPS) and SD(Lg ERA) are practically uncorrelated. There is a slight negative correlation, which, if it means anything, means that sometimes pitchers dominate--squishing the range of offensive results--and sometimes it is reversed.
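A correlation like this can be checked with a plain Pearson coefficient over the two yearly series. The sketch below uses invented numbers and hypothetical variable names; it only shows the calculation, not the actual result.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical yearly normalized SDs (fractions of the league value); not real data.
sd_ops_by_year = [0.150, 0.148, 0.153, 0.151, 0.149]
sd_era_by_year = [0.200, 0.203, 0.198, 0.201, 0.202]

r = pearson(sd_ops_by_year, sd_era_by_year)
print(f"r = {r:.2f}")
```

A value near 0 means the two series move independently; a negative value means one tends to be tight in the years when the other is spread out.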

Looking at the data without being zoomed in emphasizes how flat it really is within the range of possibilities. There is something fundamental keeping SD of OPS around 15% and SD of ERA around 20%.

I actually only glanced through his work a few months ago, and so am not qualified to argue its veracity. But as near as I can tell, he concentrated on the disappearance of the .400 hitter. If you look at the ML average batting average, and the max and min average of any regular in that year, then those differences are indeed shrinking over time.

But the better explanation for this may simply be the decreased emphasis on mere base hits as hitters try for extra bases and pitchers try for strikeouts (leading to more walks). Look at my chart on offense.

People today speak of having an "empty" batting average the way you might speak of cola having empty calories. A .400 hitter with few walks or HRs would be interesting, but would definitely not be paid as well as a .270 hitter with lots of walks and HRs. Why do you suppose nobody holds their hands apart on the bat and tries to "hit 'em where they ain't" anymore? That's why I will focus on OPS when discussing offense, as it better represents what hitters are trying to do (create runs and win ballgames).

Also, I went ahead and did this calculation myself:

This looks at max/min candidates with at least 400 AB, but the league average is the batting average of the (major) league, not the average of these candidates. Not sure what Gould did. I really ought to read that again. I think he looked not at the single extreme values, but at the average of the 5 most extreme values at each of the top and bottom.


So let's do that for OPS.

Kind of a killing blow. Clearly, there is no shrinkage of (avg(top5)-lg) or (avg(bot5)-lg) over time in OPS. The effect only showed up in batting average because great hitting has shifted away from safe hits and toward more productive run creation.
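The (avg(top5)-lg) and (avg(bot5)-lg) quantities are simple to compute for one league-year. This is a rough sketch with made-up numbers; the function name and the sample values are hypothetical, and it assumes the league figure is supplied separately rather than averaged from the qualifying hitters.

```python
def extreme_gaps(player_values, league_value, k=5):
    """Return (avg of top k - league, avg of bottom k - league)."""
    if len(player_values) < k:
        raise ValueError("need at least k qualifying players")
    ranked = sorted(player_values)
    bottom_avg = sum(ranked[:k]) / k
    top_avg = sum(ranked[-k:]) / k
    return top_avg - league_value, bottom_avg - league_value

# Hypothetical OPS values for 12 regulars in one league-year; not real data.
regulars_ops = [0.600 + 0.025 * i for i in range(12)]
league_ops = 0.740  # hypothetical league-wide OPS

top_gap, bottom_gap = extreme_gaps(regulars_ops, league_ops)
print(f"top 5 gap: {top_gap:+.3f}, bottom 5 gap: {bottom_gap:+.3f}")
```

Running this once per league per year and plotting the two gaps over time gives the kind of series discussed above.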

I do not know why the series for the two leagues appear to move together. The performance of 5 good/bad hitters in one league should not be related to their counterparts in the other league, for any reason I can see. Again, if you have a theory, let me know.


... Also, he did not have the luxury of Mr. Lahman's database, and this was probably an asset, as he could make a more reasoned judgment of who was a "regular" than simply saying 400 ABs. That cutoff is not ideal, since the season has been growing longer over time. (It shrinks the SDs of older and shorter seasons, since a slightly better class of player would get 400 AB in 154 games than in 162. Ignoring pre-1901 data makes my life much easier, since the seasons varied widely from 60-130 games.)

It's true that extreme events used to be more extreme, and it seems reasonable to explain that by increasingly uniform excellence in the competition, especially as the color barrier fell and worldwide scouting grew more widespread. Also, if you read histories of early baseball, the path to the major leagues had huge chunks of luck in it. Now, it seems unlikely that there are too many people capable of playing major league baseball well who are not doing just that. Furthermore, pay is much, much higher than it used to be. Talented people very literally gave up promising baseball careers because they couldn't make a living at it. Obviously, that doesn't happen much now. Even competition from other major league sports is not drawing off too many athletes right now. I present these facts not as some kind of orchestrated attack on Gould's position, but simply as a discussion.


I've started analyzing super batting seasons, and there are way too many of them! The assumption of normal distribution breaks down a teeny bit above the level of OPS = 1.000. Again, we are starting at 1901 and looking at seasons of American and National League batters with more than 400 AB.

The tail end of the histogram bin chart looks like:

OPS bin    Seasons
1          111
1.025      67
1.05       66
1.075      35
1.1        26
1.125      17
1.15       12
1.175      11
1.2        6
1.225      4
1.25       2
1.275      4
1.3        1
1.325      1
1.35       0
1.375      1
More       1

Whereas under a normal distribution, you would expect more or less no entries above 1.100. Clearly, Babe Ruth was an alien. Possibly Ted Williams as well.

A more proper look returns to the concept of a hitter's number of standard deviations above lg. OPS for that year. The number of hitters achieving 2 or more SDs above lg. is consistently way higher than would come from a perfect normal distribution.

Of the 10,000-odd seasons studied, the most extremely good is Babe Ruth's 1921, where his 1.358 mark was 4.754 SDs above the lg. .765. In a normal distribution, 4.754 is almost exactly a one in a million event. That would be very surprising if the Babe hadn't had a 1 in 633,000 year the year before! So anyway, the top-end tail of the OPS histogram is goofy with superstars. I'll present some straightforward facts about these great years soon, and also try to research some more exotic techniques for dealing with such distributions.
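The "one in a million" figure is just the upper tail of a standard normal distribution, which can be checked with the standard library. This sketch takes the 1.358, .765 and 4.754 figures from the text; the function name is my own, and the round 10,000 used for the expected count is an approximation of the "10,000-odd" seasons.

```python
import math

def upper_tail(z):
    """P(Z >= z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# Figures quoted in the text for Babe Ruth's 1921 season.
ruth_ops = 1.358
league_ops = 0.765
z_score = 4.754

# The league SD implied by those three numbers.
implied_sd = (ruth_ops - league_ops) / z_score

# How rare a season that far out would be under a normal distribution.
p = upper_tail(z_score)
one_in = 1 / p
print(f"implied SD: {implied_sd:.4f}, about 1 in {one_in:,.0f}")

# Expected number of seasons at 2+ SDs above league among roughly 10,000
# seasons if OPS really were normal (10,000 is an approximation).
expected_two_sd = 10_000 * upper_tail(2.0)
```

Comparing `expected_two_sd` against the actual count of 2+ SD seasons is one direct way to see how heavy the top-end tail is.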

I'm ignoring stupendously bad seasons for now since the low-end tail of the histogram looks fine and there is extreme selection pressure against bad hitters.

