Stephen Jay Gould claims standard deviations are shrinking as competition in baseball becomes more uniformly excellent. This graph appears to begin to refute that, at least in the two most critical statistics. The standard deviation of ERA among pitchers with more than 100 IP in a league-season, expressed as a percentage of the league ERA, looks pretty damned flat, staying within the range of variation you would expect over time.
The same goes for the OPS of batters with more than 400 AB in a league-season. The reason batters are more varied than pitchers probably relates to the existence of the defensive specialist, who has no analog in the pitching world. Dividing the SD by the league value normalizes it, so year-to-year rises and falls in the league-wide level of the underlying stat do not show up in these data series.
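As a rough sketch of that normalization (not necessarily how these series were actually computed), here is one way to get the SD of qualifying pitchers' ERAs as a fraction of league ERA for each league-season. The `PitcherSeason` record, the `era_sd_pct` function and the `league_era` lookup are hypothetical names. The sketch assumes the league ERA is supplied separately (covering all pitchers, not just the 100+ IP group) and uses the population SD, a choice the text does not specify. Swapping in OPS and a 400 AB cutoff gives the batter version.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import pstdev


@dataclass
class PitcherSeason:
    # Hypothetical record type; real rows might come from Lahman's database.
    year: int
    league: str  # e.g. "AL" or "NL"
    ip: float    # innings pitched
    era: float


def era_sd_pct(seasons, league_era, min_ip=100.0):
    """Return {(year, league): SD of qualifying ERAs / league ERA}.

    league_era maps (year, league) to the league-wide ERA for that season.
    Only pitchers with strictly more than min_ip innings are included.
    """
    groups = defaultdict(list)
    for s in seasons:
        if s.ip > min_ip:
            groups[(s.year, s.league)].append(s.era)
    # Require at least two qualifying pitchers so the spread means something.
    return {
        key: pstdev(eras) / league_era[key]
        for key, eras in groups.items()
        if len(eras) >= 2
    }
```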
The SD of Winning Pct. is distinctly
shrinking as teams get filled with more uniformly good players.
Parity is, and always has been, increasing.
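Some of that spread in winning percentage is pure luck, and the luck component depends on season length, which is one reason shorter seasons show higher SDs. A minimal sketch, assuming every game is an independent 50/50 coin flip (a simplification, not how the graph was made): a team's chance-only SD of winning percentage over G games is sqrt(p(1 - p)/G). The function name `chance_sd_win_pct` is hypothetical.

```python
import math


def chance_sd_win_pct(games, p=0.5):
    """SD of a team's winning percentage from luck alone.

    Assumes each of `games` games is an independent win with probability p,
    so wins are binomial and the win fraction has SD sqrt(p * (1 - p) / games).
    """
    return math.sqrt(p * (1 - p) / games)


# Compare short early-era season lengths with modern ones.
for g in (60, 130, 154, 162):
    print(g, round(chance_sd_win_pct(g), 4))
```

Under the coin-flip assumption, a 60-game season carries about sqrt(162/60), roughly 1.64 times, the chance-only SD of a 162-game season.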
This graph shows the standard deviation of team winning percentages for each year,
counting all "major" leagues. Do remember, though, that pre-1901 seasons
were significantly shorter and talent was spread across more leagues,
both of which lead to higher SDs.

SD(Lg OPS) and SD(Lg ERA) are practically uncorrelated. There is a slight
negative correlation, which, if it means anything, means that sometimes pitchers
dominate (squishing the range of offensive results) and sometimes the
reverse happens.
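To put a number on "practically uncorrelated", one option is the Pearson correlation coefficient between the two yearly series, paired by year. A minimal sketch; the name `pearson_r` is hypothetical, and on Python 3.10+ `statistics.correlation` computes the same quantity. A result near 0 means uncorrelated, and a slightly negative one matches what the text describes.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length series (e.g. yearly SDs)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    mx = sum(xs) / n
    my = sum(ys) / n
    # Covariance and variance sums around each series' own mean.
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
```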
Viewing the data without zooming in emphasizes how flat it really
is within the range of possibilities. There is something fundamental
keeping SD of OPS around 15% and SD of ERA around 20%.
I actually only glanced through his work a few months ago, and so am not
qualified to argue its veracity. But as near as I can tell, he concentrated
on the disappearance of the .400 hitter. If you compare the major-league
batting average with the highest and lowest averages of any regular in that
year, those gaps are indeed shrinking over time.
But the better explanation for this may simply be the decreased emphasis
on mere base hits as hitters try for extra bases and pitchers try for
strikeouts (leading to more walks). Look at my
chart on offense.
People today speak of having an "empty" batting average the way you
might speak of cola having empty calories. A .400 hitter with few walks or
HRs would be interesting, but would definitely not be paid as well as a
.270 hitter with lots of walks and HRs. Why do you suppose nobody holds
their hands apart on the bat and tries to "hit 'em where they ain't" anymore?
That's why I will focus on OPS when discussing offense, as it better
represents what hitters are trying to do: create runs and win ballgames.
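For reference, OPS is on-base percentage plus slugging percentage, using the standard definitions below. A minimal sketch; the function names are hypothetical, and because sacrifice flies and hit-by-pitches are missing from some older records, this sketch simply expects the caller to pass 0 for those (an assumption, not a rule from the text).

```python
def on_base_pct(h, bb, hbp, ab, sf):
    """OBP = (H + BB + HBP) / (AB + BB + HBP + SF)."""
    return (h + bb + hbp) / (ab + bb + hbp + sf)


def slugging_pct(h, doubles, triples, hr, ab):
    """SLG = total bases / AB, where TB = H + 2B + 2*3B + 3*HR."""
    total_bases = h + doubles + 2 * triples + 3 * hr
    return total_bases / ab


def ops(h, doubles, triples, hr, bb, hbp, ab, sf):
    """OPS = OBP + SLG."""
    return on_base_pct(h, bb, hbp, ab, sf) + slugging_pct(h, doubles, triples, hr, ab)
```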
Also, I went ahead and did this calculation myself:
This looks at max/min candidates with at least 400 AB, but the league
average is the batting average of the (major) league, not the average of these
candidates. Not sure what Gould did. I really ought to read that again.
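The calculation described above can be sketched as follows, assuming each season's batters arrive as (hits, at-bats) pairs and the league batting average is computed separately from all batters. The function name `ba_extremes_vs_league` is hypothetical.

```python
def ba_extremes_vs_league(batters, league_ba, min_ab=400):
    """Gap between the best/worst regular and the league batting average.

    batters: iterable of (hits, at_bats) for one season.
    league_ba: batting average of the whole (major) league, computed from all
    batters rather than from the regulars alone.
    Returns (max_ba - league_ba, league_ba - min_ba) over batters with
    at least min_ab at-bats.
    """
    averages = [h / ab for h, ab in batters if ab >= min_ab]
    if not averages:
        raise ValueError("no batters meet the at-bat minimum")
    return max(averages) - league_ba, league_ba - min(averages)
```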
I think he looked not at the single most extreme values, but at the average of
the five most extreme values at each end, top and bottom.
So let's do that for OPS.
Kind of a killing blow. Clearly, there is no
shrinkage of (avg(top5)-lg) or (avg(bot5)-lg) over time in OPS. Batting
average only showed shrinkage because great hitting has shifted away from
safe hits and toward more productive run creation.
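The top-5/bottom-5 measure used here can be sketched for a single league-season as below. The function name `top_bottom_gaps` is hypothetical; it returns the two quantities in the same form as the text, so the bottom gap comes out negative.

```python
def top_bottom_gaps(values, league_value, k=5):
    """Return (avg(top k) - lg, avg(bottom k) - lg) for one league-season.

    values: OPS (or any stat) for the qualifying regulars.
    league_value: the league-wide figure, e.g. league OPS.
    """
    if len(values) < k:
        raise ValueError(f"need at least {k} values")
    ordered = sorted(values)
    avg_top = sum(ordered[-k:]) / k
    avg_bottom = sum(ordered[:k]) / k
    return avg_top - league_value, avg_bottom - league_value
```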
I do not know why the series for the two leagues appear to move together.
The performance of 5 good/bad hitters in one league should not be related to
their counterparts in the other league, for any reason I can see. Again, if
you have a theory, let me know.
It's true that extreme events used to be more extreme, and it seems reasonable
to explain that by the increasing uniformity of excellent competition,
especially as the color barrier fell and worldwide scouting grew more
widespread. Also, if you read histories of early baseball, the path to the
major leagues had huge chunks of luck in it. Today, it seems unlikely that
many people capable of playing major league baseball well are not doing just
that. Furthermore, pay is much, much higher than it used
to be. Talented people literally gave up promising baseball careers
because they couldn't make a living at it. Obviously, that doesn't happen much
now. Even competition from other major league sports is not drawing off too
many athletes right now. I present these facts not as some kind of
orchestrated attack on Gould's position, but simply as a discussion.
The tail end of the histogram bin chart looks like:
...
Also, he did not have the luxury of Mr. Lahman's database, and this was
probably an asset, as he could make a more reasoned judgment of who was a
"regular" than simply saying 400 ABs. A flat 400 AB cutoff is not ideal, since
the season has grown longer over time, and it shrinks the SDs of older, shorter
seasons: a slightly better class of player would get 400 AB in 154 games
than in 162. (Ignoring pre-1901 data makes my life much easier, since those
seasons varied widely, from 60 to 130 games.)
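The arithmetic behind that point is easy to check. This sketch compares the at-bats per team game needed to reach 400 in 154 versus 162 games, and shows a hypothetical prorated cutoff (scaling 400 by games/162) as one way to even it out; the proration is an illustration only, not what was used for these charts, and both function names are hypothetical.

```python
def ab_per_game_needed(min_ab, games):
    """At-bats per team game a player needs to reach min_ab in a season."""
    return min_ab / games


def prorated_min_ab(games, min_ab_162=400, reference_games=162):
    """Hypothetical season-length-adjusted cutoff: scale the 162-game minimum."""
    return min_ab_162 * games / reference_games


for g in (154, 162):
    print(g, round(ab_per_game_needed(400, g), 3), round(prorated_min_ab(g), 1))
```

In words: 400 AB demands about 2.60 AB per team game over 154 games but only about 2.47 over 162, and the equivalent of 400 AB in 162 games is roughly 380 AB in 154.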
I've started analyzing super batting seasons, and there are way too many
of them! The assumption of normal distribution breaks down a teeny bit above
the level of OPS = 1.000. Again, we are starting at 1901 and looking at
seasons of American and National League batters with more than 400 AB.
OPS (bin upper limit) | Seasons
1.000 | 111
1.025 | 67
1.050 | 66
1.075 | 35
1.100 | 26
1.125 | 17
1.150 | 12
1.175 | 11
1.200 | 6
1.225 | 4
1.250 | 2
1.275 | 4
1.300 | 1
1.325 | 1
1.350 | 0
1.375 | 1
More | 1
Under a normal distribution, you would expect essentially no entries above 1.100. Clearly, Babe Ruth was an alien. Possibly Ted Williams as well.
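The bin list reads like a spreadsheet histogram: judging by the "More" row, each bin appears to be labeled by its upper limit, with "More" catching everything past the last one. Assuming that convention, the binning can be sketched as follows. `upper_edge_histogram` and `EDGES` are hypothetical names, and since these bins are only the tail of a larger chart, real edges would start well below 1.000.

```python
from bisect import bisect_left


def upper_edge_histogram(values, edges):
    """Count values into bins labeled by their upper edge.

    A value v is counted in the first bin whose edge is >= v, so each bin
    covers (previous edge, edge]. Values above the last edge go to "More".
    Values at or below the first edge all land in the first bin.
    """
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_left(edges, v)] += 1
    labels = [f"{e:.3f}" for e in edges] + ["More"]
    return dict(zip(labels, counts))


# Edges 1.000, 1.025, ..., 1.375 (rounded to avoid floating-point drift).
EDGES = [round(1.0 + 0.025 * i, 3) for i in range(16)]
```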
A better approach returns to the concept of a hitter's number of standard deviations above the league OPS for that year. The number of hitters reaching 2 or more SDs above the league is consistently far higher than a perfect normal distribution would produce.
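A minimal sketch of that comparison: express each hitter's OPS as SDs above the league OPS, count how many reach 2 SDs, and set that against the count a normal distribution predicts. The helper names are hypothetical, and using the population SD of the qualifying hitters' OPS is an assumption about how the SD was taken.

```python
import math
from statistics import pstdev


def upper_tail(z):
    """P(Z > z) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def sds_above_league(ops_values, league_ops):
    """Each hitter's OPS as SDs above the league OPS.

    Assumes the SD is the population SD of the qualifying hitters' OPS
    for that league-season.
    """
    sd = pstdev(ops_values)
    return [(v - league_ops) / sd for v in ops_values]


def observed_vs_expected(z_scores, threshold=2.0):
    """Count of z >= threshold, versus the count a normal distribution predicts."""
    observed = sum(1 for z in z_scores if z >= threshold)
    expected = len(z_scores) * upper_tail(threshold)
    return observed, expected
```

A normal distribution puts only about 2.3% of seasons at 2 SDs or more, which is the baseline the observed counts are being compared against.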
Of the 10,000-odd seasons studied, the most extremely good is Babe Ruth's 1921, where his 1.358 mark was 4.754 SDs above the league's .765. In a normal distribution, 4.754 SDs is almost exactly a one-in-a-million event. That would be very surprising if the Babe hadn't had a 1-in-633,000 year the year before! So anyway, the top-end tail of the OPS histogram is goofy with superstars. I'll present some straightforward facts about these great years soon, and also try to research some more exotic techniques for dealing with such distributions.
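The one-in-a-million figure is a one-tailed normal probability, and it can be checked with the complementary error function in Python's standard library. A minimal sketch; `upper_tail` is a hypothetical helper name.

```python
import math


def upper_tail(z):
    """P(Z > z) for a standard normal Z (one-tailed)."""
    return 0.5 * math.erfc(z / math.sqrt(2))


p = upper_tail(4.754)
print(f"P(Z > 4.754) = {p:.3g}, roughly 1 in {1 / p:,.0f}")
```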
I'm ignoring stupendously bad seasons for now since the low-end tail of the histogram looks fine and there is extreme selection pressure against bad hitters.
Me, goofing around with more basic statistics