Errors at Different Levels of OB

Commish & I were discussing the standards for official scorers giving errors. Should the same standard be applied regardless of the level, or should the standards be higher at the higher levels?

Commish made the excellent point that throwing errors (especially to first) are going to be automatic and are not really subject to any subjective standard. Since these types of errors are obviously made more frequently at the lower levels, we expect the number of errors to go up as the level goes down.

So, I can’t answer my original question with stats, but I still thought it would be interesting to look at the fielding percentages at the different levels of OB. I used 2013 stats and excluded leagues south of the border.

Screen Shot 2014-06-18 at 9.36.42 PM

The trend is clear. Actually, it’s clearer than I expected! When you get down to A ball, errors are twice as likely compared to the Bigs.

The Worst Hitting Pitchers in MLB History

Baseball Reference has a free trial for their Play Index, so I’m giving it a whirl.

Who are the worst hitting pitchers of all time? I’ve got no magic criteria, but it’s easy to find some guys who were epic fails at the plate.

Ron Herbel pitched in 332 games in the 60s and early 70s, mostly for the Giants. He managed only six hits in 227 plate appearances for an anemic .029 batting average. He struck out 125 times (55% of PAs) and walked only eight times. Actually, one third of his hits were doubles, which raised his OPS to .104. I bet a few of those doubles were hit to sleeping outfielders.


Dean Chance won the AL Cy Young in 1964 and accumulated 759 plate appearances in 406 games. He recorded 44 hits (.066 BA), all but two of which were singles. He struck out 420 times (55%) and walked only 30 times. With 128 wins and a 2.92 career ERA, he’s probably the best pitcher ever who was useless with a bat in his hands.



Of the active guys, Tommy Hanson & Ben Sheets are notable. Hanson is 11 for 187 (.059) with 92 strikeouts, 5 walks, and zero extra-base hits. Sheets is 34 for 449 (.076) with 212 Ks and 19 walks.

hanson           sheets


Although Randy Tate was in the bigs for only one year, he holds the distinction of having the most career plate appearances (47) without a hit. He did manage to draw one walk, though! In six minor league seasons he hit .113, so I guess ’75 was just a down year for him. Tate had an unusually symmetric career: three years in the minors, followed by one full season with the Mets (he pitched in every month of the ’75 season), followed by three more years in the minors. He was never called up during his minor league seasons, and wasn’t sent down during his only major league season!


And, finally, of the pitchers with the dubious distinction of never having reached base safely ever, the guy with the most plate appearances (33) is none other than Justin Verlander. I think I’ve heard that he’s a decent pitcher, though. Verlander did not reach base during his three post-season PAs, and he never went to the plate during his 20-game minor league career. Let’s hope that the increase in interleague play will give Justin the chance to get off the schneid in 2013.

2013-09-30 UPDATE Verlander got only two plate appearances during the 2013 regular season, and they both came in the 162nd game. He went hitless, but so did the rest of the Tigers, as this was Henderson Alvarez’ no-hitter!

2014-06-18 UPDATE On April 12, 2014 in San Diego, California, in the top of the second with two outs, Justin Verlander reached base safely for the first time in his professional career when he grounded a single up the middle against Ian Kennedy. When he next came to the plate in the fourth… he hit another single!!! He would later score his first run. As of today Verlander has a .069 batting average. He is still looking for that first walk.

2012 ABL Playoff Odds

I was curious about the chances playoff teams have of getting to the Bambino Cup Finals and their chances of being crowned ABL champions. The playoff structure itself has a big impact; for example, the division champions have shorter roads to the cup. Of course, the relative strength of each team is very important, but how can that be measured?

First, let’s consider the playoff structure in isolation. Assume that all playoff teams have equal strength. If that’s the case, then the chance of winning a game or a series is 50%, a coin flip. A division champ (C-Bay or Orlando) has to win two series, so they have to flip a coin twice and have it come up heads both times. That’s one chance in four, 25%. A team with a one-game showdown to get into the lower bracket (LBI or Manahawkin) has to win four series. The chance of having heads come up four times in a row is only one in 16, 6.25%. The probabilities of the 2012 playoff teams winning the ABL Championship under these conditions are shown in the table below.
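The coin-flip arithmetic above can be checked in a couple of lines. This is only a sketch: the function name `title_odds` is made up for illustration, and it assumes every round is an independent 50/50 proposition, as in the paragraph above.

```python
from fractions import Fraction


def title_odds(rounds_to_win: int, p: Fraction = Fraction(1, 2)) -> Fraction:
    """Chance of winning every one of `rounds_to_win` independent rounds,
    each won with probability p (a coin flip by default)."""
    return p ** rounds_to_win


# Division champ: two series to win.
print(title_odds(2), float(title_odds(2)))
# Play-in team: four rounds to win.
print(title_odds(4), float(title_odds(4)))
```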

Now let’s look at team strength: how to measure it, and how to use it to determine the probability of winning a game and a series. Bill James applied some statistics to the question of how to measure the probability of one team beating another in one game. He called it log5, and it uses winning percentage to measure team strength. I’ll use the ABL regular-season winning percentages for this exercise.
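For reference, here’s a minimal sketch of the log5 calculation in its usual form. The function name and the sample winning percentages are illustrative only, not ABL numbers.

```python
def log5(p_a: float, p_b: float) -> float:
    """Bill James's log5 estimate of the chance that team A beats team B
    in a single game, given each team's winning percentage (0 to 1)."""
    a_wins = p_a * (1 - p_b)
    b_wins = p_b * (1 - p_a)
    return a_wins / (a_wins + b_wins)


# Hypothetical example: a .600 team against a .400 team.
print(round(log5(0.600, 0.400), 3))
```

Two sanity checks on the formula: evenly matched teams come out at 50%, and any team playing a .500 opponent comes out at its own winning percentage.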

The log5 method works for one game, but what about a best-of-five or best-of-seven series? Well, there are formulas for that too. So now we can use these formulas to calculate the probabilities of teams reaching the finals. Three pages of scratch paper later…
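One standard way to go from a single-game probability to a series probability is the binomial form sketched below. It assumes every game in the series is independent with the same win probability p, and it may not be the exact formula behind the scratch-paper numbers here; `series_win_prob` is a name made up for illustration.

```python
from math import comb


def series_win_prob(p: float, best_of: int) -> float:
    """Chance of winning a best-of-`best_of` series when each game is won
    independently with probability p."""
    wins_needed = best_of // 2 + 1
    q = 1 - p
    # The series ends on the team's final win; before that it has taken
    # (wins_needed - 1) wins and `losses` losses in any order.
    return sum(
        comb(wins_needed - 1 + losses, losses) * p**wins_needed * q**losses
        for losses in range(wins_needed)
    )


# Hypothetical example: a team that wins 60% of single games.
print(round(series_win_prob(0.6, 5), 4))  # best-of-five
print(round(series_win_prob(0.6, 7), 4))  # best-of-seven
```

The longer the series, the more a single-game edge gets magnified, which is part of why the spread between teams grows.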

Quite a spread, isn’t it?

One more series of calculations (and three more sheets of paper) gives the ultimate probabilities of teams getting their name on the hardware in 2012.

A superior winning percentage sure indicates a big advantage in the playoffs. Of course, this is simply a cold calculation based on only the playoff structure and the teams’ winning percentages. Among the factors this calculation does not take into account are:

  • Home-field advantage
  • Changes in team strength due to trades & injuries
  • Runs scored & runs allowed
  • Pitcher/batter match-ups
  • Strength of three-man rotations
  • Sticks
  • BFHes
  • Loaded dice
  • PEDs

All-HR & no-HR games

I was watching a Cardinals game the other day—can’t remember exactly which one—and after a few innings the only runs were off solo homers. I hate games like that. I don’t mind a few taters, but small ball is more fun. It got me wondering: How many games have all their runs knocked in by homers? And how many games have no home runs? I took a guess at both numbers. You have a guess. I’ll wait.

Ready? OK, continue reading.

ABL at the All Star Break: Pythagorean winning percentage

The Pythagorean winning percentage is a measure developed by Bill James to estimate a team’s winning percentage based on runs scored and runs against. (It was named after Pythagoras, the famed Greek Sabermetrician.)

At the half-way point of the ABL regular season, the Pythagorean winning percentages are listed below. (I used the 1.83 exponent used by Baseball Reference.) The results are sorted by Pythagorean wins, the number of wins expected based on the runs scored and runs against.

picture-4.png

The difference between the actual wins and the Pythagorean wins is a measure of how “lucky” a team was. It indicates the teams that scored their runs in the situations that won games. And the teams that didn’t. Sorted by Pythagorean win difference, the table below shows the lucky teams at the top and the unlucky ones at the bottom.

picture-3.png
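The calculation behind both tables can be sketched as follows. The 1.83 exponent comes from the text above; the function names and the sample team line are hypothetical, not ABL data.

```python
def pythagorean_pct(runs_scored: float, runs_allowed: float,
                    exponent: float = 1.83) -> float:
    """Expected winning percentage from runs scored and runs against."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def luck(wins: int, losses: int, runs_scored: float, runs_allowed: float,
         exponent: float = 1.83) -> float:
    """Actual wins minus Pythagorean wins; positive means 'lucky'."""
    games = wins + losses
    expected_wins = pythagorean_pct(runs_scored, runs_allowed, exponent) * games
    return wins - expected_wins


# Hypothetical team: 22-18 with 190 runs scored and 185 against.
print(round(pythagorean_pct(190, 185), 3))
print(round(luck(22, 18, 190, 185), 1))
```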

Consecutive steals of second & third

This has come up twice this season. A runner steals second, then, with the same batter at the plate, the runner wants to steal third. The Commish sent out a clarification on 2/13/2008:

Can you steal 2nd base and before the batter swings, steal 3rd base?

ANSWER: No. If the offense attempts a jump for a steal and fails or is successful, the offense must then swing away. The offense isn’t allowed to call a hit and run, bunt, pinch hit, pinch run, steal another base, or make any other moves until the next batter. The defense is also not allowed to make a move until the next batter.

And, yet, consecutive steals of second & third do occur with the same batter at the plate. Questions: 1) How often does it happen in MLB, and 2) could that sequence of events be incorporated into ABL/TPB?

Here’s the definition of the situation. A runner steals second, then, with the same batter at the plate, steals third or is caught stealing third. (I didn’t count straight pick-offs from second.) Per the Retrosheet event files, in 2007 that situation occurred 40 times. There were a total of 2,542 steals of second, so the attempt for third occurred 1.6% of the time following a steal of second. The stat for the last five seasons taken together is also 1.6% (185/11,657). So, that answers question 1. Only about one time in every 60 does a runner who has stolen second attempt to steal third with the same batter at the plate.

Does this occur often enough to incorporate into the ABL? I’d say… maybe. It could be added by requiring an extra roll in trying to get the jump to steal third. For example, after the defense is given a chance to set, the offense states that he wants to steal third immediately. If the extra roll allows it, he can try for the jump in the normal way.

So, what should that extra roll be? Let’s assume that the runner would get a normal jump one-third of the time. If the runner always tried to steal third immediately, that would indicate that a 1-in-20 extra roll would produce an attempt once every 60 opportunities, which would reproduce the MLB stats. However, managers will not always try this risky sequence. How often would they try if it were allowed? Stealing seems very lopsided in the ABL (a few players steal all the time, everyone else never steals), so I’ll say 50%. So, if the extra roll requires a zero be rolled with a ten-sided die in order to try for the jump, the percentage of attempts will be: 1/2 manager choice * 1/10 extra roll * 1/3 gets the jump = 1/60, which would reproduce the MLB stat.
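A quick check of that arithmetic against the Retrosheet counts quoted earlier. The one-half and one-third figures are the guesses stated above, not measured values, and the variable names are just for illustration.

```python
from fractions import Fraction

# Assumptions taken from the paragraph above.
manager_tries = Fraction(1, 2)   # guess: managers try it half the time
extra_roll = Fraction(1, 10)     # a zero on a ten-sided die
gets_jump = Fraction(1, 3)       # guess: normal jump one-third of the time

attempt_rate = manager_tries * extra_roll * gets_jump

# MLB rates from the Retrosheet counts in the post.
mlb_2007 = Fraction(40, 2542)
mlb_five_years = Fraction(185, 11657)

print(attempt_rate, f"{float(attempt_rate):.1%}")
print(f"{float(mlb_2007):.1%}", f"{float(mlb_five_years):.1%}")
```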

Average Errors per Game

The SOM basic fielding chart seems to produce a lot of errors, at least compared to TPB. Reality check: what’s the average number of errors per game in MLB? A quick Retrosheet hack gives the average over the years. It’s not a perfect count—multiple errors during one play are all counted as one.

mlb_errors.png

Is the drop due to a change in fielding prowess or a change in official scoring? I reckon it’s the latter.

Deep Engine 2

More data from the Deep Engine. All results are based on ten million trials.

Here are the results for all 30 parks from the TPB 2007 data:

     power:     5        4        3        2        1

   homerun:   48.55%   32.38%   19.18%    9.26%    3.11%
    caught:   47.55%   63.72%   76.92%   86.84%   92.99%
      foul:    3.90%    3.90%    3.90%    3.89%    3.90%

As expected, no significant changes from 2006.

I re-ran with the 12 2008 ABL parks, using the TPB 2007 data.

     power:     5        4        3        2        1

   homerun:   40.45%   24.74%   12.86%    5.28%    1.54%
    caught:   55.65%   71.36%   83.24%   90.82%   94.56%
      foul:    3.90%    3.90%    3.90%    3.89%    3.90%

Wow, there are some big parks in the 2008 ABL! It’s much harder to homer, especially for the light hitters who will find it almost twice as hard to hit them out in the ABL compared to the 30-park circuit.

Now let’s see how the numbers look for the different hitting types. Again, this is ten million trials in the 2008 ABL parks.

Rsp
     power:     5        4        3        2        1
   homerun:   39.79%   24.05%   12.36%    5.01%    1.44%
    caught:   57.61%   73.34%   85.03%   92.38%   95.95%
      foul:    2.60%    2.61%    2.61%    2.60%    2.60%


Lsp
     power:     5        4        3        2        1
   homerun:   39.96%   24.20%   12.48%    5.02%    1.46%
    caught:   57.44%   73.19%   84.91%   92.38%   95.94%
      foul:    2.60%    2.60%    2.61%    2.59%    2.60%


Rp
     power:     5        4        3        2        1
   homerun:   40.83%   25.34%   13.35%    5.72%    1.68%
    caught:   53.98%   69.47%   81.46%   89.09%   93.11%
      foul:    5.19%    5.19%    5.19%    5.19%    5.21%


Lp
     power:     5        4        3        2        1
   homerun:   41.23%   25.34%   13.21%    5.37%    1.58%
    caught:   53.55%   69.47%   81.58%   89.43%   93.23%
      foul:    5.21%    5.19%    5.21%    5.20%    5.19%

Not surprisingly, the pull hitters end up with more foul balls. In spite of that, they still end up with a greater probability of homering.

This data can be combined with the batter’s power & the average deeps to estimate the number of home runs a batter will get with Deep! rolls against an average pitcher. Actually, the home-run potential is so similar among the hitting types that it’s not worth making a distinction. So, for example, a power-5 hitter will homer on about 40% of his 18.7 deep rolls against the average pitcher, effectively giving him an additional 7.5 home-run range.

Combine this with the power distribution, and the probability of a home run on a Deep roll works out to 20.4%. That’s an important number for rating individual pitchers against the average batter.
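The 20.4% figure can be reproduced with a weighted average. This sketch assumes the inputs were the 2008 ABL park home-run percentages (all hitting types combined) and the franchise-player power counts from the “average power” post; the variable names are made up.

```python
# Home-run percentage on a Deep! roll by power rating (2008 ABL parks).
HR_PCT_BY_POWER = {5: 40.45, 4: 24.74, 3: 12.86, 2: 5.28, 1: 1.54}

# Number of franchise players at each power rating.
PLAYERS_BY_POWER = {5: 60, 4: 50, 3: 33, 2: 33, 1: 36}

total_players = sum(PLAYERS_BY_POWER.values())

# Weight each power rating's home-run rate by how common that rating is.
avg_hr_pct = sum(
    PLAYERS_BY_POWER[power] * HR_PCT_BY_POWER[power]
    for power in PLAYERS_BY_POWER
) / total_players

print(f"{avg_hr_pct:.1f}%")
```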

average power

To assess deep ranges I need stats on the power ratings. The breakdown for the 212 franchise players:

power  #   pct
-----  --  ---
  5    60  28%
  4    50  24%
  3    33  16%
  2    33  16%
  1    36  17%

The average is 3.31 (3.29 vs L, 3.32 vs R). These numbers are likely inflated, as the franchise hitters surely have more power compared to the entire field, but the franchise players will get most of the PAs.
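For anyone who wants to re-derive the overall average and the percentage column from the counts in the table, here’s a short sketch. The vs L / vs R splits aren’t reproducible from this table alone, so only the overall figure is computed.

```python
# Franchise players at each power rating, from the table above.
POWER_COUNTS = {5: 60, 4: 50, 3: 33, 2: 33, 1: 36}

total_players = sum(POWER_COUNTS.values())

# Average power rating across all franchise players.
average_power = sum(p * n for p, n in POWER_COUNTS.items()) / total_players

# Share of players at each rating, rounded to whole percents.
shares = {p: round(100 * n / total_players) for p, n in POWER_COUNTS.items()}

print(total_players, round(average_power, 2), shares)
```

The rounded shares add up to 101% rather than 100%, which is just rounding.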

L/R averages

Another vital parameter: How often does a batter face a righty/lefty pitcher? From the 2007 ABL stats: 78.3% of innings pitched were by right-handers, 21.7% of the IPs were from lefties. I had guessed it would have been about a third lefties.

Another way to figure this is to total the splits for 2007 PAs at Baseball Reference (NL, AL). We can figure the batting sides while we’re at it.

       TOTAL     NL       AL
       -----    -----    -----
RHP    72.6%    71.8%    73.4%
LHP    27.4%    28.2%    26.6%

RHB    58.9%    60.6%    56.9%
LHB    41.1%    39.4%    43.1%

Range Factor and range ratings

The Bill James Handbook lists Range Factor, which is the number of Successful Chances (Putouts plus Assists) times nine divided by the number of Defensive Innings Played. Does this statistic correlate with the TPB range ratings? I picked a couple of the more important defensive positions and compared the 2007 Range Factors for starters with the TPB range ratings from the 2007 TPB Statistics Book. Graphs for shortstops and center fielders are below.
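The Range Factor definition above translates directly into code. The sample numbers are hypothetical, not taken from the Handbook.

```python
def range_factor(putouts: int, assists: int, innings: float) -> float:
    """Successful chances (putouts plus assists) per nine defensive innings."""
    return (putouts + assists) * 9 / innings


# Hypothetical shortstop: 200 putouts and 400 assists in 1,200 innings.
print(range_factor(200, 400, 1200))
```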

Shortstops show a bit of correlation. It’s no surprise to me that Furcal & Vizquel are highly rated by both measurements. I’m surprised to see that Reyes has such a low Range Factor.

Center fielders are all over the place. Vernon Wells has a Superior TPB rating and the lowest Range Factor!

The red lines are the linear fits to the data. The graphs assume that the TPB ratings are linear, that is, that the difference between VG & SP is the same as between PR & WK. Whether or not that’s the intent, it’s clear that there’s no strong correlation between the Range Factor and the TPB range rating. That could mean that either 1) the two measurements are meant for different purposes, or 2) one or both of the measurements are inaccurate.

I don’t think #1 is likely. Surely each is trying to quantify the ability of a fielder to field balls that are hit in his general direction. Of course, measuring any kind of defensive ability is difficult. (See this discussion of various methods.) Whatever the case, it’s clear that the TPB ratings are not based on Range Factor.

Team selection for 2008 ABL expansion

At the ABL Winter Meetings on the 16th I needed to select a team to bring into the league. Going in, I figured it would be between Boston & Philly. Here’s how I decided.

Fifteen players can be selected, so I made up lists of nine position players (one from each of the eight non-pitching positions plus a “DH”), three starters, and three relievers. Minimum requirements for the ABL are 175 ABs (125 for catchers), 70 IP for starters, and 30 IP for relievers.

I used OPS+ to rate the hitters. For pitchers I started with WHIP, then added some arbitrary factors to get something that roughly mirrors OPS+, that is, higher is better and average is 100. I came up with (2.25-WHIP)*110. (It’s listed as “WHIP+” in the tables below.) I weighted starters twice as much as relievers, based on innings pitched (six-inning starts). I then averaged the position players and pitchers separately to get team hitting and pitching ratings. The lists for Philly & Boston are shown below.
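Here’s a sketch of the “WHIP+” formula and the two-to-one starter weighting. The formula is the one given above; the weighted-average function is one plausible reading of the weighting, and the WHIP values in the example are made up.

```python
def whip_plus(whip: float) -> float:
    """Rough pitching analog of OPS+: higher is better."""
    return (2.25 - whip) * 110


def staff_rating(starter_whips, reliever_whips,
                 starter_weight: float = 2.0,
                 reliever_weight: float = 1.0) -> float:
    """Weighted average WHIP+ with starters counted twice as much as
    relievers (an assumed interpretation of the weighting)."""
    weighted_sum = (
        starter_weight * sum(whip_plus(w) for w in starter_whips)
        + reliever_weight * sum(whip_plus(w) for w in reliever_whips)
    )
    total_weight = (starter_weight * len(starter_whips)
                    + reliever_weight * len(reliever_whips))
    return weighted_sum / total_weight


# Hypothetical staff: three starters and three relievers.
print(round(staff_rating([1.20, 1.30, 1.40], [1.00, 1.10, 1.25]), 1))
```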

philly_boston.png

Of course, if you choose the wrong guys here, the method suffers. Notably absent are Beckett & Schilling, who are already in the ABL. Philly has no one in the ABL.

I did not consider fielding.

In addition to Boston & Philly, I ran the numbers for a few other teams to see how they compared. The graph below shows the results.

expansion_graph.png

So Philly & Boston come out equal. That was no help at all!

So the decision was down to hitting vs. pitching. I chose hitting. Also, Philly looks to have a very balanced lineup, and Utley & Rollins are a superior middle infield. The biggest temptations of Boston were Papelbon & Okajima, both of whom have sick numbers. Big Papi & Ramirez are pretty good too!

Go Phillies! Go Perfectos!

Krazy striKeout Krap

From a Newsweek article about an upcoming journal article by a couple of psychologists:

If the preference for people, places and things that share one of your initials is conscious, then it shouldn’t work if the thing you’re choosing is basically undesirable. Strikeouts are undesirable. Yet based on data from 1913 through 2006, for the 6,397 players with at least 100 plate appearances, “batters whose names began with K struck out at a higher rate (in 18.8% of their plate appearances) than the remaining batters (17.2%),” the researchers find. The reason, they suggest, is that players whose first or last name starts with K like their initial so much that “even Karl ‘Koley’ Kolseth would find a strikeout aversive, but he might find it a little less aversive than players who do not share his initials, and therefore he might avoid striking out less enthusiastically.” Granted, 18.8% vs. 17.2% is not a huge difference, but it was statistically significant—that is, not likely to be due to chance.

Hmmm…

This comment on the Newsweek site sums up my feelings:

If you survey enough sets of numbers you get random patterns that don’t always come out even. For example, the information about batters is really incomplete. It isn’t sufficient to just compare K against everybody else. Look at the entire alphabet and you will probably see variations of one or two percent between the letters. They won’t all come out the same. Does Q do better than F? Does C do better than M? If so, what does that really mean? Why aren’t they comparing S for strikeout? And what about the name of the pitcher? If it counts for the batter, why doesn’t it also count for the pitcher? This is not a question that can even be really addressed in isolation from everything else.

single with man on first (part 3)

This time I looked at how often runners on first were thrown out at third on a single. Again, no other baserunners. Not counted:

  • Error on the play at third. (Errors allowing the batter to advance past first are OK.)
  • Runner on first out at second or home.
  • Runner safe at home.
  • Baserunner hit by batted ball.

The percentage I’m interested in is the number of times the runner is out at third divided by the number of times the runner is out or safe at third. Again, I counted all the years available in the Retrosheet event files. Graph below. The line is the least-mean-squares linear fit.

single_third_thrownout1.png

I’m surprised how seldom the runner is out at third. There’s a clear downward trend, which indicates that runners and/or third base coaches have become more conservative. Perhaps stronger arms in the outfield are also a factor.

The whole exercise makes me question the role of TPB’s “sending runners” in these situations. (TPB out of the box, not ABL rules.) Why is this a manager’s decision? Runners will try for the extra base on their own, or take guidance from the third base coach. Would it be more realistic to roll for an advancement that is explicitly specified on a chart? Such a chart should be roughly:

  • 26%: runner safe at third
  •  2%: runner out at third
  • 72%: runner holds at second

Then you could sprinkle in some potential errors & such. Of course, there would be a dependency on where the single was hit (as there is now).
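As a rough illustration of how such a chart could be rolled, here’s a sketch using a percentile (1-100) roll. The roll mechanism and all names are assumptions for illustration; only the 26/2/72 split comes from the list above.

```python
import random

# (width in percentage points, outcome), taken from the rough chart above.
ADVANCE_CHART = [
    (26, "runner safe at third"),
    (2, "runner out at third"),
    (72, "runner holds at second"),
]


def outcome_for_roll(roll: int) -> str:
    """Look up the outcome for a percentile roll from 1 to 100."""
    cumulative = 0
    for width, outcome in ADVANCE_CHART:
        cumulative += width
        if roll <= cumulative:
            return outcome
    raise ValueError("roll must be between 1 and 100")


# One random trip around the chart.
print(outcome_for_roll(random.randint(1, 100)))
```

Under this chart a runner who tries for third is thrown out 2 times in 28, about 7% of attempts.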

single with man on first (part 2)

Previously I looked at the frequency of runners advancing from first to third on a single during the 2006 & 1973 seasons. I extended the analysis by looking at all seasons available from Retrosheet (1957-1998, 2002-2006). A slight change: now I’m counting plays during which the batter went to second on the throw (error or not).

OK, so the chart below shows the percentage of times the runner on first reaches third successfully as a result of:

  • runner on first,
  • no other baserunners,
  • the batter singles, and
  • neither runner put out.

single_first.png

There’s a big spike during the late sixties, then it’s pretty constant from 1970-1995. Since 1995 there’s been a steady decline. Strange!

Next: When runners try for third on a single, how often are they thrown out?

single with man on first

The ABL simplifies runner advancement on singles. I think the only way to go from first to third on a single is on a hit-and-run. This made me wonder about how often runners advance past second on a single. Here’s what I got from the Retrosheet event files for 2006. (single_first.pl) These are singles with a man on first and no other base runners. Advancement on fielding errors counts, but getting thrown out doesn’t.

first to second    4101   (73.5%)
first to third     1473   (26.4%)
first to home         8   ( 0.1%)

About a one-in-four chance to move the man to third. That sounds about right.

Here are the numbers from 1973:

first to second    3270   (68.9%)
first to third     1468   (30.9%)
first to home        10   ( 0.2%)

Why did more guys go from first to third back then?
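The percentages in both tables follow from the raw counts. Here’s a sketch that recomputes them; the dictionary layout and names are just for illustration.

```python
# Runner advancement on singles with a man on first, from the tables above.
ADVANCES = {
    2006: {"second": 4101, "third": 1473, "home": 8},
    1973: {"second": 3270, "third": 1468, "home": 10},
}


def shares(counts: dict) -> dict:
    """Convert raw counts into percentages of the season total."""
    total = sum(counts.values())
    return {base: 100 * n / total for base, n in counts.items()}


for year, counts in ADVANCES.items():
    pct = shares(counts)
    print(year, {base: round(p, 1) for base, p in pct.items()})
```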

home-field advantage

One thing caught my eye in the Curve Ball book: batters playing at home hit 12 points better than on the road. Makes sense, but it’s almost as big as the lefty/righty match-up difference, which they say is 15 points. And yet, AFAIK, there are no adjustments in TPB to take home-field advantage into account.

A quick run of the Retrosheet game logs proves the home-field advantage for wins & losses:

              HOME WINS       ROAD WINS
            --------------  --------------
1960-1969    8603 (54.03%)   7319 (45.97%)
1970-1979   10644 (53.78%)   9149 (46.22%)
1980-1989   10995 (54.12%)   9320 (45.88%)
1990-1999   11554 (53.52%)  10033 (46.48%)
2000-2006    9166 (53.93%)   7831 (46.07%)

1960-2006   50962 (53.86%)  43652 (46.14%)

It’s almost 8 percentage points (53.86% vs. 46.14%). Not as large as 12, but, of course, there’s more to winning than hitting!
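The totals and the gap can be recomputed from the decade rows in the table. A small sketch, with names chosen for illustration:

```python
# (home wins, road wins) by period, from the table above.
GAME_LOG_TOTALS = {
    "1960-1969": (8603, 7319),
    "1970-1979": (10644, 9149),
    "1980-1989": (10995, 9320),
    "1990-1999": (11554, 10033),
    "2000-2006": (9166, 7831),
}


def home_win_pct(home_wins: int, road_wins: int) -> float:
    """Home wins as a percentage of all decided games."""
    return 100 * home_wins / (home_wins + road_wins)


total_home = sum(home for home, _ in GAME_LOG_TOTALS.values())
total_road = sum(road for _, road in GAME_LOG_TOTALS.values())
overall = home_win_pct(total_home, total_road)

# Gap between the home and road shares, in percentage points.
gap = overall - (100 - overall)

for period, (home, road) in GAME_LOG_TOTALS.items():
    print(period, f"{home_win_pct(home, road):.2f}%")
print("1960-2006", f"{overall:.2f}%", f"gap {gap:.1f}")
```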

2007-10-07: The Commish’s comment re capturing home-field advantage in Park Effects is very interesting. I might even replace the LHB/RHB categories with home/visitor.