A Library for the Sports Fan Tuesday, Dec 2 2008 

Sports books have entertained and inspired me to develop my own statistics. Here are ten I have especially enjoyed.

10. Baseball Dynasties by Rob Neyer and Eddie Epstein

This book combines subjective and statistical analysis to evaluate fifteen 20th century teams selected as the greatest of all time. Each chapter describes one season from one of these dynasties through team statistics, an overview of the lineup, pitching staff, and ballpark, and several essays on the team. There are also chapters on the greatest teams in the Negro Leagues, the worst teams ever, the 19th century, and the great teams that narrowly missed a spot in the top fifteen.

Standard deviation scores provide the main method used for rating the teams. These scores rate teams’ individual seasons by comparing them to the league mean in runs scored and runs allowed. The appendix includes the best and worst one hundred SD scores for five different period lengths, providing a valuable resource for research on great teams. Finally, the book provides a nice sense of the history of baseball and how the game has changed from the 1906 Chicago Cubs to the 1998 New York Yankees.
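For readers who want to try the idea, here is a rough sketch of a standard deviation score in Python. The book's exact formula is not given in this review, so the details are assumptions: I add the number of standard deviations a team is above the league mean in runs scored to the number it is below the league mean in runs allowed, using the population standard deviation. The function name `sd_score` and the four-team league are made up for illustration.

```python
from statistics import mean, pstdev

def sd_score(team_scored, team_allowed, league_scored, league_allowed):
    """Assumed form of an SD score: standard deviations above the league
    mean in runs scored plus standard deviations below the league mean
    in runs allowed (allowing fewer runs is better)."""
    offense = (team_scored - mean(league_scored)) / pstdev(league_scored)
    defense = (mean(league_allowed) - team_allowed) / pstdev(league_allowed)
    return offense + defense

# Hypothetical four-team league with made-up run totals (not real data).
league_scored = [800, 700, 650, 650]
league_allowed = [600, 700, 750, 750]

# Score for the first team in the made-up league.
print(round(sd_score(800, 600, league_scored, league_allowed), 2))
```

A team that is exactly league average in both columns scores zero under this sketch, and a dominant team scores well above it.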

9. The NFL Record and Fact Book

This annual guide is the definitive source of information about a single season in the NFL. This encyclopedia contains individual statistics for every team, the score of every game in NFL history organized by team matchup, and box scores of every Super Bowl and Pro Bowl in NFL history. Of course, the official record book is included as well as a section of pure statistics. The best part is the game wraps of every game from the most recent season, a great way to review the ups and downs of the season.

8. The Hidden Game of Football by John Thorn and Pete Palmer

I chose this book over The Hidden Game of Baseball, by the same authors, because it is more original. Even though it was published in 1986, this first-ever detailed look into football statistics is still innovative. The book has new ways to rate quarterbacks, running backs, and even the unrecognized offensive line. The insights into strategy are even more astounding: fumbles hurt a team just as much at midfield as at the opposing goal line, coaches should go for the touchdown more often on fourth-and-goal, and teams often misuse punting. By rating NFL players by all-star selections, this book exposes the mistakes of football’s most hallowed institution, the Hall of Fame. The new Win Probability method applies to football the methods that the Mills brothers used for baseball. This book will change the way you think about football.

7. Clearing the Bases by Allen Barra

A contrarian look at the greatest debates in football and baseball history. Barra picks Mantle over Mays, Clemens over Koufax, even bashes Babe Ruth and Don Shula. But all of his conclusions are backed up with solid statistics, making this a fine introduction to using numbers in sports. Barra is a fantastic writer and is adept at using statistics in multiple sports. You may disagree with some of his conclusions, but Barra is so thorough that it is hard to fault his arguments.

6. Baseball Prospectus and Football Prospectus

Many of the best statisticians work for these annuals, which are often the source of groundbreaking research. Both contain innovative essays and a top prospects list.

BP contains analysis of every major league player and many minor league players. Each player has his last three seasons and the prediction for next season listed with both traditional statistics and statistics developed by BP statisticians. Its prediction algorithm, PECOTA, analyzes the performance of up to one hundred players who were comparable to the player in question at the same age, and generates a predicted season as well as the probabilities that the player will improve, collapse, or have a breakout season. There is also a paragraph of often hilarious text analysis.

FP contains somewhat less analysis, but is the best source for defensive football statistics anywhere. FP writers watch every single NFL game to chart detailed statistics for otherwise unofficial categories like passes knocked incomplete, tackles, and passes defensed. As a result, their analyses and statistics are unsurpassed even by the NFL Record and Fact Book itself.

5. The Numbers Game by Alan Schwarz

This is not a book of statistics; rather it is a book about statistics. Alan Schwarz traces the history of baseball statistics from Alexander Cartwright, through Bill James and Earnshaw Cook, to the 21st century and Retrosheet. The book gives a great sense of the evolution of numbers, from the early days in which statistics were sent over the telegraph to the computerized systems employed today.

4. The ESPN Baseball Encyclopedia

This worthy successor to Total Baseball contains the statistics of every player to ever play in the major leagues. It also has box scores of all playoff games and all-star games. For sheer mass of statistics, the best book on this list.

3. The American Racing Manual

The ARM is published annually by Daily Racing Form and is the definitive horse racing statistics book. Containing the results of every stakes race and the statistics of every stakes winner, its 2000+ pages include information on every horse to race in the previous year as well as information on the racetracks and great horses in North America and the world. The past performances of the twenty-nine selected champions are especially valuable, and to look through the pages of graded stakes results since they began is like looking at a condensed version of racing history. There is also a glossary of horse-related terms, and each edition contains a sample chapter from a newly published handicapping book.

2. The ESPN Pro Football Encyclopedia

I mentioned earlier that the ESPN Baseball Encyclopedia was a worthy successor to Total Baseball. This book betters Total Football by the length of Jack Tatum’s 104-yard fumble return. In addition to the standard reference of every pro football player’s statistics, there are box scores of every playoff game and Super Bowl in history with individual statistics and summaries. But the standout element is the box scores of every regular season game in NFL history, including team statistics, quarter-by-quarter scoring breakdowns, and any notable performances like 100 yards rushing or 300 yards passing. Want to see how California teams have done in the fourth quarter against the Vikings? It’s in the book. There are even individual field-goal distance breakdowns by season and rosters for returns, sacks, and interceptions.

1. The New Bill James Historical Baseball Abstract by Bill James

This book does not have as many statistics as the author’s name might suggest, but it provides a brilliant and comprehensive look at baseball history. The first part, “The Game,” has a chapter for each decade and describes the changes and momentous events that occurred during that decade. Information ranges from the serious (attendance statistics and projected MVP and Cy Young awards) to the arcane (Handsomest Player of the Decade). The second part is called “The Players,” and contains James’s top 100 players at each position. His method involves his revolutionary statistic Win Shares, and the rating system is detailed along with articles on his opinion of clutch hitting and fielding. Each top 100 list has an essay on each player, ranging in length from a few quotes from contemporary sources to almost ten pages. This section is filled with anecdotes and new statistics and is easy to browse through. No other book has as much baseball history.

Who Cares What the Cardinals’ Winning Percentage is on Cinco de Mayo?: The Most Common Sports Statistics Mistakes and How to Avoid Them Monday, Aug 4 2008

February 2007

Sports statistics are helpful tools for determining the relative ability of players, teams, or leagues. Indeed, there are so many different stats that players can be evaluated in many different ways. But incorrect conclusions are often drawn, and this column discusses three of the most common fallacies.

1. Forgetting to Differentiate between Eras.

This is not a very common mistake, because in many sports performance has not changed much over time and computers have made adjusting a lot easier. But people still sometimes forget about differences. Your knowledgeable but mistaken baseball fan might say, “Of course, today’s baseball is a lot like 1930s baseball, with lots of home runs and little stealing.” In fact, compare home runs for the 1935 and 2000 National Leagues: even accounting for the fact that 2000 had twice as many teams, 150% more home runs were hit in 2000! Runs scored were not much different, but strikeouts and walks were both much more frequent in 2000.

Even when comparing contemporaries, mistakes can be made by not adjusting. In 2000, ERAs were 0.29 higher in the AL than in the NL, because the DH has helped hitting and thus hurt pitching. Even before the DH, in 1963 about 410 more runs were scored in the AL than in the NL.

2. Using Misleading Statistics.

This is a very tricky mistake, because it’s not really possible to know whether a statistic is bad or not. But by now, we have a pretty good idea of which statistics are misleading or just plain awful in most sports. A list of some of the most misleading stats:

Baseball: RBI, runs, batting average, wins, losses, winning percentage

Football: Sacks, interceptions, touchdowns

Basketball: Overall points scored, many volume stats

Golf: Holes in one, scores on a back or front nine without course analysis

Horse racing: Money won, breeding (after a horse has already shown its ability)

The common thread in this list is volume statistics. Although rate statistics don’t adjust for era, they do adjust for playing time, can be adjusted more easily, and their numerators are less likely to be misleading volume statistics.

But one big rate statistic pops out here: batting average. This is the most commonly seen baseball statistic, and the most commonly used for evaluating hitters, but it has lots of weaknesses. It includes sacrifice hits in at-bats, a bad mistake when bunting is used so commonly. But much more important, it does not include walks or hit-by-pitches, which are not much less valuable than a single and certainly not less than half as valuable, as some old time books say.

On-base percentage takes care of these two problems, and slugging average neglects them but weights different hits more accurately. For an accurate picture when you have little time, calculate slugging times on-base or 2×2 rate distribution. If you have forever (or the right book), look up or calculate Bill James’s win shares, an extremely accurate way to measure single-season value for both hitters and pitchers.
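To make the comparison concrete, here is a small Python sketch of the three rates and the quick "slugging times on-base" shortcut. The formulas are the standard definitions of batting average, on-base percentage, and slugging average; I am assuming the shortcut means simply multiplying the last two together. The stat line is hypothetical, not a real player's.

```python
def batting_average(hits, at_bats):
    """Hits per at-bat; ignores walks and hit-by-pitches entirely."""
    return hits / at_bats

def on_base_percentage(hits, walks, hbp, at_bats, sac_flies=0):
    """Times on base per plate appearance counted toward OBP."""
    return (hits + walks + hbp) / (at_bats + walks + hbp + sac_flies)

def slugging_average(singles, doubles, triples, home_runs, at_bats):
    """Total bases per at-bat, so extra-base hits count for more."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    return total_bases / at_bats

# Hypothetical season line (made-up numbers for illustration only).
singles, doubles, triples, home_runs = 100, 30, 5, 15
hits = singles + doubles + triples + home_runs
at_bats, walks, hbp, sac_flies = 500, 60, 5, 5

avg = batting_average(hits, at_bats)
obp = on_base_percentage(hits, walks, hbp, at_bats, sac_flies)
slg = slugging_average(singles, doubles, triples, home_runs, at_bats)

# Assumed reading of "slugging times on-base": the product of the two rates.
print(f"AVG {avg:.3f}  OBP {obp:.3f}  SLG {slg:.3f}  OBP x SLG {obp * slg:.3f}")
```

Two hitters with the same batting average can come out far apart on the product, which is the point of looking past average alone.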

Sometimes it’s easy to tell if a statistic is wrong or just silly. Who cares what the Cardinals’ winning percentage is on Cinco de Mayo? Who cares whether Jack Nicklaus has more eagles on hole 3 or hole 14? Who cares whether the 49ers have more touchdown passes when the moon is full or new? These things just don’t matter! Plenty of things have no effect on the outcome of something; just because an opera singer missed a high C doesn’t mean that Mark McGwire will eventually be elected to the Hall of Fame.

3. Concentrating on One Statistic or Time Period

This is one of the easier mistakes to identify, but it’s hard to know exactly which evaluations of this kind are fallacies. All statistics have their advantages, and even some of the above, if not helpful for evaluations, are good for a laugh. But sometimes neglecting the whole can be fatal to an argument, and it’s important to remember that every relevant statistic deserves a place in your argument.

With time periods, a similar mistake can be made by only considering part of a player’s career. Leaving aside the fact that he took steroids, Ken Caminiti simply cannot be judged by his stellar 1996 season, because it is so out of line with the rest of his career. By Total Baseball’s TPR, his seasons rate as follows:

1994: 1.6

1995: 4.0

1996: 7.3, 54th best season of all time!

1997: 4.8

1998: 1.2

Although he had two reasonably good seasons flanking ‘96, it is just not fair to evaluate him by ‘96. One season does not make a great player. Five may in baseball, three may in football. But one great season is just that, one great season.

Mistakes are commonly made in evaluating sports performance using statistics. If you learn to recognize and avoid these mistakes, you will be able to judge the performance and ability of players, teams, and seasons more fairly.

Statistical Benchmarks: Misleading Marks of So-Called Excellence Monday, Aug 4 2008 

April 2006

Statistical benchmarks are among the most controversial subjects in sports today. They are often cited in cases for and against various competitors, yet because they are most often volume statistics they can be vulnerable to the problem of playing time (see the February issue for an explanation of this).

Even so, volume statistics work reasonably well as game and season benchmarks, because over a single game or season they are less exposed to volume’s typical problem of playing time: only injuries prevent players from having similar amounts of playing time.

Benchmarks can be divided into three major categories: game, season, and career. Game benchmarks are generally freaky occurrences that take major headlines but have no long-term effect. Examples might be 250 yards rushing, 4 home runs, 3 or more goals, or to take a recent example, Kobe Bryant’s 81 points.

Season benchmarks definitely change popular views and opinions, but don’t attract the sports’ spotlight as much. 4,000 yards passing, 500 rebounds, 150 RBI, or 60 home runs demonstrate this point. The most influential of all are career marks, like 500 home runs, 15,000 points, 10,000 yards rushing, or $1,000,000 won. This last mark is from horse racing, which, if it is not already so, certainly has the potential to be statistically superior to a great majority of sports.

Of course, these benchmarks aren’t always sound. A few historical examples soon show this. Take for example Pete Rose topping Ty Cobb’s career record for hits. A table of their hits:

Player   Hits   Seasons   Batting Avg.
Cobb     4189   24        .366
Rose     4256   24        .303

Doesn’t Rose look superior? He has more hits than Cobb in the same number of seasons in a much harder era for hitters. But look at the batting averages! Cobb earned his hits in 11,445 at-bats, for a batting average of .366, which happens to be an all-time record. Rose got his in 14,053 at-bats, for .303, which is excellent. However, it is sixty-three points lower, and thus Cobb is vastly superior in this area of offense.
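The arithmetic behind that comparison is easy to check. This short Python sketch uses only the hit and at-bat totals quoted above; the function name is my own.

```python
def batting_average(hits, at_bats):
    """Hits divided by at-bats."""
    return hits / at_bats

# Career totals as quoted in the text above.
cobb = batting_average(4189, 11445)
rose = batting_average(4256, 14053)

# Baseball convention: averages go to three decimals, and one "point" is .001.
gap_in_points = round((cobb - rose) * 1000)

print(f"Cobb {cobb:.3f}, Rose {rose:.3f}, gap {gap_in_points} points")
# prints: Cobb 0.366, Rose 0.303, gap 63 points
```

Rose needed about 2,600 more at-bats to finish 67 hits ahead, which is exactly what the volume total hides.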

Now let’s look at football; Eric Dickerson holds the all time record of 2,105 yards rushing in a season. Let’s compare it to O.J. Simpson’s 2,003, the standard before Dickerson broke it:

Running back   Yards   Games   Yards/Game
Simpson        2003    14      143
Dickerson      2105    16      132

Despite having 102 more rushing yards than O.J., Dickerson played in two more games and thus had fewer yards per game. If Simpson had kept up his pace over a sixteen-game season, he would have had 2,289 yards, almost 200 more.
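Here is the same per-game comparison as a short Python sketch, using the yardage and game totals given above. The projection assumes a runner simply keeps up the same per-game pace over a longer schedule, which is a simplification; the function names are my own.

```python
def yards_per_game(yards, games):
    """Rate version of the season total."""
    return yards / games

def project_to_schedule(yards, games, schedule_games):
    """Scale a season total to a different schedule length at the same pace."""
    return yards_per_game(yards, games) * schedule_games

simpson = yards_per_game(2003, 14)
dickerson = yards_per_game(2105, 16)
simpson_over_16 = project_to_schedule(2003, 14, 16)

print(f"Simpson {simpson:.1f} per game, Dickerson {dickerson:.1f} per game")
print(f"Simpson over 16 games: {simpson_over_16:.0f} yards")
```

The projection comes out 184 yards ahead of Dickerson's record, which is where the "almost 200 more" comes from.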

Finally, let’s compare Twilight Tear to Ruffian, two fillies that dominated the Sport of Kings in 1943-1945 and 1974-1975, respectively. Looking at money won, Ruffian is vastly superior, $313,429 to Twilight Tear’s $202,165. But Twilight Tear raced 30 years earlier, when races had lower purses (she raced for $79,000 at most, to Ruffian’s $350,000). A chart of their races:

Horse                                   Races   1st   2nd   3rd   $ Won     Years
Ruffian - Non Fillies and Mares         1       0     0     0     0         1
Twilight Tear - Non Fillies and Mares   10      7     0     1     N/A       2
Ruffian - Overall                       11      10    0     0     313,429   2
Twilight Tear - Overall                 24      18    2     2     202,165   3

However, unlike Ruffian, Twilight Tear did not run only in races restricted to fillies and mares. Ruffian’s lone exception was her flukish and, as it turned out, tragic match race with Foolish Pleasure. Twilight Tear won the Pimlico Special as well as the Skokie and Maryland Handicaps in the space of one year, which was an excellent performance for any horse. Since her races were often open to more competitors than Ruffian’s, it is easy to conclude that Twilight Tear did better in her time than Ruffian did in hers.

Benchmarks can be a helpful method of evaluating athlete performance for the casual fan with little access to statistics. However, in the long run, they give a distorted picture of the truth.

Statistics: The Ultimate Measurement Monday, Aug 4 2008 

February 2006

Welcome to the world of sports statistics, one of today’s most fascinating fields. This new column will compare stats from all types of sports, their effectiveness, and perhaps even the relative popularity that they deserve!

There are two basic types of statistics, volume and rate. Volume statistics are found by raw addition and/or subtraction. Examples are home runs, touchdown passes, goals scored, or points scored. Rate statistics are usually averages such as completion percentage in football or batting average in baseball, but can be different, as we will see later.

Rate statistics are almost always more effective than volume stats. For example, given the choice, which baseball player would you trade for?

Player   At-bats   Home runs   Slugging percentage
A        568       23          .415
B        115       9           .725

My guess is that you would choose Player B. Of course, rate statistics can also be misleading. Who do you think is better: a basketball player who made 60% of his shots over ten seasons, or a player who made 65% in one? Because the first had proved over a longer period of time, with more injuries and slumps, that he was consistent, he would have the edge over the other for those ten seasons. If I had the choice in a trade I would almost certainly choose the second because he would probably be earlier in his career than the former, although I would have chosen the first to be on an All-Decade Team.

The other type of rate statistic attempts to convey all the aspects of a player’s performance. In baseball, the sport in which this type is predominantly found, these statistics are usually stated in terms of runs or wins. The study of these statistics is called sabermetrics, a term coined by Bill James, one of baseball’s premier statisticians. These statistics can be very complicated, as one definition, from Total Baseball, demonstrates:

“Total Player Rating The sum of a player’s Adjusted Batting Runs, Fielding Runs, and Base Stealing Runs, minus his positional adjustment, all divided by the Runs Per Win factor for that year (generally around 10, historically in the 9-11 range).”

Note: The majority of the terms listed in the definition above require laborious calculations, and the positional adjustments, listed in a chart, were themselves the product of painstaking work. Thus it is evident that the complexity is not overstated.
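Once the component numbers are in hand, though, the final step of the quoted definition is simple arithmetic. The Python sketch below follows the wording of that definition only; the function name, parameter names, and the sample inputs are hypothetical, and the hard part (computing each component) is not shown.

```python
def total_player_rating(batting_runs, fielding_runs, stealing_runs,
                        positional_adjustment, runs_per_win=10.0):
    """Sum of the three run components, minus the positional adjustment,
    divided by the Runs Per Win factor (quoted as generally around 10)."""
    net_runs = (batting_runs + fielding_runs + stealing_runs
                - positional_adjustment)
    return net_runs / runs_per_win

# Hypothetical inputs, chosen only to show the arithmetic:
# (40 + 10 + 2 - 5) / 10 = 4.7
print(total_player_rating(40, 10, 2, 5))
```

Because the result is divided by runs per win, it reads as wins above an average player rather than as runs.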

However, volume statistics should not be underrated. In hockey, even with a sad lack of rate stats, MVP awards are based primarily on volume statistics (goals, assists, and so on). In basketball, points per game are an important method for evaluating game-to-game performance. Most importantly, however, all rate statistics are based on volume statistics, because rate statistics, which are averages, are calculated from volume stats. Both kinds depend on one another, as volume statistics generate rate stats, and rate statistics evaluate volume statistics.

Many different experiments can be used to demonstrate the relative popularity of volume and rate statistics. For example, you may want to go through the sports section of a newspaper and compare the number of times that the two types were mentioned. The absolute best time to do this would probably be September, when the baseball and football seasons are going on at the same time. Or, find how many different statistics you can find. You’ll probably be astounded at the number that are mentioned!

Resources

The #1 statistical reference that I know of is the series published by Total Sports. It includes Total Baseball, Total Football, Total Basketball, and Total Hockey. These comprehensive books include career statistics by season for every player to ever play the sport, all-time records in dozens of categories, seasonal standings, and much more.

Total Baseball:

http://tinyurl.com/7s6uk