Seven Current Stars for Racing’s Pantheon of Greats Monday, Aug 4 2008 

August 2008

Most sports quickly recognize their stars and place them among the all-time greats. Ten years ago, Ken Griffey Jr. was being compared to Willie Mays and baseball statistician Bill James ranked 33-year-old Craig Biggio as the 42nd best player of all time. Ask a basketball fan to list the top players in NBA history and they will include current stars Kobe Bryant and Shaquille O’Neal. Ask a football fan to name the best quarterbacks ever and they will respond with Brett Favre and Peyton Manning. It seems that superstars today are often immediately compared to and ranked among players whose careers have passed. But if this is so, why is horse racing, a sport where the quality of racing is higher than it has ever been before, slighting its current stars?

Have you ever heard Fusaichi Pegasus compared to War Admiral, Azeri to Ruffian, War Emblem to Man o’ War? Horse racing fans debate the present or the past, never the combination. Horses that raced fifteen years ago, like Cigar, Holy Bull, and Skip Away, are just now being compared to other stars. Before the recent statistical revolution, comparisons were largely subjective. But now, in an age where it is easier than ever to compare athletes, horse racing has yet to join the trend and recognize contemporary all-time greats. Identifying stars of the present that deserve to be ranked among the best ever, this column will rank the top seven horses of the past decade and compare them to the great stars of the past.

Beyer Speed Figures will be used to rank these current stars. Invented by handicapper Andrew Beyer in the 1970s, the method assigns a number to each horse in a race based on the final time of the race and the number of lengths that horse was behind the winner at the end of the race. Because most horses run more slowly when a track is wet or muddy, Beyer Speed Figures (or “Beyers”) adjust the final times for track conditions by comparing times for races to a par for that track, distance, and purse. Most horses run about an 80. Champions run around 105-115, and a Horse of the Year usually runs a high of about 117. By far the highest Beyer ever run is Secretariat’s 139 in the 1973 Belmont. Only one other horse has even run a 130.

The top seven horses of the past decade and their historical counterparts:

Number 7: Azeri

Azeri has been the best mare to race in the past ten years. She was Horse of the Year in 2002 and dominated her division for three straight years. Her figures are fantastic. She always ran a Beyer over 100 and even ran a 110, 110, and 111 in consecutive races. Although she was unsuccessful both times she raced against males, she came close in the 2004 Breeders’ Cup Classic and her other defeat came at Belmont, a track which she loathed. The quantity of the big races she ran, rather than a few brilliant ones, got her on this list.

Azeri is probably most similar to all-time great Dahlia, the star mare of the 1970s. Both raced primarily against other females but won awards that normally went to males. Though Dahlia raced on turf and Azeri on dirt, their running styles are similar. Their ability is also quite comparable: neither was quite the best horse in the world during her career, and both had some off-races interspersed with the high-quality ones.

Number 6: Curlin

Curlin, the only active horse on this list, made headlines in 2007 by winning the Preakness Stakes in his fifth career start. Curlin went on to win the Breeders’ Cup Classic, Dubai World Cup, Stephen Foster, and Jockey Club Gold Cup, with excellent Beyers. He earned a 119 in the Classic, but his other figures indicate that might have been a fluke, which is why he is only sixth. Curlin still has plenty of room to improve, however. He made his turf debut with a second in the Man o’ War. Watch for him on grass again later this fall.

With his come-from-behind style, Curlin resembles Alydar. Both debuted in a strong three-year-old class and came very close in Triple Crown races. While Curlin won the BC Classic and Alydar didn’t run in it (it didn’t exist in 1978), both had strong 4-year-old campaigns. Alydar is rarely ranked among the top 100 horses ever, but he was overshadowed by Affirmed despite being almost as good. Affirmed is usually ranked among the top twenty horses ever.

Number 5: Bernardini

Three-year-old Bernardini dominated other horses in 2006. He won the Preakness in a race that was marred by Kentucky Derby winner Barbaro’s breakdown coming out of the gate. Yet even if Barbaro had been able to run the race, Bernardini probably would have won anyway. Bernardini ran a 113 in the Preakness, while Barbaro had run a 103 and 111 in his previous two starts. Bernardini won five straight stakes races by a combined 32 lengths, and came within a length of Invasor in the Breeders’ Cup Classic. He won five Grade I stakes (the highest level), a Grade II, and a Grade III. He ran five straight Beyers that topped 113, including a 117. One length more in the Classic and he would be one or two spots higher.

Bernardini’s historical counterpart is Nashua, from the 1950s. Both won the Preakness while overshadowed by the absence of another star of their generation. Bernardini drew criticism because he didn’t defeat Barbaro, while Swaps returned to California after the Derby, leaving the Preakness to Nashua. Despite this, both were better three-year-olds than their rivals.

Number 4: Tiznow

Tiznow was the only horse to win two Breeders’ Cup Classics, with his victories coming in 2000 and 2001. He also compiled an astonishing statistic: in those two years, he never let the leader get more than three lengths in front of him at any point during a race. His Beyers are uniformly excellent, with a 114, 115, 116, two 117s, and a high of 119 in the Goodwood Breeders’ Cup Handicap.

Tiznow’s career bears a startling resemblance to that of Ferdinand, born in 1983. Both were successful three-year-olds and went into the BC Classic in their four-year-old campaigns as the underdog, overshadowed by the three-year-old star of that year. However, Ferdinand vanquished Derby winner Alysheba and Tiznow won the Classic over Arc de Triomphe winner Sakhee. Incidentally, both won in photo finishes.

Number 3: Mineshaft

2003 Horse of the Year Mineshaft didn’t run in the BC Classic but won four Grade I races, a Grade II and Grade III, and was second by a head in the Grade I Stephen Foster after being bumped at the start. He was consistently brilliant, with seven consecutive figures between 114 and 118. His consistency and dominance of his opponents placed him high on this list.

Mineshaft’s consistency is almost unequaled in racing. The best match is probably Citation, the winner of the 1948 Triple Crown. Though Citation may have been slightly better, he seemed to run the same race over and over again, just like Mineshaft. Citation’s record 16 straight victories and Triple Crown championship show how good Mineshaft was.

Number 2: Left Bank

By far the most underrated horse of the past decade, Left Bank showed brilliance that few other horses have been able to achieve. Primarily a miler, he didn’t run in any major races but his Beyers show his speed. In 2001, he ran two 118s in stakes races. The next year, he ran two 121s. Had he been able to run in it, he would have dominated last year’s inaugural Breeders’ Cup Dirt Mile.

Left Bank’s speed is matched by that of the great 1970s filly Ruffian. Both were at their best in sprints, though they were able to stretch out to longer distances. Like Left Bank, Ruffian dazzled a relatively weak group of horses in her races.

Number 1: Ghostzapper

Ghostzapper easily ran a 116 in the 2003 Vosburgh then won Horse of the Year in 2004 by virtue of his dominant win in the BC Classic. He is the only horse on this list to run a figure higher than 121 or to have three above 120. His Beyers of 114, 120, 124, and 128 are incredible. He defeated a Horse of the Year, a champion older horse, and many others in an unusually strong Classic field. The 2005 American Racing Manual called him “Simply the best horse to set foot on an American racetrack in 2004, and perhaps for many years.” He is clearly the best horse of the past decade.

Ghostzapper’s long career, ability in both sprints and longer races, and handicap dominance inspire the obvious comparison to Forego. Forego raced in the 1970s, and though he was overshadowed by that decade’s Triple Crown winners, he had his own great moments. He was voted champion older horse four times, and in three of those years he was also named Horse of the Year. Like Ghostzapper, he was equally capable of winning at seven furlongs or a mile and a quarter.

Despite being ignored by racing fans, Azeri, Curlin, Bernardini, Tiznow, Mineshaft, Left Bank, and Ghostzapper deserve places in racing’s list of all-time greats.

Rating the NFL’s Great Teams by Pythagorean Records Monday, Aug 4 2008 

June 2008

More than any other sports teams, great NFL teams are assessed by how many championships they win. The 1969-1971 Baltimore Orioles are considered a fantastic baseball dynasty, but they won just one World Series. The Detroit Pistons have been one of the best teams in the NBA, but perennially come up short in the playoffs. Football’s notion of greatness, winning Super Bowls, is unique but it is also nonsense. The Super Bowl is one game, and the evaluation of a team should not rest on a short sixty minutes of play. Teams should not receive as much credit for beating up on a poor team, as happened in Super Bowl XX, as they do for defeating another great team, like Dallas’s win in VI.

Rating teams by Super Bowls is inaccurate, so another popular way of rating great teams is by their regular season record. But this method also has weaknesses, despite the sixteen-game sample size. A team may kick a game-winning field goal in overtime that would make no difference in a 42-0 rout. A team that wins many games by small margins but loses routs will make the playoffs ahead of a superior team that wins games by huge amounts but loses a few close games in overtime. Thus, the regular season record should be discarded as a method for rating great teams.

Since the two most popular methods for rating football dynasties are both inaccurate, we must turn to a method developed by baseball statistician Bill James. James invented a formula for estimating a team’s record based on points scored and allowed. The system, called the Pythagorean record, is as follows:

$$\text{Expected } W = G \times \frac{P^2}{P^2 + \mathit{PA}^2}$$

where W is wins, G games, P points, and PA points allowed. By factoring out luck, the Pythagorean record calculates how many games a team should win. Setting G to sixteen also puts every team on a sixteen-game scale, which is helpful for analyzing the performance of early teams that played shorter schedules.
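As a minimal sketch of the formula above, the following Python function computes expected wins from points scored and allowed. The function name and the sample point totals are illustrative assumptions, not figures from the study; the exponent of 2 follows the formula as written here.

```python
def pythagorean_wins(points_for, points_against, games=16, exponent=2):
    """Expected wins = G * P^2 / (P^2 + PA^2), per the formula in the text."""
    scored = points_for ** exponent
    allowed = points_against ** exponent
    return games * scored / (scored + allowed)


# Hypothetical team (not from the study): 400 points scored, 200 allowed.
print(round(pythagorean_wins(400, 200), 2))  # 12.8
```

Leaving `games` at 16 places a team from a shorter season on the same scale as a modern one, which is the adjustment the text describes.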

What NFL teams qualify as great dynasties? The Chicago Bears of the 30s and 40s were consistently outstanding. They once scored more than 300 points while allowing just eighty-four. The Monsters of the Midway dominated for eleven seasons, and another great team didn’t emerge until 1950. The Cleveland Browns of that era started in an upstart league, the AAFC. Under coach Paul Brown and quarterback Otto Graham, the Browns won the AAFC championship for four years and joined the NFL when the AAFC disbanded. They appeared in six consecutive NFL championship games and won three. Vince Lombardi’s Green Bay Packers are probably the most famous football dynasty ever. They won three straight championships, including the first two Super Bowls.

As soon as their run ended, the Johnny Unitas Baltimore Colts became the NFL’s premier team. Their 258-point differential in 1968 is one of the best ever. They appeared in Super Bowls III and V, and were just seven points from having a better season than division champion Los Angeles in 1967. The New England Patriots’ bid for a perfect season renewed interest in the Miami Dolphins of the 70s. The 1972 Dolphins went 17-0, and according to their Pythagorean record, they did even better the following year. The Dolphins won two consecutive Super Bowls and had perhaps the strongest group of running backs in history. Miami boasted Larry Csonka, Mercury Morris, and Jim Kiick in their backfield. Another great dynasty started their run in 1971. The Dallas Cowboys reached four Super Bowls and won two. Their streak of winning seasons lasted until 1983, making the thirteen-season period the longest run of any team in this study. The Cowboys’ nemesis was the Pittsburgh Steelers. The Steelers won four Super Bowls in six years and had a 200-point scoring differential for two straight years. Their defense, the Steel Curtain, is often considered the best in NFL history.

The Joe Montana San Francisco 49ers tied the Steelers by winning four Super Bowls in one decade. If not for the strikes in 1982 and 1987, the 49ers would very likely have won another and become the only NFL dynasty to win five championships. The 49ers had 100-point differentials for ten of eleven seasons, and topped 150 in six of those seasons. After 1992, Montana went to Kansas City and Steve Young took over at quarterback. With Young, Jerry Rice, and running back Ricky Watters, the 49ers won Super Bowl XXIX and their division for three straight years. The Buffalo Bills of the early 90s are the only team to reach four consecutive Super Bowls. Quarterback Jim Kelly and running back Thurman Thomas were the stars of the revolutionary Hurry-Up offense. In Super Bowls XXVII and XXVIII, the Bills lost to the Dallas Cowboys. The Cowboys had a short run but dominated the NFL for four years.

Some teams not in the study include the Bears, Redskins, and Broncos of the 80s. Washington and Chicago were impressive in the years they won championships, but otherwise nothing special. The Broncos were more consistent, but topped the 100-point scoring differential barrier only twice and had several mediocre years. The Minnesota Vikings were unbelievable in 1998, scoring more than 550 points and allowing fewer than 300. However, they were great for just one season. The New England Patriots have won three Super Bowls in four years and went 16-0 in the regular season last year. However, their scoring differentials are mediocre and their videotaping of opponents’ signals casts suspicion upon their record.

After narrowing the field to eleven dynasties, I applied the Pythagorean Formula directly. The results were skewed towards early teams, as eight of the ten best seasons were from the early Chicago Bears. Modern teams like the Cowboys or Bills were at the bottom of the group. Because the margin of scoring in games has continuously decreased throughout history, so have seasonal margins between the best and worst teams. Early teams should not have this advantage and perhaps should be penalized because of the differences between early run-oriented football and the balanced modern version. The Monsters of the Midway would have much more trouble defending against a wide receiver like Randy Moss than would a defense of similar ability today. After making the adjustments, teams of all eras were evenly distributed throughout the list of teams. The adjustment reduced the wins of every team, so I added 2.82 wins to every team. These adjustments gave the leading team, the 1976 Steelers, exactly sixteen wins.

To rank dynasties, I used four measurements: average wins in best three consecutive years, average wins in best five consecutive years, average wins per year in the period considered, and the total wins in the period. I adjusted the latter component so that it was comparable with the other three, i.e. it was on a sixteen-win scale. I weighted the five-year run most heavily, because it represented both the short-term ability and the consistency of the team. The next most important factor was the three-year run, measuring the peak of the team. The average and total were rated equally.

The final computation showed the greatest NFL dynasty ever was the 1980s 49ers. They finished in the top four in every rating component. In addition, the team led in average and finished second in total, an incredible feat considering that the two have a very strong negative correlation. Great teams tend to either have short, brilliant runs or long, consistently very good ones. The 49ers had the best aspects of both. In second were the Miami Dolphins. They were one of only two teams to have three consecutive seasons of 14 wins or better. The Pittsburgh Steelers had the best scores for 3 and 5 years, but did poorly in the average category. They were almost tied with the Dallas Cowboys of the 70s, who had the best total of any team. Surprisingly, the Vince Lombardi Packers finished last. The team had fantastic actual records but poor scoring differentials. They were the opposite of the 49ers, finishing in the bottom four of every category. The complete results of the study:

| Dynasty | Start | End | Best 3 | Best 5 | Avg. | Total | Adj. Tot. | Rating |
|---|---|---|---|---|---|---|---|---|
| San Francisco M | 1981 | 1992 | 14.55 | 14.66 | 14.33 | 143.31 | 15.58 | 14.751 |
| Miami | 1970 | 1975 | 15.05 | 14.43 | 14.02 | 84.16 | 12.78 | 14.173 |
| Pittsburgh | 1972 | 1979 | 15.12 | 14.78 | 12.15 | 97.21 | 13.47 | 14.077 |
| Dallas | 1971 | 1983 | 14.02 | 13.42 | 13.15 | 157.89 | 16.18 | 14.068 |
| San Francisco Y | 1993 | 1998 | 14.63 | 14.29 | 14.19 | 85.18 | 12.84 | 14.065 |
| Baltimore | 1966 | 1971 | 14.22 | 13.67 | 13.42 | 80.54 | 12.58 | 13.539 |
| Dallas | 1992 | 1996 | 14.52 | 13.42 | 13.42 | 67.08 | 11.80 | 13.371 |
| Buffalo | 1988 | 1993 | 13.64 | 13.37 | 13.33 | 79.98 | 12.55 | 13.266 |
| Chicago | 1934 | 1944 | 13.42 | 12.32 | 11.72 | 129.00 | 14.97 | 13.005 |
| Cleveland | 1950 | 1955 | 13.25 | 13.00 | 12.88 | 77.29 | 12.40 | 12.919 |
| Green Bay | 1963 | 1967 | 13.34 | 13.16 | 13.15 | 65.78 | 11.72 | 12.917 |
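The article gives only the ordering of the weights, not their values. As a hedged sketch, the Python below checks one set of weights that appears to reproduce the Rating column from the table's Best 3, Best 5, Avg., and Adj. Tot. values: 0.35 for the five-year run, 0.25 for the three-year run, and 0.20 each for average and adjusted total. These weights are inferred from the published numbers and are an assumption, not the author's stated method; the dictionary keys and function name are likewise illustrative.

```python
# (Best 3, Best 5, Avg., Adj. Tot., published Rating), copied from the table.
DYNASTIES = {
    "San Francisco M": (14.55, 14.66, 14.33, 15.58, 14.751),
    "Miami": (15.05, 14.43, 14.02, 12.78, 14.173),
    "Pittsburgh": (15.12, 14.78, 12.15, 13.47, 14.077),
    "Dallas 1971-1983": (14.02, 13.42, 13.15, 16.18, 14.068),
    "San Francisco Y": (14.63, 14.29, 14.19, 12.84, 14.065),
    "Baltimore": (14.22, 13.67, 13.42, 12.58, 13.539),
    "Dallas 1992-1996": (14.52, 13.42, 13.42, 11.80, 13.371),
    "Buffalo": (13.64, 13.37, 13.33, 12.55, 13.266),
    "Chicago": (13.42, 12.32, 11.72, 14.97, 13.005),
    "Cleveland": (13.25, 13.00, 12.88, 12.40, 12.919),
    "Green Bay": (13.34, 13.16, 13.15, 11.72, 12.917),
}

# ASSUMPTION: weights inferred from the table, not stated in the article.
# Order: Best 3, Best 5, Avg., Adj. Tot.
WEIGHTS = (0.25, 0.35, 0.20, 0.20)


def rating(best3, best5, avg, adj_total, weights=WEIGHTS):
    """Weighted sum of the four rating components."""
    components = (best3, best5, avg, adj_total)
    return sum(w * c for w, c in zip(weights, components))


for name, (*components, published) in DYNASTIES.items():
    computed = rating(*components)
    print(f"{name:18s} computed={computed:.3f} published={published:.3f}")
```

Under these assumed weights every computed rating lands within a few thousandths of the published one, which is consistent with rounding in the table's inputs.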

The best dynasty in NFL history is clearly the 49ers. The Pythagorean formula establishes the strengths of the 49ers: consistency, high peaks, and durability. Pythagorean records are a powerful way to analyze sports performance. By removing factors of luck that bias championships or regular-season records, they analyze how a team really should have done. Thus, Pythagorean records expose overachievers and show which teams deserve to win based on their points scored and allowed.

The Kentucky Derby: The Least Exciting Two Minutes in Sports? Monday, Aug 4 2008 

April 2008

The Kentucky Derby claims to be the most important race in the world for assessing the abilities of three-year-old horses. But in recent years, it has done a dubious job of selecting champions. Last May, Street Sense won the Derby, but lost to Curlin in the Preakness and Breeders’ Cup Classic. The 2005 winner, Giacomo, was a 50-1 longshot who won little else that year. Funny Cide wore the roses in 2003, but was underwhelming as a four- and five-year-old. All these Derby champions had something else in common besides overachieving in the Derby. They all came off the pace after the leaders had run fast early fractions. Looking over the past fifteen Derbies, an unusual number of winners have rallied from behind. Only three horses led from start to finish. Most of the other early leaders faded to finish in the bottom half of the pack. Unique factors in the Derby, such as the size of the field, contribute to these closing wins and to the number of mediocre horses that win the race.

Closers can take advantage of fast early fractions by the pacesetters in a race. After running a quick opening half-mile, it is virtually impossible to run the rest of the race at the same pace. But come-from-behind horses can run at a constant speed or even speed up in the homestretch. In 1973, for instance, Secretariat ran each quarter of the Derby faster than the previous one! This is analogous to the difference between running a hundred-meter sprint and a one-mile race. If a track star could sustain world-record sprint pace throughout the mile, they would break the mile record by roughly a minute. Obviously, it is impossible to run at a very fast speed for a long distance.

Quick early fractions allow closers to swoop by the early leaders and win the race. This phenomenon has been especially pronounced in the Derby. The chart below contains the fractions of the previous fifteen Derbies (excepting the 1999 running, for which data is not available). The row labeled “Win” gives the type of victory. “On lead” is a victory from start to finish, like War Pass’s 2007 Juvenile victory. “Pounce” indicates a victory from three or four lengths back. El Gato Malo’s San Rafael typifies this type of win. Finally, Pyro’s Risen Star is a good example of a “Close”. A championship race should have equal numbers of each of these types of wins, but look at the numbers in the Derby:

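| Year | 1993 | 1994 | 1995 | 1996 | 1997 | 1998 | 2000 |
|---|---|---|---|---|---|---|---|
| ¼ | 22 4/5 | 22 4/5 | 22 2/5 | 22 1/5 | 23 2/5 | 22 4/5 | 22 3/5 |
| ½ | 46 3/5 | 47 1/5 | 45 4/5 | 46 | 47 2/5 | 45 4/5 | 45 4/5 |
| ¾ | 1:11 1/5 | 1:11 4/5 | 1:10 1/5 | 1:10 | 1:12 2/5 | 1:10 3/5 | 1:09 4/5 |
| Win | Close | On lead | Pounce | Close | On lead | Pounce | Close |

| Year | 2001 | 2002 | 2003 | 2004 | 2005 | 2006 | 2007 |
|---|---|---|---|---|---|---|---|
| ¼ | N/A | 23 1/5 | 22 4/5 | 22 4/5 | 22 1/5 | 22 3/5 | 22 4/5 |
| ½ | 44 4/5 | 47 | 46 1/5 | 46 3/5 | 45 1/5 | 46 | 46 1/5 |
| ¾ | 1:09 1/5 | 1:11 2/5 | 1:10 2/5 | 1:11 4/5 | 1:09 2/5 | 1:10 4/5 | 1:11 1/5 |
| Win | Close | On lead | Pounce | Pounce | Close | Pounce | Close |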

Note that only three of these fourteen Derbies, or twenty-one percent, were won wire-to-wire. One of these, the 1994 running, was in the mud, a track condition that favors front-runners. Even more incredibly, the three slowest times for each ¼, ½, and ¾ mile split correspond to the three “on lead” victories. Two Derbies in particular stand out from this chart.
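The counts above can be checked directly from the chart. The sketch below, in Python, tallies the win types and averages the half-mile split for each type; the half-mile times and win labels are copied from the chart, while the variable and function names are illustrative choices.

```python
from collections import Counter
from fractions import Fraction

YEARS = [1993, 1994, 1995, 1996, 1997, 1998, 2000,
         2001, 2002, 2003, 2004, 2005, 2006, 2007]
HALF_MILE = ["46 3/5", "47 1/5", "45 4/5", "46", "47 2/5", "45 4/5", "45 4/5",
             "44 4/5", "47", "46 1/5", "46 3/5", "45 1/5", "46", "46 1/5"]
WIN_TYPE = ["Close", "On lead", "Pounce", "Close", "On lead", "Pounce", "Close",
            "Close", "On lead", "Pounce", "Pounce", "Close", "Pounce", "Close"]
assert len(YEARS) == len(HALF_MILE) == len(WIN_TYPE)


def to_seconds(text):
    """Convert a split such as '46 3/5' or '47' to exact seconds."""
    whole, _, fifths = text.partition(" ")
    return Fraction(int(whole)) + (Fraction(fifths) if fifths else 0)


def mean_half(win_type):
    """Average half-mile split, in seconds, for Derbies won in this style."""
    times = [to_seconds(t) for t, w in zip(HALF_MILE, WIN_TYPE) if w == win_type]
    return float(sum(times) / len(times))


counts = Counter(WIN_TYPE)
on_lead_share = counts["On lead"] / len(WIN_TYPE)

print(dict(counts))
print(f"On-lead share: {on_lead_share:.1%}")
for win_type in ("On lead", "Pounce", "Close"):
    print(f"{win_type:8s} mean half-mile: {mean_half(win_type):.2f} s")
```

On this data the wire-to-wire Derbies average 47.20 seconds for the half-mile, against 46.08 for "Pounce" wins and about 45.77 for "Close" wins, which matches the pattern described in the text.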

In 2001, Monarchos came from far back to win in 1:59 4/5, the second fastest Derby ever. He benefited from the fastest ½ and ¾ fractions in Derby history. The final time belied his true ability. Four years later, 50-1 longshot Giacomo won, rallying from eighteenth. Pacesetter Spanish Chestnut ran a ¼ that tied for the fastest of this period and the second-fastest ½ and ¾, among the fastest ever. Giacomo went on to finish third in the Preakness and seventh in the Belmont. His Derby win was a fluke, as he could not have rallied without the blistering early fractions.

It is now clear that poor horses win the Derby because of its fast pace. But why is the Derby especially subject to this phenomenon? One explanation is obvious. Up to twenty horses are allowed to start in the Derby. If there are so many horses in the field, there must be more speed horses. These speedsters all fight for the lead, and the pacesetters are forced into deadly early fractions.

Another contribution to a fast pace is the effect of “rabbits”. Rabbits are horses entered to force a fast pace and allow a closer stablemate to capitalize on the fast fractions. A famous example is the 1967 Woodward Stakes, in which Dr. Fager faced Buckpasser and Damascus. The trainers of both Damascus and Buckpasser entered rabbits to tire the speedy Dr. Fager. Indeed, the Doc ran the opening six furlongs in a scintillating 1:09 1/5. Damascus blew past him to win by ten lengths. Dr. Fager was superior to his opponents, but the rabbits were able to keep him from winning. Rabbits are a common tactic in the Derby and contribute to the pace.

Pressure is often a decisive factor for front-runners. If horses are breathing down the leader’s neck, pacesetters must sprint the first fractions so as not to lose the lead. On the other hand, if no horses are close to them, they can slow down, keeping the lead without tiring themselves. There is often a great deal of pressure in the Derby from other horses, because of the size of the field. In 2004 and 2007, the leaders finished second, but both set average-to-slow paces (for the Derby) and had two lengths on their pursuers in the backstretch. Their strong finishes were due to the lack of pressure from other horses. Normally, it is impossible for a horse to take the lead in the Derby without others close to him.

Finally, let’s consider what speed horses on the inside or outside must do. Horses with the inside post must either gun to the lead, fall far back and rally, or get stuck in the middle of a sea of horses. The last is not a realistic option, and it is usually fatal to go against a horse’s running style, so pacesetters on the inside typically run straight to the turn. If several horses like this are in inside positions, they can create a deadly pace and run each other into the ground. Similarly, horses on the outside risk going wide on the first turn if they do not drop back or run for the lead. Front-runners must angle to the turn and expend a great deal of energy to reach the lead. What appears as a 21 4/5 fraction from an outside horse may actually be equivalent to a 21 2/5. Therefore, horses from the outside fatigue quickly and allow closers to go by in the homestretch much more easily.

The Derby is intended to decide the three-year-old championship in one race. But if it eliminates the third of horses that are front-runners, it cannot achieve its goal. Closers win the Derby at an astoundingly high rate, and skew the results of the race. The number of horses in the gate is the most important factor in determining the pace, and thus what horse wins the race. If the Derby reduced the number of starters to fourteen, like the Breeders’ Cup, the race would be fairer and less predictable. For now, however, the Derby remains a flawed and biased test of three-year-olds’ ability.

For videos of 2008 Triple Crown prep races, see http://www.kentuckyderby.com/2008/videos

See previous Derbies at http://www.youtube.com/profile_videos?user=kentuckyderby&p=r

An Original Method for Statistical Analysis of Defensive Football Players Monday, Aug 4 2008 

February 2008

Who is the best defensive player of all time? Is it a defensive lineman, such as Reggie White, a linebacker, like Lawrence Taylor, or a back, like Ronnie Lott? This is a very difficult question to answer since most defensive ratings are built on reputation, which is largely inaccurate. We should be able to statistically and impartially rate defensive players. Curiously, though, the only defensive statistics currently used have serious flaws. But by breaking down team statistics and using simple and obvious but rarely used individual measures, defensive players can be cleanly and accurately analyzed.

First, let’s see what is wrong with the most common defensive statistics today. A very popular method of rating the secondary is interceptions. They are valuable but also rare. In fact, it is unusual to record more than six or seven in a season. However, this scarcity makes them almost random, so they are subject to fluctuation. To illustrate this, take a good safety who intercepts five of 160 passes thrown to receivers he is covering. Using a random statistical database, I computed that chances are less than eighteen percent he will duplicate this performance in the next season, even with exactly the same chance of an interception and the same number of opportunities. Returns of interceptions for touchdowns are even more random.
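The eighteen percent figure can be approximated with a binomial model. The sketch below assumes that "duplicating the performance" means recording exactly five interceptions again, and that each of the 160 passes is an independent chance with probability 5/160; both are assumptions made for illustration, and the function name is hypothetical.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


passes = 160
interceptions = 5
chance = interceptions / passes  # same per-pass chance as the first season

repeat = binomial_pmf(interceptions, passes, chance)
print(f"P(exactly {interceptions} again) = {repeat:.3f}")  # just under 0.18
```

Under these assumptions the result agrees with the figure quoted above: even with identical skill and opportunities, an exact repeat happens less than one season in five.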

The skill of the opponent can also bias a player’s number of interceptions. A division loaded with bad quarterbacks may seem to have superhuman secondaries. Strange as it may seem, good defensive backs may also benefit if the division has better passers. Since they cover their receivers closely, it is easy for them to catch well-thrown passes. Wide receivers also have an effect. Besides catching passes, they are the first players in position to tackle a defensive back after an interception. This puts strong safeties at a disadvantage for touchdown returns, since they typically cover the tight end, who is almost always the best tackler among the receivers.

Interceptions also depend on defensive teammates. Linemen and linebackers pressure the quarterback into misthrows by breaking through the line and threatening to sack him. Other players in the secondary can limit a quarterback’s options and force him to throw when his timing is off or to an inferior receiver. The position a player plays often changes interception results. As we have seen, strong safeties rarely return interceptions for touchdowns since they cover tight ends. The free safety has more interception opportunities since he often guards receivers on deep passes, which are likely to lead to bad throws. Good players are disadvantaged since offenses try to throw away from them. Finally, note that interceptions are not always important plays. On fourth down they are not much better than an incompletion or a completed pass saved by a good tackle. Similarly, a touchdown return makes no difference if a team is already ahead by twenty points.

The other most popular defensive statistic is sacks. This statistic is primarily designed to measure the performance of linemen and outside linebackers. While it is a better statistic than interceptions, it has many fundamental weaknesses that limit its use.

Sacks do have some advantages. Since they are more common than interceptions, they are less likely to vary from year to year. They also reflect skills other than just rushing the passer. Players with lots of sacks tend to be good at getting past the line on running plays and tackling backs. Players with sacks also harass quarterbacks into bad passes even when they don’t sack them. Tackles and ends who are good at rushing the quarterback pick up more fumbles. More general qualities are also reflected in sacks, since to tackle quarterbacks a player must be both big, to get past 300-pound linemen, and agile, to tackle smaller, faster quarterbacks.

Like interceptions, though, sacks also depend on the skill of other players. If the rest of the defensive line is good, the offense may commit additional linemen or running backs to blocking them. With fewer blockers left to beat, it is easier for a player to break through the line and get to the quarterback. The secondary also helps. If all the receivers are well covered, the quarterback keeps the ball longer since there is no good place to throw. This gives a lineman time to tackle the quarterback before he passes. Even the coaches can help a player accumulate sacks. A good scouting report will help players to determine where and when to rush.

Both sacks and interceptions have serious flaws. How can we design ways to evaluate defensive players without bias? One solution uses common team statistics to evaluate units, such as the secondary or the linebackers. These stats often correspond to the efforts and skills of a particular group of players.

The first of these statistics is a simple breakdown of points scored by opposing offenses. If high numbers of field goals are scored, this indicates clutch performance by the defense. Once the offense threatens to score a touchdown, the defense forces a field goal on fourth down. Takeaways on fourth down also reflect clutch ability. More specifically, high numbers of touchdowns can indicate weaknesses in the secondary. Touchdowns almost always depend on one or two long plays, which are usually the fault of the safeties.

Yards-per-scoring-drive helps analyze how much of a defense’s success is really due to its own play. The defense is obviously at a disadvantage if they have a weak offense. Then they tend to come onto the field when the opposing team has the ball in good field position. Yards-per-scoring-drive doesn’t work to evaluate a defense on its own, but it can help to identify when a defense is overrated or underrated.

Another team stat is the percentage of plays longer than fifteen yards on scoring and non-scoring drives. If a defense tends to allow a much greater percentage on scoring drives, it indicates that offenses need big plays to succeed. This is a sign of a good defense. If there is very little difference, then offenses can score by consistently calling short plays. Defenses like this are weak against the run, so offenses use that approach against them.

The percentage of plays longer than fifteen yards is very useful for evaluating the secondary. Since it is their job to make the tackle on long plays, this measures both their ability to read a play and their ability to tackle. If they tend to be good at defending against long passes, the percentage of plays longer than fifteen yards should be low.

Running and passing statistics also help to evaluate parts of a defense. If the percentage of running plays for a loss is high, this reflects well on the skills of the line and linebackers, especially their rushing abilities. To measure the relative abilities of the defense, the percentage of running and passing plays called by the offense is an accurate statistic. This works especially well when a defensive player has been injured in the course of the season. The percentage called against the defense, measured against each offense’s overall average, can then be compared before and after the injury. A big difference indicates that the player is especially good against either the run or the pass.
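To make the first of these team statistics concrete, here is a minimal Python sketch of the big-play percentage on scoring and non-scoring drives. The play log is entirely hypothetical sample data invented for illustration, and the function name and the fifteen-yard default threshold are assumptions based on the description above.

```python
# HYPOTHETICAL play log (not real data): (yards gained, drive ended in a score?)
PLAYS = [
    (3, False), (6, False), (18, False), (2, False), (-1, False),
    (5, False), (7, False), (1, False), (4, False), (0, False),
    (25, True), (8, True), (31, True), (4, True),
    (16, True), (6, True), (12, True), (3, True),
]


def big_play_rate(plays, scoring, threshold=15):
    """Share of plays longer than `threshold` yards on the chosen drive type."""
    gains = [yards for yards, scored in plays if scored == scoring]
    if not gains:
        return 0.0
    return sum(yards > threshold for yards in gains) / len(gains)


on_scoring = big_play_rate(PLAYS, scoring=True)
on_other = big_play_rate(PLAYS, scoring=False)
print(f"Scoring drives:     {on_scoring:.1%}")
print(f"Non-scoring drives: {on_other:.1%}")
print(f"Gap:                {on_scoring - on_other:.1%}")
```

A large gap, as in this made-up sample, would suggest that offenses need big plays to score on the defense; a gap near zero would suggest they can score with a steady diet of short gains.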

However, most of these statistics work only to evaluate groups of players. How can we rate individual players? Surprisingly, many good individual ratings branch off of interceptions and sacks. But since these stats are either more common, less biased because of teammate or opposition performance, or more representative of other skills, they provide an accurate assessment of a player’s performance.

Probably the best measure of a cornerback or safety’s skill is incompletions. These are similar to interceptions, but have many advantages. They are much more common and therefore they are less random and fluctuate less from year to year. Incompletions also depend less on the opponent’s skill. They are much more likely to be good defense than bad passes, which often hit the ground untouched. Thus, the quarterback is less important, as is the defensive line. Also, incompletions can occur anywhere on the field. It is as common to bat down a pass right in front of the pocket as it is to knock one away from a receiver. These plays depend only on individual abilities, not on the skills of others.

Another good statistic to measure backs is the number of yards that a receiver gains after a catch. Not only does this statistic measure tackling ability, it evaluates how closely a back covers a receiver. Close coverage is important because it saves yards on passing plays and puts a back in excellent position for an incompletion or interception. Raw tackles are also an effective statistic. Tackles measure how well a back reacts to the run and how often he is able to catch the ballcarrier. In addition, they show a safety or cornerback’s ability to make a quick tackle after a catch.

Incompletions are also an effective measure of linebackers. If a linebacker covers receivers, they measure his coverage ability. If an outside linebacker rushes the quarterback, they judge his ability to call the pass and knock it down right after it is released. This statistic reflects his ability to read the play as a pass and try to stop it. Tackles inside the line of scrimmage rate a player’s rush ability well. Not only is key-reading important, so is rushing the line and tackling the running back. Pursuit ability is also measured, if a chase goes on before the tackle.

Even defensive linemen can be measured by incompletions, for reasons similar to those for linebackers. If in a rare event they are called into pass coverage, it measures their pass coverage ability. They also occasionally knock down throws when rushing the quarterback, which requires timing and good rushing ability. Tackles are also a helpful statistic. They indicate skill on plays right at a lineman, as well as speed on plays in a different direction.

With the most popular defensive statistics, it is almost impossible to rate defensive players. As we have seen, sacks and interceptions have serious flaws. But it is not hard to accurately analyze defensive players. Individual statistics like incompletions and tackles coupled with team statistics to analyze groups of players can give an accurate statistical representation of a player’s skills.

Adjusting for Park Effects in Baseball Monday, Aug 4 2008 

December 2007

Analyzing the effects of home parks is one of the biggest problems that baseball statisticians face. There is a huge difference between playing in Denver’s Coors Field, a high run-scoring park because of its altitude, and Washington’s RFK Stadium, whose giant dimensions make it the lowest-scoring park in baseball. Seventy-four home runs in Coors is the equivalent of only fifty-five in RFK. How can the bias introduced by home parks be eliminated? The most popular methods of adjusting statistics, the doubling method and the Park Factor method, have serious flaws. I invented the Equal Games method to adjust statistics without most of these errors.

The most common method of getting around the statistical distortions, doubling a player’s road statistics, is actually erroneous. The doubling method sets out to remove the advantage that players in parks like Wrigley and Fenway receive, and it does that well. Unfortunately, it goes too far in adjusting the stats. For example, suppose there is a four-team league with parks A, B, C, and D. A is an extreme pitcher’s park, allowing 2/3 of the league average in runs. B and C are average, and D allows 1 1/3 of the league average. A player on team D, therefore, has his stats inflated by 22%. Players on team A, though, have theirs reduced by 22%. Now adjust the statistics by doubling the road stats. The player from D is now below the league average by 11%, and A is over the average by this margin. Since the doubling method takes away from D players the opportunity to play in their home park and lets A players not play in A, it actually adds bias in favor of players in tough home parks.
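The four-park example can be checked with a short calculation. The sketch below assumes each team plays half its games at home and splits its road games equally among the other three parks; that schedule is my assumption, not something stated above, and the function names are hypothetical.

```python
from fractions import Fraction

# Run factors for the four hypothetical parks (1 = league average).
parks = {
    "A": Fraction(2, 3),
    "B": Fraction(1),
    "C": Fraction(1),
    "D": Fraction(4, 3),
}


def road_factor(home):
    """Average park factor over road games, split equally among the other parks."""
    others = [factor for name, factor in parks.items() if name != home]
    return sum(others) / len(others)


def raw_factor(home):
    """Blended factor when half the games are at home and half on the road."""
    return (parks[home] + road_factor(home)) / 2


for team in parks:
    # Doubling the road statistics is equivalent to using the road factor alone.
    print(team, float(raw_factor(team)), float(road_factor(team)))
```

Under these assumptions the road-only factor is 8/9 for team D and 10/9 for team A, which matches the 11% figures for the doubling method. The size of the unadjusted distortion depends on how the schedule is modeled, so the raw factors printed here need not match the 22% quoted above.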

There are other problems with the doubling method. Teams that play in divisions with hitter-friendly parks have better adjusted stats than teams from other divisions. The doubling method removes games in a hitter-friendly home park, but playing more games in high-scoring parks on the road is not eliminated. The same problem exists if the rest of the division has particularly good or bad pitchers. Playing time can also be inaccurate when statistics are adjusted. If a player is injured for a short time during a homestand, doubling the road statistics will give him more playing time than he actually got. Injuries or suspensions during road trips reduce adjusted games and at-bats, as well as all statistics. Finally, since most players play about 3% better at home than on the road, the general level of production declines for both hitters and pitchers. After adjustment, runs scored and allowed, and most other statistics, don’t match. When these four distortions are combined, the doubling method is very inaccurate.

The Park Factor method avoids the deflation of statistics as well as problems with playing time. It assigns a factor to each home park depending on runs scored in that park compared to the league average. Since this method eliminates stadium effects but preserves the number of games played at the home park, overall statistical levels stay the same. Also, since only the statistics and not games played in different parks are changed, players have the right number of at-bats. But the method has disadvantages as well. For example, Houston’s Minute Maid Park is great for right-handed hitters but a nightmare for lefties. The Park Factor method can’t account for biases like these when analyzing Houston players. Similarly, a stadium can raise home runs but decrease triples. This causes major problems when the Park Factor method is used. The home park still changes half the statistics and the Park Factor method doesn’t adjust individual statistics like this. This method also causes problems in the evaluation of players. It can cause power hitters to be seen as speedy, or vice versa. To avoid this problem, a separate factor has to be calculated for every statistic.

Both the doubling method and Park Factor method have serious flaws, so a different method must be used to properly adjust statistics for park effects. Such a method must not eliminate home parks but weight them similarly to other parks, and it should not give teams in high or low run scoring divisions an edge. Using these guidelines, I invented the Equal Games method. It works by making a player play an equal number of games in every stadium. Then the home park isn’t allowed to dominate the statistics. The formula is:

  • Find the number of opposing teams a player faced (NT)
  • Find the number of at-bats against each team (AB1, AB2, etc.)
  • Adjusted Statistics = (NT / AB1) S1 + (NT / AB2) S2 +…, where SN is the stats against the appropriate team
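Read literally, the formula above can be written as a small function. This is a sketch of the formula as printed, not a tested implementation of the method; the function name and the sample numbers are hypothetical.

```python
def equal_games(at_bats, stats):
    """Sum of (NT / AB_i) * S_i over every opposing team, as in the list above.

    at_bats[i] is the number of at-bats against team i (must be positive),
    stats[i] is the raw statistic (hits, home runs, ...) against team i.
    """
    if len(at_bats) != len(stats):
        raise ValueError("need one stat total per opposing team")
    if any(ab <= 0 for ab in at_bats):
        raise ValueError("every opposing team needs at least one at-bat")
    nt = len(at_bats)  # NT, the number of opposing teams faced
    return sum(nt / ab * s for ab, s in zip(at_bats, stats))


# Hypothetical hitter: at-bats and hits against three opponents.
adjusted = equal_games([50, 40, 60], [15, 10, 21])
print(round(adjusted, 3))
```

Taken this way, the result is NT times the unweighted average of the per-opponent rates, so dividing it by NT gives a rate in which every opponent counts equally, which is the point of the method.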

This method has a few problems. Like the doubling method, statistical levels are reduced because most players play better at home. But as long as the statistic is confined to its main task of head-to-head player comparisons, it is much superior to the doubling method and the Park Factor method. Also, if comparing an entire league, all the statistics can be multiplied by (NT - 1) / NT x 1.015.

Home parks often distort player statistics and make analyses almost impossible. To avoid these changes, many methods try to adjust the statistics according to home parks. But these methods have problems with general statistical deflation, playing time, and imbalance between different statistics. The Equal Games method avoids these problems by rating every park similarly.

Seven Tips For Picking Baseball Playoff Winners Monday, Aug 4 2008 

October 2007

With the baseball playoffs approaching, a favorite occupation of baseball fans is predicting the results of the series and the eventual champion. It can be hard to analyze talent at all positions and all the factors that influence the result. This article contains seven guidelines for predicting baseball champions.

1. Park Effects are Key

In the long run, teams play about the same number of games in pitcher’s parks as they do in hitter’s parks. In the playoffs, teams often play a greater number of their games in one park because they play fewer total games. Therefore, park effects are of more importance in the playoffs than in the regular season.

How can you figure out which teams are aided by what kinds of parks? A team that depends mostly on singles would do best in a large park where it is easier to hit balls between the outfielders and where their lack of power doesn’t hurt them. A power-hitting team, on the other hand, would obviously benefit from playing in a small park. It is easy to determine whether a team has power or not. First, look at the hitters and divide the batting average of the starting lineup by the slugging average. If the number is greater than three-fifths then the team is like Ichiro Suzuki in that they generally use singles and stolen bases to score runs. If the number is less than three-fifths then the team is more similar to Adrian Beltre or Barry Bonds in that they count more on the long ball and walks. A team’s success in the playoffs can depend on park effects, so it is important to account for whether a team uses singles or power to win games.
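The batting-average-to-slugging test is easy to compute. The sketch below uses made-up lineup totals; the function name is hypothetical, and treating a ratio of exactly three-fifths as a power team is my own choice, since the rule above does not cover that case.

```python
def offense_style(hits, total_bases, at_bats, cutoff=0.6):
    """Classify a lineup by batting average divided by slugging average."""
    batting_avg = hits / at_bats
    slugging_avg = total_bases / at_bats
    ratio = batting_avg / slugging_avg  # works out to hits / total bases
    # Above the cutoff: singles and speed. At or below it: power (assumed for ties).
    style = "singles" if ratio > cutoff else "power"
    return ratio, style


# Hypothetical lineup totals, not real teams.
print(offense_style(hits=1500, total_bases=2200, at_bats=5500))
print(offense_style(hits=1400, total_bases=2500, at_bats=5500))
```

Because at-bats cancel out, the ratio is simply hits per total base, which makes it quick to check from a box score.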

If a park has asymmetrical dimensions, the outcome of a game may hinge on whether a team has right or left-handed talent. Just remember that right-handed hitters generally hit to left field and southpaws to right. Even the pitching staff can be influenced by a park. If a team has pitchers who allow a lot of home runs, they do best in large fields since balls that would be homers in a smaller stadium turn into long flyouts.

2. Teams Need Balance in Hitting

Which team did better in this World Series between the New York Yankees and the Pittsburgh Pirates?

Game     NYY Runs   PIT Runs   Winner
1        4          6          PIT
2        16         3          NYY
3        10         0          NYY
4        2          3          PIT
5        2          5          PIT
6        12         0          NYY
7        9          10         PIT
Series   55         27         PIT

New York scored more than twice as many runs as Pittsburgh yet lost the series 4-3. Why was this? Bill Mazeroski’s home run in the bottom of the ninth inning of the seventh game might have had something to do with it, but notice the pattern here. With three blowouts and four close losses, New York’s number of runs varied wildly. Pittsburgh’s offense was remarkably consistent; only the Game Seven win was out of place. We can measure the amount of variation between games with standard deviation. The Yankees had a 5.37 standard deviation while the Pirates had only 3.53, and just 2.48 excepting Game Seven. The typical standard deviation of a major league team is about 3.5. Pittsburgh won because of their low standard deviation, despite the fact that New York scored more than twice as many runs.
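The standard deviations quoted here can be reproduced from the run totals in the table. The sketch below uses the sample standard deviation from Python’s standard library; that the figures above are sample rather than population values is an inference from the fact that they match.

```python
from statistics import stdev

# Runs per game from the table above.
nyy = [4, 16, 10, 2, 2, 12, 9]
pit = [6, 3, 0, 3, 5, 0, 10]

nyy_sd = stdev(nyy)                # sample standard deviation
pit_sd = stdev(pit)
pit_sd_first_six = stdev(pit[:-1])  # leaving out Game Seven

print(round(nyy_sd, 2), round(pit_sd, 2), round(pit_sd_first_six, 2))
# 5.37 3.53 2.48
```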

Having a low standard deviation can drive a team all the way to the World Series. Think of it this way: Which result is better, a 15-2 win or a 5-4 win? Both results count the same, since as long as you win, it doesn’t matter how many runs you score. For teams to be successful in the playoffs, they must have a high winning percentage, not just a high number of runs scored. To measure the effects of standard deviation, I conducted a statistical study of two hypothetical teams, each scoring the league average number of runs. Team 1, however, had a 3.68 standard deviation, while Team 2 had 4.55. Because of their low standard deviation, Team 1 had a .552 winning percentage over more than 3,000 games. This translates to an incredible .611 winning percentage in the World Series. A low standard deviation of runs scored is a major factor in a team’s success.
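The step from a per-game winning percentage to a best-of-seven winning percentage can be sketched as follows. It assumes every game is independent and has the same win probability, which may not be exactly how the study above was run; the function name is hypothetical.

```python
from math import comb


def best_of_seven(p):
    """Chance of winning four games before losing four, given per-game chance p."""
    q = 1 - p
    # The series is won with the fourth win after k losses, for k = 0, 1, 2, 3.
    return sum(comb(3 + k, k) * p**4 * q**k for k in range(4))


series_pct = best_of_seven(0.552)
print(round(series_pct, 2))
```

With a .552 chance in each game, this gives a series winning percentage of roughly .61, in line with the figure quoted above.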

An easy way to predict success in the playoffs is to look at a team’s fluctuation in runs. To evaluate a team’s lineup in this way, get a record of a team’s games and calculate the standard deviation. If this method isn’t practicable, then just look for balance in a team’s lineup. Teams with a high standard deviation have greater fluctuation in the number of runs scored. There is also more fluctuation among a few players than many. Teams that depend on a small core of batters have higher standard deviations than teams who have balanced lineups where most players can hit fairly well.

The St. Louis Cardinals of 2004 clearly demonstrate why balance in a lineup is key to success. Although they had the spectacular hitting quartet of Albert Pujols, Jim Edmonds, Larry Walker, and Scott Rolen, the Redbirds were weak at catcher, second base, and left field. Because of this imbalance in their lineup, they were swept by the Red Sox in the World Series. In comparison, another great team, the 1998 edition of the Yankees, had one of the most balanced lineups of all time. Their worst regular player, Chad Curtis, had a reasonable .360 on-base percentage and scored 79 runs in just 148 games. The Yankees, of course, swept the Padres in the World Series. The key factor here is the balance of the lineup.

One of the factors that most affects team performance is fluctuation. Because a team’s direct objective is to win games, not score runs, the standard deviation can be used to forecast the performance. Since this is a lot of work, though, another way is just to look for a balanced lineup that doesn’t depend too much on any one player. Don’t forget that in addition to having low standard deviation teams must also have a high average of runs scored.

3. You Really Don’t Need Five Pitchers for the World Series!

The biggest misconception about starting pitching in the playoffs is that all five pitchers in a rotation are important. In predicting the playoffs, however, it is only important to look at four of the starting pitchers. Since the two teams are playing only seven games in nine days, it is easy to have four pitchers take care of the series. Four days rest is standard for a pitcher, with some pitchers being able to do three days. Consider these schedules for pitchers, with 1 being the #1 starter and so forth, and x being a day off.

1 2 x 3 1 4 x 2 1

1 2 x 4 1 2 x 3 1

Both of these methods assume the #1 starter can pitch on three days rest. Even if no hurlers can throw on three days rest, there are still plenty of ways:

1 2 x 3 4 1 x 2 3

3 2 x 1 4 3 x 2 1

What do all these schedules show? A team only needs four good starting pitchers to succeed in the playoffs. While fifth starters may be important in the regular season because of injuries and fewer days off, they are not needed in the playoffs. In trying to predict the playoffs, don’t bother to look at the fifth starters. The first four starters are the only important ones. Two pitchers alone can carry a team to the championship. Consider Curt Schilling and Randy Johnson, the key players in the Arizona Diamondbacks’ 2001 World Series win. Even though reliever Byung-Hyun Kim allowed two game-winning home runs, Johnson and Schilling led the D-backs to their first World Series championship and had the best starting performance in the playoffs since the Dodgers’ pitching staff in 1963.
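The rest each starter gets in a schedule like the ones above can be checked mechanically. The sketch below reads a schedule string in the same notation (starter numbers, with x for a day off); the function name is hypothetical.

```python
def min_rest_days(schedule):
    """Fewest days of rest between consecutive starts, for each repeat starter."""
    last_start = {}
    rest = {}
    for day, slot in enumerate(schedule.split()):
        if slot == "x":  # off day, nobody starts
            continue
        if slot in last_start:
            gap = day - last_start[slot] - 1  # full days between the two starts
            rest[slot] = min(rest.get(slot, gap), gap)
        last_start[slot] = day
    return rest


print(min_rest_days("1 2 x 3 1 4 x 2 1"))  # {'1': 3, '2': 5}
print(min_rest_days("1 2 x 3 4 1 x 2 3"))  # {'1': 4, '2': 5, '3': 4}
```

Starters who pitch only once in the series do not appear in the result, since they have no rest interval to measure.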

4. Three Relief Pitchers Especially Important

Notice that with both hitters and starting pitchers it isn’t necessary to have more than a certain number of good players. With hitters, the lineup is by far the most important factor. Similarly, only four starters are needed in the playoffs. It’s the same with relief pitchers, since three are enough for a series. This means that bullpen depth is not key when looking at a team and analyzing its chances.

There are three types of relief pitchers. There are closers, players responsible for getting out the side in the last inning like Trevor Hoffman and Mariano Rivera. To set them up, there are long relief pitchers. The rest of the relief pitchers are versatile swingmen who can pitch in short or long relief and even start if necessary.

It’s easy to show that a team only needs one of each kind of these relief pitchers in the playoffs. Assume that they need a closer for five games. Since pitchers like this often pitch on very little rest, one man should be able to handle this workload. The long relief pitcher should come in three or four times, two or three to set up the closer and one alone. Finally, the third relief pitcher can take care of extra-inning duties (about 44% of World Series contain an extra inning game) and anything else.

5. Watch for Designated Hitter Opportunities in World Series

The only key difference between the AL and NL is the designated hitter, and when teams from the two leagues play in the World Series, adapting to the DH rule can be decisive. How can each team cope with changing their lineup and try to make the best of this situation? This factor can play a major role in the outcome of the World Series and therefore it is important to take it into account when comparing the two pennant winners and predicting the overall champion.

How can the AL put the bat of the DH into their lineup but not destroy their defense? The usual solution is to put the DH at first base. What can the manager do, however, when a good-hitting, poor-fielding player already mans the first sack? There are several solutions to this problem. One way to get out of the dilemma is to just put the DH at first and hope for the best. This doesn’t hurt the defense too much and does improve the offense slightly. It takes a good hitter out of the lineup, though, and is not a good solution if the former first-baseman’s bat is desperately needed. Therefore, the method that should be used is to put the first baseman at a position where he will do the least damage and then put the DH at first. With this method a team keeps both hitters in the lineup and gets a weaker bat out of the game. If a team has a poor hitter in left or right field, this can be the optimal situation for them. If not, then you can degrade their chances for games 3-5.

It is much easier for the NL to adapt than it is for the AL. All they have to do is take the best bat and worst glove combination out of their lineup, put him at DH, and put in a slick-fielding and hopefully good-hitting player.

6. Relief Pitchers Dominant in Division Series

The bullpen is the key factor in the Division Series. The fire squad is important to prevent late rallies. If the relief pitchers are not able to protect against a late loss, a team can rarely recover since the series is short and every game counts. They must come back with a rally of their own off the opponent’s bullpen to win another game. In predicting Division Series victories, the bullpen should be the foremost factor.

A good example is the division series between the Texas Rangers and the New York Yankees in 1996. In every single game, Texas had an early lead. Then why did they lose the series 3-1? Their bullpen had an ERA of 2.40, mediocre for relief pitchers. New York, on the other hand, had a brilliant 0.42 ERA for their relief pitchers, including 4.2 innings of scoreless pitching from Mariano Rivera.

7. Division Series Organization Is Key, World Series is Not

In the division series, the #1 or #2 seed hosts games 3, 4, and 5 while the #4 or #3 seed plays games 1 and 2 at home. Does the organization help one team and if so, how can you use this information to help predict the winner?

Since home teams usually win about 53% of the games, it is easy to calculate that the higher seed wins the series .511 of the time on home-field advantage alone. This is a fairly significant advantage. We also have to take into account that the higher seed is a better team. Assuming an advantage of 5 wins during the regular season for the 2-3 seed game and a 12 game advantage for the 1-4 seed game, here is what I found:

  • #1 seeds should win .669 of the time
  • #2 seeds should win .550 of the time
  • Both series come down to a final 5th game about .376 of the time
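The home-field part of this calculation can be sketched directly. The code below assumes two evenly matched teams, independent games, and a 53% chance for whichever team is at home; it does not reproduce the adjustment for team strength behind the figures in the list above. The function name is hypothetical.

```python
def series_win_prob(higher_seed_at_home, p_home=0.53):
    """Chance the higher seed wins a series, given who hosts each game.

    higher_seed_at_home lists one boolean per scheduled game. Playing out
    every game on paper gives the same answer as stopping once the series
    is clinched.
    """
    needed = len(higher_seed_at_home) // 2 + 1
    dist = {0: 1.0}  # dist[w] = chance the higher seed has exactly w wins so far
    for at_home in higher_seed_at_home:
        p = p_home if at_home else 1.0 - p_home
        nxt = {}
        for wins, prob in dist.items():
            nxt[wins + 1] = nxt.get(wins + 1, 0.0) + prob * p
            nxt[wins] = nxt.get(wins, 0.0) + prob * (1.0 - p)
        dist = nxt
    return sum(prob for wins, prob in dist.items() if wins >= needed)


# Division series as described above: lower seed hosts games 1-2, higher seed 3-5.
division = series_win_prob([False, False, True, True, True])
print(round(division, 3))  # 0.511
```

The same function can be given any other hosting pattern, such as a seven-game series, by changing the list of booleans.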

Why is this last piece important? In the division series, a team can start their top four pitchers in order and then their #1 pitcher in the last game. Since this happens more than a third of the time, the ace of the staff can be a very important player.

It’s clear that home field advantage has an effect in the Division Series. In the World Series, though, the home advantage has little or no effect. Teams with the advantage should actually have only a .508 winning percentage, nothing special. Because the winner of the All-Star Game has their pennant winner host the first two and last two games of the Series, the system has recently gotten a lot of publicity, but the statistical evidence does not suggest that it has any effect. Also, since there is very little correlation between winning the All-Star Game and the World Series, the better team does not necessarily have the advantage.

With these tips, predicting the winners in the playoffs should be easy. Best of luck, and may the team you pick win!

Secretariat’s Belmont: The Greatest Performance of All Time Monday, Aug 4 2008 

August 2007

The oldest, most famous series of races in horse racing is the Triple Crown, consisting of the Kentucky Derby, the Preakness, and the Belmont. No horse has won the Triple Crown since 1978, a break of twenty-nine years. This is similar to what happened from 1949-1972, when for twenty-five years no horse won the Triple Crown. In 1973, however, Secretariat was considered a major candidate to win the three races. The year before, he was the first two-year-old ever to win the Horse of the Year award. He finished a weak third, though, in the Wood Memorial, his prep race for the Triple Crown. Secretariat then rebounded to win the Triple Crown, winning every race in track-record time. The most amazing race was the Belmont, in which he triumphed by thirty-one lengths while setting a world record. The performance immediately achieved great acclaim. For example, veteran trainer John Gaver called it “the greatest exhibition of speed and stamina I had ever seen.” The race is still recognized as brilliant, as Secretariat’s record stands and his margin of victory has not been bettered. Because of the time of the race, the speed and stamina displayed, and the fatigue that Secretariat overcame to win, the race is the greatest performance in horse racing history.

The huge winning margin at Belmont can be partly explained by the fact that Secretariat was facing just five opponents. Ten lengths also separated the third and fourth place finishers, showing that perhaps Secretariat was just the best of a field with widely varying talents. The Santa Anita Derby winner Sham finished within three lengths of Secretariat in all of their previous meetings. This time he was last, forty-two and a quarter lengths behind the winner. Thus, it is likely that all of the other horses were better than Sham. Secretariat’s winning margin of thirty-one lengths is very impressive even though the field was small and talent may have differed widely.

In winning the one-and-a-half-mile Belmont, Secretariat set a world record of 2:24. This broke the world record by two and one-fifth seconds, a margin which has never been equaled. Normally when a horse crosses the finish line, his jockey pulls him up. Secretariat, though, actually ran past the finish at top speed, and his time for one and five-eighths miles also broke the world record. Even before he had crossed the finish line, Secretariat ran parts of the race faster than the Belmont track records: his times for one and three-sixteenths, one and a quarter, and one and three-eighths miles all beat them. Unfortunately, only the time at the finish counts for record purposes, but if times before and after counted, Secretariat would have set five track records in one race.

Obviously, one of the most incredible parts of the triumph was the huge gap between Secretariat and his opponents. The only other victory by more than thirty lengths in a stakes race was by Man o’ War in the 1920 Lawrence Realization, which he reportedly won by one hundred lengths. Thus, as trainer Lazaro Barrera said of Secretariat after the 1973 race, “The performance he put on in the Belmont – you have to go back to Man o’ War to compare it.” In many ways, though, Secretariat’s winning margin is a greater achievement than Man o’ War’s. Man o’ War’s one hundred lengths are almost certainly an overestimate. He was also facing just one opponent, while Secretariat faced five. Another astounding fact about the 1973 race is that Secretariat, who held the lead from the beginning, increased his winning margin throughout the race! This almost never happens unless the leader runs the first part of the race very slowly, or is not pressured during the beginning. Secretariat ran the opening six furlongs faster than four of the eight most recent runnings of the prestigious Toboggan Handicap. It is amazing that Secretariat increased his lead even though he was close to other horses through fast opening fractions.

Secretariat’s closest opponent in both the Kentucky Derby and the Preakness was Sham. Sham battled with Secretariat for the first half of the Belmont but then finished last. The two had seemed fairly evenly matched in all of their previous meetings, but in this race, the quick pace did not seem to greatly affect Secretariat, while it destroyed Sham’s chances. The devastating pace that he set did not tire him.

Finally, the most notable part of the race was that Secretariat should have been very tired before the Belmont, but still ran a spectacular race. His time of 1:59 2/5 in the Derby broke several records, and according to the Daily Racing Form, he ran the Preakness in 1:53 2/5, a world record. In addition to these races, he had seven blindingly fast workouts in six weeks, several of those especially impressive because they were on sloppy surfaces. Most horses could not possibly run their best race after two great starts and an exhausting series of workouts. And Secretariat was certainly affected by the schedule, as he lost four hundred pounds, more than a quarter of his weight, during the Triple Crown. The Belmont, however, was by far the best race of his career. Although Secretariat had many grueling races and workouts before the Belmont, he still ran a brilliant race.

In 1973, Secretariat won the Belmont Stakes by thirty-one lengths in world-record time. His record has not been broken on dirt since, and he would have broken five track records if his times before and after the race had been official. Even though the opening of the race was brilliantly fast, Secretariat did not tire and increased his lead throughout the race. Also, he had run two great races and seven workouts in six weeks. However, he showed no signs of being fatigued in the race. Therefore, Secretariat’s race in the Belmont, which won the first Triple Crown in twenty-five years, was the greatest performance in horse racing history.

Thanks to the librarians at the Keeneland Library for their research help with this article. Major sources for this article included Charles Hatton’s analysis of Secretariat in the 1974 American Racing Manual (Daily Racing Form, 1974) and William H. Rudy’s “Reactions” article in the Blood-Horse (June 18, 1973). For more information, see www.secretariat.com.

Who Cares What the Cardinals’ Winning Percentage is on Cinco de Mayo?: The Most Common Sports Statistics Mistakes and How to Avoid Them Monday, Aug 4 2008 

February 2007

Sports statistics are helpful tools for determining the relative ability of players, teams, or leagues. Indeed, there are so many different stats that players can be evaluated in many different ways. But incorrect conclusions are often drawn, and this column discusses three of the most common fallacies.

1. Forgetting to Differentiate between Eras.

This is not a very common mistake, because sports performance has not changed much in many sports over time and computers have made adjusting a lot easier. But people still sometimes forget about differences. Your knowledgeable but mistaken baseball fan might say, “of course, today’s baseball is a lot like 1930’s baseball, with lots of home runs and little stealing”. In fact, compare home runs for the 1935 and 2000 National Leagues: Even accounting for the fact that 2000 had twice as many teams, 150% more home runs were hit in 2000! Runs scored were not much different, but strikeouts and walks were both much more frequent in 2000.

Even when comparing contemporaries, mistakes can be made by not adjusting. In 2000, ERAs were .29 higher in the AL than in the NL, because the DH has helped hitting and thus hurt pitching. Even before the DH, in 1963 about 410 more runs were scored in the AL than the NL.

2. Using Misleading Statistics.

This is a very tricky mistake, because it’s not really possible to know whether a statistic is bad or not. But by now, we have a pretty good idea of which statistics are misleading or just plain awful in most sports. A list of some of the most misleading stats:

Baseball: RBI, runs, batting average, wins, losses, winning percentage

Football: Sacks, interceptions, touchdowns

Basketball: Overall points scored, many volume stats

Golf: Holes in one, scores on a back or front nine without course analysis

Horse racing: Money won, breeding (after a horse has already shown its ability)

The main theme that should show in these numbers is volume statistics. Although rate statistics don’t adjust for era, they do adjust for playing time, can be adjusted more easily, and the numerator is less likely to be a misleading volume statistic.

But one big rate statistic pops out here: batting average. This is the most commonly seen baseball statistic, and the most commonly used for evaluating hitters, but it has lots of weaknesses. It includes sacrifice hits in at-bats, a bad mistake when bunting is used so commonly. But much more important, it does not include walks or hit-by-pitches, which are not much less valuable than a single and certainly not less than half as valuable, as some old-time books say.

On-base percentage takes care of these two problems, and slugging average neglects them but weighs different hits more accurately. For an accurate picture and no time, calculate slugging times on-base or 2×2 rate distribution. If you have forever (or the right book) look up or calculate Bill James’s win shares, an extremely accurate way to measure single-season value for both hitters and pitchers.
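For readers who want to try the quick version, on-base percentage, slugging average, and their product can be computed as below. The formulas are the standard modern definitions, and the stat line is invented for illustration; the function names are hypothetical.

```python
def on_base_pct(hits, walks, hit_by_pitch, at_bats, sac_flies):
    """Times on base divided by plate appearances that count toward OBP."""
    return (hits + walks + hit_by_pitch) / (at_bats + walks + hit_by_pitch + sac_flies)


def slugging_avg(singles, doubles, triples, home_runs, at_bats):
    """Total bases per at-bat."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    return total_bases / at_bats


# Hypothetical season: 500 AB, 150 hits (100 singles, 30 doubles, 5 triples, 15 HR).
obp = on_base_pct(hits=150, walks=60, hit_by_pitch=5, at_bats=500, sac_flies=5)
slg = slugging_avg(singles=100, doubles=30, triples=5, home_runs=15, at_bats=500)

print(round(obp, 3), round(slg, 3), round(obp * slg, 3))
```

The product rewards hitters who both reach base and hit for power, which neither number captures alone.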

Sometimes it’s easy to tell if a statistic is wrong or just silly. Who cares what the Cardinals’ winning percentage is on Cinco de Mayo? Who cares whether Jack Nicklaus has more eagles on hole 3 or hole 14? Who cares whether the 49ers have more touchdown passes when the moon is full or new? These things just don’t matter! Plenty of things have no effect on the outcome of something; just because an opera singer missed a high C doesn’t mean that Mark McGwire will eventually be elected to the Hall of Fame.

3. Concentrating on One Statistic or Time Period

This is one of the easier mistakes to identify, but it’s hard to know exactly what evaluations of this kind are fallacies. All statistics have their advantages, and even some of the above, if not helpful for evaluations, are good for a laugh. But sometimes neglecting the whole can be fatal to an argument, and it’s important to remember that no statistic is unimportant enough to not be included in your argument.

With time periods, a similar mistake can be made by only considering part of a player’s career. Leaving aside the fact that he took steroids, Ken Caminiti simply cannot be judged by his stellar 1996 season, because it is so out of form with the rest of his career. Using Total Baseball’s TPR, he is:

1994: 1.6
1995: 4.0
1996: 7.3 (54th best season of all time!)
1997: 4.8
1998: 1.2

Although he had two reasonably good seasons flanking ‘96, it is just not fair to evaluate him by ‘96. One season does not make a great player. Five may in baseball, three may in football. But one great season is just that, one great season.
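To make the point concrete, here is a small Python sketch that sets the peak season against the five-year average, using only the TPR values quoted above. The function name is just an illustrative choice.

```python
# TPR values quoted above for Ken Caminiti, 1994 through 1998.
caminiti_tpr = {1994: 1.6, 1995: 4.0, 1996: 7.3, 1997: 4.8, 1998: 1.2}


def peak_and_average(seasons):
    """Return (best year, best value, mean over all listed seasons)."""
    best_year = max(seasons, key=seasons.get)
    mean = sum(seasons.values()) / len(seasons)
    return best_year, seasons[best_year], mean


year, peak, mean = peak_and_average(caminiti_tpr)
print(f"peak {peak} in {year}, five-year average {mean:.2f}")
```

The peak is nearly double the five-year average, which is why judging him on 1996 alone gives a distorted picture.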

In evaluating sports performance using statistics, mistakes are commonly made. If you learn to recognize and avoid these mistakes, you will be able to judge the performance and ability of players, teams, and seasons fairly.

Comprehensive Ratings in Pro Football Monday, Aug 4 2008 

December 2006

The most complicated statistic in pro football today is certainly Passer Rating, a measure of quarterback effectiveness. Although it does not include running, play calling, and other factors that contribute to a quarterback’s success, this is still a very helpful tool if used correctly. Also, it has a standard benchmark of superiority (100) that is easy to remember, a good feature in any statistic.

But what about the other offensive stars, running backs and wide receivers? They clearly deserve such sophisticated measures. This article features some of the author’s statistics in this line using yards gained and touchdowns.

For both positions, the main statistics that can be used are:

  • Yards gained per attempt (for speed, ability, etc.)
  • Yards gained (for durability, reliability)
  • Touchdowns (for “ability to make the big play”, morale, etc.)

The key in Passer Rating is calculating the percentage of important stats including completions and touchdowns, but in this case, a simpler way might be to just divide by the league total. This way we also get an automatic adjustment to the league context and era. But then players from eras with fewer teams will have a significant advantage, which we don’t want. The obvious solution is to first divide the league total by the number of teams.

Now we just need to decide what performance should earn a rating of 100. For running backs, 1,600 yards, 12 touchdowns, and 5.3 yards per attempt; for wide receivers, 1,100 yards, 10 touchdowns, and 11.6 yards per reception. Both are good general benchmarks of superiority.

For running backs the overall formula is:


Rating = ( Yards / Yards(L/T) + Yards per attempt / Yards per attempt(L/T) + Touchdowns / Touchdowns(L/T) ) x 45

(Note: (L/T) denotes the league average per team.)
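The formula can be sketched in Python as follows, reading it as the sum of the three player-to-league ratios multiplied by the scaling factor. The function name and the league averages below are placeholders for illustration, not actual NFL figures.

```python
def comprehensive_rating(yards, per_touch, touchdowns,
                         lg_yards, lg_per_touch, lg_touchdowns, factor=45):
    """Sum the player's three ratios to the per-team league average, then scale."""
    ratios = (yards / lg_yards
              + per_touch / lg_per_touch
              + touchdowns / lg_touchdowns)
    return ratios * factor


# Placeholder per-team league averages (hypothetical, not real league data).
LG_YARDS, LG_PER_ATT, LG_TDS = 1800.0, 4.0, 12.0

# A player who exactly matches the per-team average in all three categories
# has three ratios of 1.0, so the rating is 3 * factor.
print(comprehensive_rating(1800, 4.0, 12, LG_YARDS, LG_PER_ATT, LG_TDS))  # 135.0
```

Keeping the factor as a parameter lets the same function be reused with a different multiplier for another position.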

Let’s use this statistic to compare Shaun Alexander in 2004 with Ahman Green in 2003:

Player Yards/Att. Yards TD’s Rating
Alexander 2004 4.8 1,696 16 148
Green 2003 5.4 1,883 15 150

At first glance, Green appears to be vastly superior to Alexander. He has more than half a yard more per attempt, and Alexander’s one touchdown advantage is hardly enough to overcome that. However, Runner Rating shows that it is actually closer than we might think: Green has only a slight lead.

Runner rating is hardly a tell-all statistic. It does not account for factors like limited playing time, playing surface, and experience, but if used correctly it can be very powerful.

We can use a similar method to rate wide receivers. In this case, since teams generally use more receivers than running backs, the multiplying factor, instead of being 45, should be higher.

Rating = ( Yards / Yards(L/T) + Yards per reception / Yards per reception(L/T) + Touchdowns / Touchdowns(L/T) ) x 57

Now let’s use this to compare Darrell Jackson in 2003 with Torry Holt in 2004:

Player Yards Yards/Recep. TD’s Rating
Holt 2004 1,372 14.6 10 118
Jackson 2003 1,119 16.7 9 129

They seem very close. Jackson has a large edge in yards per reception, but Holt has more yards and touchdowns. Jackson has a much higher rating, though, because the pass was used less in 2003, and so he had a higher performance compared to the league.

These examples have shown that running backs generally have higher ratings than receivers or even quarterbacks. But this is realistic; running backs do have a higher value since they have more carries. Running back performance also depends more on individual skill than that of receivers and quarterbacks. It’s true that getting the right blocks helps a lot, but quick, accurate decision-making and speed are the majority of the job.

Passer rating is a very powerful statistic reflecting most of a passer’s qualities. Similar ratings can be created for receivers and running backs. By combining a few important measures of a player’s ability and multiplying, one gets a useful statistic.

Adjusted Stats: Modern Techniques Applied to Sports Monday, Aug 4 2008 

August 2006

Adjusted statistics are one of the newest and most helpful things to grace the world of sports statistics. These statistics correct flawed numbers to account for differences in era, league, and even games played.

To better understand, let’s compare Mark McGwire’s stellar 1998 season, in which he hit seventy homers, to Babe Ruth in 1919, who led the league with twenty-nine. Back then, in the dead ball era, home runs were still scarce, and Ruth didn’t even play full time, pitching seventeen games and going 9-5! After adjusting for this fact, assuming Ruth had 550 AB’s, we can make a chart of their home runs:

Name Adj. HR’s League HR’s Pct. of league 1998 HR’s
M. McGwire 70 2565 2.7% 70
B. Ruth 37 497 (adj.) 7.4% 190

First of all, the NL in 1998 hit 2,565 home runs, and the AL in 1919 hit 240. However, you see it in the chart as 497. Why is this? The NL had sixteen teams, and the AL had eight. Since we are calculating the percentage compared to the league, the AL total should be doubled. Next we calculate the percent compared to league, and then multiply by 2,565 to get the expected number of homers in 1998. Note that Ruth would have had almost three times as many homers! This example, if extreme, does show the power of adjusted statistics.
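The two steps just described (take the player's share of his league, then rescale to the target league) can be sketched in a few lines of Python. The function name is an illustrative choice, and the inputs are simply the chart's figures taken as given.

```python
def adjust_to_league(player_total, league_total, target_league_total):
    """Express a total as a share of its league, then rescale to a target league."""
    share = player_total / league_total
    return share, share * target_league_total


# Chart figures: Ruth's adjusted 37 HR against an adjusted league total of 497,
# translated into the 1998 NL's 2,565 HR.
share, equivalent = adjust_to_league(37, 497, 2565)
print(f"{share:.1%} of league, about {equivalent:.0f} HR in 1998 terms")
```

The unrounded result comes out a shade under 191. The chart's 190 is consistent with rounding the share to 7.4% before multiplying.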

Although adjusted statistics can be used to account for era, they can also be used to correct for position. If we are comparing a corner outfielder to a third baseman, it becomes necessary to adjust for the fact that outfielders have much higher offensive expectations than third basemen. Seeing this, it becomes clear that since Mike Schmidt was a third baseman, his eight home run titles are one of the greatest achievements in baseball history.

Although adjusted statistics are only used in baseball, there is no reason why it should not be possible to use them for other sports like football. For example, which conference leader was greater: Michael Strahan in 2003 with eighteen and a half sacks, or Dwight Freeney in 2004 with sixteen?

Name Sacks Conference Sacks Pct. of conference 2004 Sacks
M. Strahan 18.5 544 3.4% 20
D. Freeney 16.0 583 2.7% 16

This time the adjusted statistics don’t reverse the margin; rather, they augment it.

Of course, this is expected to happen half the time, and can even be helpful in making a statistical argument.

For a powerful example of this augmentation, take Babe Ruth in 1927 with sixty homers versus Roger Maris in 1961, with sixty-one, which broke Ruth’s record. This is obviously neck-and-neck, and so every factor must be taken into account. Both batted left-handed. Both were corner outfielders. Both played their home games at Yankee Stadium. Maris played in a few more games, and Ruth was walked more often. However, the time in which they played is undoubtedly the decisive factor.

Name HR’s Adj. League HR’s Pct. of league 1961 HR’s
R. Maris 61 1086 5.6% 61
B. Ruth 60 549 10.9% 118

The 1927 league stats are adjusted for the fact that Maris’ league had ten teams, but Ruth’s had eight. After doing the necessary calculations, we find that Ruth would have hit almost twice as many homers as Maris in 1961. This resolves one of the greatest statistical arguments in all of baseball.

This method works very well for seasons, but when adjusting for careers we need to be more cautious. Say you are comparing Jackie Robinson to Pete Rose. Most ballplayers reach their prime around 26-28, and then tail off. However, Robinson entered the majors at age 29, and only then do we have close-to-full statistics of his performance. This obviously favors Rose, and the only ways to close the gap are to use Robinson’s Negro League statistics and slightly raise them for the war years that he missed, or do the same with his major league stats. Unfortunately, this method risks having misleading estimations and can give a distorted picture.

You also need to keep your data sets in mind. Say you are comparing two top-level equine sprinters’ six-furlong times. If you took the usual approach of averaging each year’s times, your statistics would be incorrect. Since two far-apart years will have different ratios of low-level claimers to higher-class allowances and stakes, you will end up with times skewed in one direction or the other. Because these are first-rate horses, the obvious solution is to use only stakes races. Still, this is a common error and one that can have a large effect.
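A minimal Python sketch shows how the choice of data set moves the baseline. The race times and class labels below are invented purely for illustration.

```python
from statistics import mean

# Hypothetical six-furlong times in seconds, tagged by race class.
races = [
    ("claiming", 71.8), ("claiming", 72.4), ("allowance", 70.9),
    ("stakes", 69.6), ("stakes", 69.9),
]


def average_time(results, classes=None):
    """Average the times, optionally keeping only the listed race classes."""
    times = [t for cls, t in results if classes is None or cls in classes]
    return mean(times)


print(round(average_time(races), 2))              # all classes mixed together
print(round(average_time(races, {"stakes"}), 2))  # stakes races only
```

With the slower claiming races mixed in, the average is more than a second slower than the stakes-only figure, so a year with more claimers would look slower for reasons unrelated to the top horses.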

For adjusted baseball statistics using the method outlined in this column, I find the book “Leveling The Field” by G. Scott Thomas to be a very complete resource. It uses these statistics to simulate playoffs, answer questions like “What was the greatest baseball team of all time” and even compute what players’ salaries would be like in today’s world. It also includes career adjusted statistics for more than 400 of the greatest players of all time.

If used correctly, adjusted statistics can give a sizeable boost to the knowledgeable fan’s position. They are one of the most dangerous and most satisfying tools in today’s world of sports statistics.
