
Analytics 101: A Guide To Understanding Advanced Baseball Statistics

The Barrel Zone from MLB.com

Analytics, also called sabermetrics, rule baseball front offices and on-field decision making. To most fans, they’re a confusing or misunderstood topic. Some people think they ruin baseball or change the game too much, but that is not the case. Analytics are used to find the true impact a player makes on a game or to help players make a bigger impact. Put simply, analytics is a fancy term for information. Fans should be happy their favorite team uses as much information as possible.

Fans should know that no stat is perfect. Every stat in use today has flaws; some just have fewer than others. The closest stats to perfection are Statcast data, but even they have flaws. Every stat is a piece of the puzzle for understanding what is truly happening on the field.

People should also know all the stats in use have been studied and researched by really smart people in the game. Those people have disagreements on things but not without good reasoning. That’s why there are different versions of WAR, which we will discuss later. All these stats have had a lot of time put into them. They aren’t just thrown together to try and ‘ruin the game.’

They can also get very complicated and confusing, especially if a person doesn’t have the time to study and memorize them. This guide should help you understand what most of the stats mean, how to interpret them, and why they are used. There are far too many stats to discuss all of them, so this guide will go over the most commonly used ones. There are also much better and deeper explanations of each one online if you find yourself interested.

History

The concept of analytics goes back to the beginning of baseball. Yes, you read that right. People have always wanted to separate players’ performances from their team’s performance. A pitcher’s win-loss record was an early attempt to make a stat that tells how a player performed. The concept made more sense when starters pitched an entire game, but it still had its flaws. And as we know now, the stat doesn’t work at all like it was intended to, but everything is learned through trial and error.

The first team to really put them to use was the Brooklyn Dodgers under Walter O’Malley and Branch Rickey. The team hired a statistician named Allan Roth in 1947.

O’Malley was so interested in baseball statistics and their analytics that the Dodgers hired Allan Roth to work on interpolating the numbers for the team.

“Allan works for us all the year round,” said O’Malley. “It isn’t just sets of dry statistics for press releases. His compilations aid us in making decisions. Even in making trades.”

Roth proceeded to make a big impression as he advanced statistical analysis to a whole new level, working during the season and in the off-season.

“Allan goes beyond the verbal word of our scouts and other observers,” said O’Malley. “Underlying causes are important.”

Rickey knew there was an untapped world of information. Roth confirmed Rickey’s idea that runs batted in only mattered if they were correlated with chances to drive them in. He also provided evidence to prove platoon advantages are real and that on-base percentage mattered more than batting average. Roth also kept track of player splits, spray charts, and pitch charts for the team. The Dodgers had the largest amount of information in baseball under Roth and Rickey.

Analytics became widely known because of the book Moneyball: The Art of Winning an Unfair Game by Michael Lewis, which was later turned into a movie. The story tells how the 2002 Oakland Athletics, led by Billy Beane and Paul DePodesta, used concepts developed by Bill James to replace the superstars who left in free agency with overlooked players no other team wanted. It resulted in 103 wins and a trip to the playoffs with one of the smallest payrolls in the league.

Now, every team in baseball has an analytics department and the best teams have made them a key focus in running their organization.

The Flaws Of Traditional Stats

Batting Average: AVG counts all hits as the same. We know a home run does not equal a single. It also ignores other ways of getting on base, like taking a walk.
On Base Percentage: OBP counts all hits and walks as equal to each other, which just isn’t true.
Slugging Percentage: SLG is not weighted correctly and it ignores walks. A home run is not worth 4 times what a single is worth, a triple is not worth 3 times what a single is worth, and so on.
On Base Plus Slugging: Like SLG, OPS is not weighted correctly and it gives a boost to power hitters. While power hitters are usually more productive, a homer is not worth 5 times what a single is worth. It is also flawed to add two stats that are on different scales. A .500 SLG is good and a .350 OBP is good, so the sum undervalues high-OBP, low-slugging players.
Runs Batted In: RBI is largely dependent on opportunity, which comes from what the rest of the team does. A bad hitter who hits behind Mike Trout and Mookie Betts will still get a lot of RBI because he gets so many chances.
Runs Scored: Like RBI, runs scored relies too much on what the rest of the team does to be an effective evaluator for one player.
Pitcher Record: A pitcher can throw 1 pitch and get a win. A pitcher can go 9 innings, allow 1 run, and lose. And why do starters have to go 5 innings to qualify for a win when a reliever can go 0.1 and win? It’s also incredibly arbitrary and decided on by the scorer.
Earned Run Average: A team’s defense can have a large effect on ERA. It also ignores runs scored as the result of an error, and errors are truly subjective, based on how the scorer feels at the moment. Hey, errors are a bad stat too. I guess this was a 2 for 1.
WHIP: Like ERA, it relies on the team defense and factors a pitcher can’t control. It also ignores a pitcher hitting a batter for some reason.
Saves: The stat is incredibly context dependent and arbitrary. To quote Keith Law’s book Smart Baseball, “To be credited with a save under the current version of the rule, which has been in place since 1975, a pitcher must record the final out in a game that his team won, but one where he didn’t get the win, and the team didn’t win by too many runs because then he obviously contributed nothing at all.”

The flaws of these stats can be explained in a deeper and more informative way, but I just want to give a simple and quick explanation.
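To make the first few flaws concrete, here is a minimal Python sketch that computes AVG, OBP, SLG, and OPS with the standard formulas. The two stat lines are made up for illustration and are not real players.

```python
def slash_line(ab, h, doubles, triples, hr, bb, hbp, sf):
    """Return (AVG, OBP, SLG, OPS) from basic counting stats."""
    singles = h - doubles - triples - hr
    total_bases = singles + 2 * doubles + 3 * triples + 4 * hr
    avg = h / ab
    obp = (h + bb + hbp) / (ab + bb + hbp + sf)
    slg = total_bases / ab
    return avg, obp, slg, obp + slg

# Hypothetical hitters: same at-bats and hits, very different profiles.
singles_hitter = slash_line(ab=500, h=150, doubles=15, triples=2, hr=3, bb=30, hbp=2, sf=3)
power_hitter = slash_line(ab=500, h=150, doubles=35, triples=2, hr=40, bb=80, hbp=5, sf=5)

for name, line in [("singles hitter", singles_hitter), ("power hitter", power_hitter)]:
    avg, obp, slg, ops = line
    print(f"{name}: AVG {avg:.3f}  OBP {obp:.3f}  SLG {slg:.3f}  OPS {ops:.3f}")
```

Both hitters come out with the same .300 batting average, even though the second one walks far more often and hits for far more power. That is the blind spot in AVG described above.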

Key Things To Know

When there is a w in the stat: The stat is weighted, based on something called linear weights. This means not all hits are equal, and each type of hit is assigned its own value. Unlike in batting average, a home run counts for more than a triple, a triple counts for more than a double, and so on.
When there is an x in the stat: The stat is an expected stat. It is based on what was expected to happen given data like launch angle, exit velocity, and so on.
When there is a + in the stat: The stat is park and league adjusted. The purpose is to remove the effects of playing in a stadium like Coors Field versus one like AT&T Park, and to compare the player to the rest of the league. The further the number is above league average the better, and the further below the worse. For example, if the league average is 100, 120 is better than 70.
When there is a - in the stat: The stat is park and league adjusted, but any number below league average is better and any number above league average is worse. For example, if the league average is 100, 70 is better than 120.

Most of the hitter stats also work for what pitchers allowed. Having a basic understanding of these symbols can help you learn what other stats are saying, even if they aren’t discussed in this post.
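The 100-is-average scale behind the + and - stats can be sketched in a few lines of Python. This is a simplified illustration only: real + and - stats also apply park factors, and some use a different formula entirely. The function name and the sample numbers are made up for the example.

```python
def index_stat(player_value, league_value):
    """Scale a rate stat so that league average equals 100 (no park adjustment)."""
    return 100 * player_value / league_value

# "+" style: higher is better. A hypothetical hitter producing
# 0.144 runs per plate appearance in a league that averages 0.120.
plus_value = round(index_stat(0.144, 0.120))
print(plus_value)  # 120, i.e. 20 percent better than league average

# "-" style: lower is better. A hypothetical 2.94 ERA in a 4.20 ERA league.
minus_value = round(index_stat(2.94, 4.20))
print(minus_value)  # 70, i.e. 30 percent fewer runs than league average
```

The arithmetic is identical in both cases. What changes is whether a big number is good (runs produced) or bad (runs allowed).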

The Stats To Know

wOBA

Weighted on-base average is one of the simpler stats to understand, and it is also incredibly effective. It is on-base percentage that gives more points for extra-base hits and slightly fewer points for walks and hit by pitches. It should become the go-to stat for fans because it combines batting average, on-base percentage, and slugging percentage into one more accurate number. This stat can be used for hitters and pitchers (wOBA allowed). It is on the same scale as on-base percentage, so .320 is considered average, .400 and above is excellent, and .290 and below is awful.
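For readers who want to see the weighting in action, here is a minimal Python sketch of the wOBA calculation. The weights are the ones FanGraphs published for the 2013 season as I recall them; they shift a little every year, so treat them as illustrative. The hitter's stat line is hypothetical.

```python
# Linear weights for 2013 (illustrative; the values change slightly each season).
WEIGHTS_2013 = {
    "ubb": 0.690,  # unintentional walk
    "hbp": 0.722,  # hit by pitch
    "1b": 0.888,
    "2b": 1.271,
    "3b": 1.616,
    "hr": 2.101,
}

def woba(ab, singles, doubles, triples, hr, bb, ibb, hbp, sf, w=WEIGHTS_2013):
    """Weighted on-base average: each way of reaching base gets its own value."""
    ubb = bb - ibb  # intentional walks are excluded
    numerator = (
        w["ubb"] * ubb
        + w["hbp"] * hbp
        + w["1b"] * singles
        + w["2b"] * doubles
        + w["3b"] * triples
        + w["hr"] * hr
    )
    denominator = ab + bb - ibb + sf + hbp
    return numerator / denominator

# Hypothetical season, not a real player.
example = woba(ab=500, singles=90, doubles=30, triples=3, hr=25, bb=60, ibb=5, hbp=5, sf=5)
print(f"{example:.3f}")
```

Notice that a home run is worth a little more than twice a single here, not four times as much like in slugging percentage.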

wRC+

Weighted runs created plus is very similar to wOBA, except it is also park and league adjusted. It is on a different scale too: instead of the on-base percentage scale, wRC+ uses a system where 100 is league average, anything over is better, and anything under is worse. Because it is league adjusted, it is also adjusted for era, so a 127 wRC+ in 2019 is just as valuable as a 127 wRC+ in 1936. wRC+ is a good stat for comparing offensive production. Its main flaw is that you can’t create perfect park effects, so unlike wOBA, there is some estimation in it.

ISO

Isolated Power tells how often a player hits for extra bases. It is calculated as SLG - AVG. ISO tells you what kind of hitter a player is more than how much value he produced. Its flaw is that, like SLG, it does not weight extra-base hits by their actual run value: a double counts once, a triple twice, and a home run three times. FanGraphs has a good description of why it’s useful.

A .300 average with very few extra base hits is quite different from a .300 average with 40 home runs. The same is true of a .500 slugging percentage that is driven by many singles versus one driven by lots of doubles and home runs.

A .140 ISO is average, .240 and above is excellent, and .080 and below is awful.
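Here is a small Python sketch showing that SLG - AVG works out to extra bases per at-bat. The stat line is hypothetical.

```python
def iso(ab, doubles, triples, hr):
    """Isolated power: extra bases per at-bat (equivalent to SLG - AVG)."""
    return (doubles + 2 * triples + 3 * hr) / ab

# Hypothetical hitter: 500 AB, 150 hits, 35 doubles, 2 triples, 40 home runs.
ab, h, doubles, triples, hr = 500, 150, 35, 2, 40
singles = h - doubles - triples - hr

avg = h / ab
slg = (singles + 2 * doubles + 3 * triples + 4 * hr) / ab
iso_value = iso(ab, doubles, triples, hr)

print(f"AVG {avg:.3f}  SLG {slg:.3f}  ISO {iso_value:.3f}")
# Subtracting AVG from SLG removes one base from every hit,
# leaving only the extra bases.
```

Singles drop out of the formula completely, which is why ISO describes a hitter's power and not his overall production.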

BABIP

Batting average on balls in play is exactly what it sounds like. It is a player’s batting average when he puts the ball in play, so it removes strikeouts and home runs from batting average.

BABIP isn’t a stat you want to use on its own. The stat can help tell you if a player is unlucky or lucky but it is also influenced by speed and hard-hit ball numbers.
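The standard BABIP formula is short enough to sketch in Python. The sample numbers are made up.

```python
def babip(h, hr, ab, k, sf):
    """Batting average on balls in play.

    Home runs and strikeouts are removed because the defense never gets
    a chance at them; sacrifice flies are added back as balls in play.
    """
    return (h - hr) / (ab - k - hr + sf)

# Hypothetical hitter: 150 hits, 25 homers, 500 at-bats, 100 strikeouts, 5 sac flies.
value = babip(h=150, hr=25, ab=500, k=100, sf=5)
print(f"{value:.3f}")
```

A number far from a player's career norm is the usual hint that luck may be involved, but as noted above, speed and hard contact move it too.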

K% and BB%

Strikeout and walk percentage are simply the percentage of a batter’s plate appearances that end in a strikeout or a walk. A 20% K% and an 8% BB% are considered average. A 10% K% and a 15% BB% are excellent. A 27.5% K% and a 4% BB% are awful.

Both of these are also used for pitchers. For pitchers, an average K% is 20% and BB% is 7.7%. An excellent K% is 27% and an excellent BB% is 4.5%. A 13% K% and 9% BB% are awful for pitchers.

FIP

Fielding independent pitching is a version of ERA that removes the pitcher’s defense from the equation. FanGraphs has a very nice summary of it.

It is a statistic that estimates a pitcher’s ERA based on his strikeouts, walks, hit batters, and home runs while assuming average luck on balls in play, defense, and sequencing, which makes it a better reflection of that pitcher’s performance over a given period of time. This is highly related to the reasons why we care so much about Batting Average on Balls in Play (BABIP), specifically the fact that pitchers have very little control over their BABIP allowed.

FIP is on the same scale as ERA so 4.20 is considered average.
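As a rough illustration, the commonly published FIP formula can be sketched in Python. The constant is recalculated every season so that league FIP matches league ERA; the 3.10 used here is an assumed ballpark figure, and the pitcher's line is hypothetical.

```python
def fip(hr, bb, hbp, k, ip, constant=3.10):
    """Fielding independent pitching.

    Only home runs, walks, hit batters, and strikeouts count.
    `ip` is true innings pitched (200 and 1/3 innings is 200.333, not 200.1).
    `constant` is an assumed value; the real one varies by season.
    """
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant

# Hypothetical pitcher: 20 HR, 50 BB, 5 HBP, 200 K in 200 innings.
value = fip(hr=20, bb=50, hbp=5, k=200, ip=200)
print(f"{value:.2f}")
```

Nothing about hits on balls in play appears in the formula, which is how the defense gets taken out of the picture.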

xFIP

Expected fielding independent pitching is similar to FIP, but it uses a league average home run to fly ball rate instead of the pitcher’s actual home run to fly ball rate. It helps remove park effects on home runs. For example, a home run hit at Coors Field might not be a home run at Dodger Stadium even if everything else about the hit was the same. That doesn’t mean the pitcher actually made a worse pitch; it was just the stadium he was in at the time. The scale is the same as ERA.
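A minimal Python sketch of the idea: the pitcher's actual home runs are swapped for the number he would have allowed at a league average rate. The 10.5 percent home run to fly ball rate and the 3.10 constant are assumed placeholder values that change by season, and the pitcher is hypothetical.

```python
def xfip(fly_balls, bb, hbp, k, ip, lg_hr_per_fb=0.105, constant=3.10):
    """Expected FIP: replaces actual home runs with fly balls times the league HR/FB rate.

    `lg_hr_per_fb` and `constant` are assumed values for illustration.
    """
    expected_hr = fly_balls * lg_hr_per_fb
    return (13 * expected_hr + 3 * (bb + hbp) - 2 * k) / ip + constant

# Hypothetical pitcher: 200 fly balls, 50 BB, 5 HBP, 200 K in 200 innings.
value = xfip(fly_balls=200, bb=50, hbp=5, k=200, ip=200)
print(f"{value:.2f}")
```

The pitcher's real home run total never enters the function, so two pitchers with the same fly balls, walks, and strikeouts get the same xFIP no matter which park they call home.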

DRA

Deserved run average tries to estimate how many runs a pitcher should be credited for allowing. It tries to remove every factor that isn’t what the pitcher did.

DRA is premised on the notion that while a pitcher is probably the player most responsible, on average, for what happens while he is on the mound, he is not responsible for everything. DRA therefore only assigns the runs a pitcher most likely deserved to be charged with.

According to Baseball Prospectus, it is the best estimator available to the public because it exceeds the performance of stats that try to do the same thing, like ERA.

DRA explained about 70 percent of pitcher runs allowed in each full season, even including pitchers with as few as one batter faced.

It is on the same scale as ERA, so the lower the number is, the better.

DRS

Defensive runs saved is a stat that attempts to measure how many runs a player saves or costs his team while in the field. It’s measured on a scale where zero is an average defender, that didn’t cost or save any runs, anything above zero means the fielder saved that many runs, and anything below zero means the player cost his team that many runs.

Framing (Framing Runs)

Framing attempts to calculate how many extra strikes a catcher gets for his pitcher. The concept goes back to baseball’s beginnings but without pitch tracking data, it was next to impossible to calculate. It is calculated by finding the extra strikes a catcher gets, which is “the difference between actual and predicted strikes received by the catcher,” according to Baseball Prospectus.

Framing Runs was created by Baseball Prospectus to calculate how many runs a catcher is saving, or costing his team, with his ability. It is on the same scale as DRS, where zero is average. A framing leaderboard can be found here.

Statcast Data

Exit Velo and Launch Angle

Exit velocity is simply how hard the batter hits the ball. Launch angle is the vertical angle at which the ball leaves the bat, which tells you whether it was a ground ball, line drive, or fly ball. A ball hit with an exit velo of 95+ mph is considered a hard-hit ball.

Barrels

Barrels are any ball that is hit at 98 mph or harder with a launch angle of 26-30 degrees. The launch angle range expands for every MPH over 98.

 The Barrel classification is assigned to batted-ball events whose comparable hit types (in terms of exit velocity and launch angle) have led to a minimum .500 batting average and 1.500 slugging percentage since Statcast was implemented Major League wide in 2015.

But similar to how Quality Starts have generally yielded a mean ERA much lower than the baseline of 4.50, the average Barrel has produced a batting mark and a slugging percentage significantly higher than .500 and 1.500, respectively. During the 2016 regular season, balls assigned the Barreled classification had a batting average of .822 and a 2.386 slugging percentage.
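The widening launch angle window can be sketched in Python. To be clear about the assumptions: the 98 mph and 26-30 degree baseline comes from the definition above, but the exact way the window grows here is an approximation used in community code, not MLB's official implementation.

```python
def is_barrel(exit_velo_mph, launch_angle_deg):
    """Approximate barrel check (community approximation, not MLB's official code).

    At 98 mph the window is 26-30 degrees, and it widens as exit velocity rises.
    """
    return (
        exit_velo_mph >= 98
        and launch_angle_deg <= 50
        and exit_velo_mph * 1.5 - launch_angle_deg >= 117  # upper edge of the window
        and exit_velo_mph + launch_angle_deg >= 124        # lower edge of the window
    )

# Hypothetical batted balls: (exit velocity in mph, launch angle in degrees).
for ev, la in [(98, 28), (98, 31), (97, 28), (105, 20), (110, 45)]:
    print(ev, la, is_barrel(ev, la))
```

Under this approximation, a 98 mph ball at 31 degrees just misses, while a 105 mph ball qualifies at 20 degrees because the window has grown.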

Spin Rate

Spin rate is how fast a pitch rotates, which creates break or the appearance that the pitch is rising. It is measured in revolutions per minute (RPM). Higher spin rates mean the ball stays flat longer, while lower spin rates mean the ball breaks more. Generally, high spin rates lead to strikeouts and low spin rates lead to ground balls. An off-speed pitch with a high spin rate will move more than one with a low spin rate.

Sprint Speed

How fast a player runs at top speed, measured in feet per second. It only includes sprints to first on “competitive runs,” meaning things like jogging into second or jogging out a ground ball are not included.

The Major League average on a competitive play is 27 ft/sec, and the competitive range is about 23 ft/sec to 30 ft/sec.

Pop Time

How fast the catcher gets the ball to second or third base when trying to catch a runner. It includes how fast he gets the ball from his glove to his hand (exchange) and his arm strength.

The Major League average Pop Time on steal attempts of second base is 2.01 seconds. Average times are calculated with the following ranges.
Pop Time to 2B: 1.6 sec to 2.5 sec
Pop Time to 3B: 1.2 sec to 2.5 sec
Exchange: .4 sec to 1.3 sec

Catch Probability and OAA

Catch Probability calculates the percent chance an outfielder would catch the ball. It is used to help measure outfield defense. MLB has a really good breakdown of it here:

Catch Probability represents the likelihood that a batted ball to the outfield will be caught, based on four important pieces of information tracked by Statcast. 1. How far did the fielder have to go? 2. How much time did he have to get there? 3. What direction did he need to go in? 4. Was proximity to the wall a factor?

It is broken into tiers of five percent probabilities; so 0 percent, 5 percent, 10 percent, and so on, up to 100 percent.

Outs Above Average uses catch probability to find how many extra balls an outfielder gets to, or doesn’t get to, over the season. It is fairly simple to calculate once you know the catch probability. The formula for caught balls is 1.00 – catch probability = X. So if an outfielder catches a ball with a 25 percent catch probability, he adds .75 to his OAA total. If a player doesn’t make the play, you subtract the catch probability. So if the catch probability is 65 percent and he misses it, the player loses .65 points from his total. It is on a scale where zero is average, anything above zero is better, and anything below zero is worse.
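The running total described above can be written as a short Python sketch. The list of plays is invented for the example.

```python
def outs_above_average(plays):
    """Sum OAA credit over (catch_probability, caught) pairs.

    A catch earns 1 - catch_probability; a miss costs the catch_probability.
    """
    total = 0.0
    for catch_probability, caught in plays:
        if caught:
            total += 1.0 - catch_probability
        else:
            total -= catch_probability
    return total

# Hypothetical plays: a tough catch, a missed 65 percent ball, and a routine catch.
plays = [(0.25, True), (0.65, False), (0.95, True)]
oaa = round(outs_above_average(plays), 2)
print(oaa)  # 0.75 - 0.65 + 0.05 = 0.15
```

Routine catches barely move the total, so a fielder has to make the hard plays to build up a big number.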

Wins Above Replacement

Wins Above Replacement, or WAR, is one of the most talked about stats and it is also one of the most controversial. What you should really take away from WAR is it’s not perfect, but it is a good estimator.

WAR attempts to calculate a player’s total value added over a replacement level player. Part of that is confusing to people because they think a replacement player means the specific player who is called up to replace him, but it really means a generic baseline: the kind of readily available player, like a minor league call-up or bench player, that any team could add at minimal cost. That baseline sits below league average.

A player who posts a zero WAR is replacement level, and every number over it is one win added. That also doesn’t mean a 5 win player is going to add exactly five more wins to the team; it’s just an estimation. It is also important to know that since it’s just an estimation, there probably isn’t a big difference between a 5.4 win player and a 5.1 win player.

The scale is important to know too: 0 is a replacement level player, 3 is a starting level player, 5 is an all-star level player, 7 is an MVP candidate level player, and 9+ is Mike Trout level.

There are also different calculations for WAR, since it is just an estimation. Baseball Reference WAR can look entirely different from FanGraphs WAR.

WAR shouldn’t be used as an end-all perfect stat. But it is useful for getting a general idea of how much value a player is providing to the team.

Resources

Fangraphs

Baseball Savant

Baseball Prospectus

Baseball Reference

MLB.com’s Glossary

Written by Blake Williams

I graduated with an Associate's Degree in Journalism from Los Angeles Pierce College and now I'm working towards my Bachelor's at Cal State University, Northridge. I'm currently the managing editor for the Roundup News and a writer for Dodgers Nation. Around the age of 12, I fell in love with baseball and in high school, I realized my best path to working in baseball was as a writer, so that's the path I followed. I also like to bring an analytics viewpoint to my work and I'm always willing to help someone understand them since so many people have done the same for me. Thanks for reading!

11 Comments

  1. Top notch article. JMHO, but I believe pitch framing stats are so subjective, influenced by pitchers, umpires, pitch location relative to a strike zone that changes by game (sometimes within the same game) that they should only be used in ranges similar to fangraphs WAR ranges.

    • Thanks for reading. And yes that is an interesting point. I agree they aren’t perfect but I never thought to consider looking it on similar ranges like WAR. I think that’s a really good way of looking at it.

  2. Wow, that is a lot to digest for an old school stat head like myself. I tend to measure a pitcher by looking at both ERA and WHIP. But one thing I always wonder about is the runners allowed (by the starter) to score by the bullpen and the effect on the starting pitcher’s ERA. Let’s say Kershaw is lights out for 7 2/3, then allows an infield hit and a bloop single, and due to pitches thrown he is removed from the game. I will use Josh Fields since he is gone. Fields comes in and allows a HR, and Kershaw is charged with two earned runs.

    Same thing can happen to a relief pitcher. Maybe he strikes out first two batters., then allows a couple of bloop hits and his replacement comes in and allows the runs to score. If this happens often, a relief pitcher can actually be way more effective than his era portrays. Or conversely less effective if he is always allowing runners to score but gets out of the inning before his runners allowed score. For this reason, I think WHIP is better than ERA for relief pitchers.

    Which of the (new stats) best covers pitcher effectiveness disallowing for runners who score due to another pitchers ineffectiveness?

  3. Blake thank you for the explanations on stats and the time you put in. I agree with GLP lot of information to wrap my brain around. I have always used ERA and WHIP to give me a quick evaluation of a pitcher,

    I had always thought to truly evaluate a hitter you should look at their RISP% as that is when the pitcher is bearing down and the batter is supposed to get the RBI. Last few years been reading about stats and used wRC+ as a better number for evaluation.

    I recently read about a new stat used by Baseball Prospectus that combines with the wRC+ stat and looks like it may be more inclusive in DRC+ as it attempts to factor in negative results like strikeouts and hitting into double plays what do you think of it?

    Thanks again for this great explanation. Baseball Operations people are using these stats to evaluate players and make trades. If we want to understand the reasons they are making decisions it helps to look at the stats they are using to make those decisions.

    • Hi, thanks for reading. Regarding DRC+, I think the idea behind it is really good but it probably needs some more tweaks and I’d like to see how it is over a few seasons. It’s trying to quantify a lot of scenarios which is really tough to do.

      • Thanks Blake for the follow up explanation on that.

        Striking out when the correct Baseball and Team play is somehow to move the runner makes me crazy.

        When I played in HS and CC moving the runner was the top priority. Even if you have to shorten up and slap the damn ball get the man over help your team. With the shifts today it should be very easy to slap the ball to that gap. We used to slap the ball to make sure we made contact and moved the man.

        I remember guys getting fined in the MLB many years ago for not making contact in that situation.

        And it’s not just the Dodgers the Dbacks failed to score a man on third with No Outs during yesterday’s game and ended up losing the game by that lone run.

        So that is why I like the idea of DRC+ as it attempts to inject some of that into the numbers.

        I was wrong so far on Pederson he is having professional at bats. Barnes and Bellinger have looked great also. It is a very small sample size but very encouraging. AJ Pollock has been a difference maker.

        Taylor is a strikeout waiting to happen. It was reported that Taylor and the hitting coaches had figured out the problem was he had his bat in the wrong position and corrected it but it does not look like it helped.

  4. It’s about playing as a team. That’s why great players don’t equate to Championships. It’s about winning, not what should win. Roberts is figuring that out.

  5. Very good and informative article Blake. Thanks for all the “new math” info. I’m “old school” but have always known that analytics plays an important role and has been around since the beginning of the game, albeit on a much more reduced scale. I do see the value of using these analytics but I’m still not convinced that they give any valuable meaning to the players “mental approach” to the game which bears influence on his own stats when analytics are employed to great lengths. I haven’t seen analytics provide true and useful info re how players perform positively or negatively as a result of never knowing if they will start on any given day, or if they are platooned at the last moment for a specific reason, etc. How do analytics measure a players mental attitude in a given situation? Mental readiness is as important as anything to maximize effectiveness. I believe analytics can diminish a great player to an average level player because players eventually say to themselves “why fret about a given situation when management will likely abruptly remove me from the lineup for one reason or another”. I would certainly be interested in the player mentality with his unwitting subjection to constant analytics. Sometimes we tend to over analyze everything. Maybe the game of baseball is headed for the day when computers put together the lineup card and make all game decisions in an instant. Why the need for a manager? Just my two cents.

    • Thanks for reading. The mental side is something that will always be discussed. It isn’t possible to measure but it is definitely a factor. I think the key thing to remember is analytics are just trying to show how the player actually performed and how to put the player in the best position to succeed. The biggest part of managing the mental side with the statistical side is communication. Players can’t be ready if they don’t understand why or what the team is doing. But I do think players are starting to realize all the information is just there to help them and their team and more will buy-in to them.

  6. Terrific article, Blake! You did a fantastic job at explaining simple and advanced stats in ways that both novice and advanced fans can understand. I run a Phillies blog, so I understand how difficult that can be. What helped me a lot were your descriptions for w, x, +, and -. Most people already know what that means, but to have it written out somehow makes it easier for my brain to understand.

    It’s also obvious you did your homework to find the best descriptions from trusted sources.

    I’ve been lazy about really learning these terms for a while, and your article inspired me to finally become informed.

    Only problem for me is that the results suggest the Dodgers know a hell of a lot more about analytics than the Phillies do….
