Your Team’s Prospects Are Probably Not Going to Work Out

Serious prospect hounds know that only about 10% of minor leaguers ever appear in a Major League game. However, even the most discerning fans can be deluded into believing that their team's farm system can overcome those odds and build a perennial contender on its own.

I decided to investigate how much WAR a prospect generates on average based on his ranking in Baseball America's Prospect Handbook. I used a similar process in a previous article: I calculated each player's WAR over the six seasons following his listing rather than starting from his Major League debut. This means that players closer to the Majors get a boost to their value, since they have more opportunities to accumulate WAR than players in the lower minors.

Next, I grouped the players by their ordinal ranking in their organization from the 2001 to 2015 seasons and calculated each group’s average WAR to create the visualization below.
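
As a rough illustration of that grouping step, here is a minimal pandas sketch. The file name and the org_rank and war_next6 columns are hypothetical stand-ins for the Baseball America data described above, not the article's actual code.

```python
import pandas as pd

# Hypothetical prospect table: one row per player listing, with the WAR that
# player produced over the six seasons following the listing already summed.
prospects = pd.read_csv("ba_prospects_2001_2015.csv")  # assumed file and columns

# Average six-season WAR for each ordinal ranking within an organization.
avg_war_by_rank = prospects.groupby("org_rank")["war_next6"].mean().sort_index()
print(avg_war_by_rank)
```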

That is a steep decline, but it is not unexpected. Most prospects that ascend to the top of their team's list have flourished in the lower minors or have a higher pedigree than their minor league compatriots. Many top prospects are also perceived as being closer to big league ready. It makes sense that these types of players would produce more value given my methodology. Players ranked lower can still be successful in the Major Leagues, but the profusion of prospects that never reach the Majors keeps these groups' averages much lower than those of the higher ranked players in an organization.

This is a decent start, but it does not account for differences in an organization’s minor league depth. The fifth ranked prospect for a rebuilding team with plenty of depth is likely more talented than a fifth ranked prospect in a competing team’s depleted farm system. To account for the quality differences between farm systems, I created a heat map of average WAR produced with the player’s ranking in the organization on the y-axis and the team’s farm system ranking on the x-axis.
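
The heat map described here can be sketched with a pivot table. The snippet below reuses the hypothetical columns from the previous example, and swapping the aggfunc argument to "median" or "max" would reproduce the two charts discussed next.

```python
import pandas as pd
import matplotlib.pyplot as plt

prospects = pd.read_csv("ba_prospects_2001_2015.csv")  # assumed file and columns

# Average WAR for each (rank in organization, farm system rank) cell.
grid = prospects.pivot_table(
    index="org_rank",       # player's ranking within the organization (y-axis)
    columns="system_rank",  # Baseball America farm system ranking (x-axis)
    values="war_next6",
    aggfunc="mean",         # "median" or "max" for the later heat maps
)

plt.imshow(grid, aspect="auto")
plt.xlabel("Farm system ranking")
plt.ylabel("Rank in organization")
plt.colorbar(label="Average WAR over six seasons")
plt.show()
```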

As expected, the higher values are in the top half of the heat map, with the highest values appearing in the top left-hand corner. If Baseball America's rankings are an accurate representation of minor league talent, the highest ranked players in the most talented farm systems should produce the most value.

Average WAR is a reasonable place to start. However, with so many prospects in my dataset failing to reach the big leagues, the distribution of player WAR is heavily skewed to the right. With skewed data it is often more appropriate to use the median instead of the average, because outliers can heavily influence the mean while the median mitigates their effect. The next heat map was created the same way as the previous one, but with each group's median WAR instead of average WAR.

Woof. This chart is much bluer than the previous one, but the pattern is similar, with the top of the chart producing the most value. This graphic paints a more desolate view of prospect valuation, but front offices and fans do not necessarily care about summary statistics. They care about their individual player and how he will do in the future. It is nice to know the historical odds of success, but they are not necessarily predictive of a player's future. Players outperform baseball industry projections all the time. Who is to say that your team does not have a diamond in the rough? The next heat map attempts to provide a realistic best-case scenario for each prospect ranking. Each cell is the maximum amount of WAR produced in its respective cohort.

This graphic shows why GMs are reluctant to trade away their prospects. The general trend remains the same, but there are far more yellow and green boxes dispersed throughout the chart. Nobody wants to be known as someone who trades away a young star for three months of a role player. This heat map shows that Major League contributors can come from almost anywhere.

I am interested in players that the industry has overlooked. I believe it would be beneficial to identify these types of players to see if there is a possible blind spot in prospect valuation. The first thing I did was limit the dataset to players that were ranked eleventh or lower in their organization. I could have drawn the line anywhere in the top 30, but the line chart from earlier starts to level off around this point, so the top ten seems like a logical cutoff.

The next step was to determine how much accumulated WAR should count as a success. I landed on 10 WAR. This cutoff is arbitrary, but I wanted a lower value to accommodate players that are in the lower levels of the minors and several years away from the Majors. These younger players may be as talented as older players, but since I am not adjusting for a player's team-controlled seasons, inexperienced players do not have the same opportunities to generate value as their teammates who are closer to the Majors. By keeping my threshold low, I should be able to mitigate some of this bias in my dataset.
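
The two filters described above amount to a couple of boolean conditions. Here is a minimal sketch using the same hypothetical columns as earlier; the player column is another assumed name.

```python
import pandas as pd

prospects = pd.read_csv("ba_prospects_2001_2015.csv")  # assumed file and columns

# Players ranked 11th or lower in their organization who produced at least
# 10 WAR over the six seasons after being listed.
overlooked = prospects[(prospects["org_rank"] >= 11) & (prospects["war_next6"] >= 10)]
print(overlooked["player"].nunique(), "players,", len(overlooked), "occurrences")
```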

There were 97 players who met my criteria, accounting for 127 occurrences of a player ranked eleventh or lower producing at least 10 WAR in six seasons: 25 players did it twice, Denard Span did it three times and Josh Donaldson did it four times! Below is a summary table of each occurrence with an accompanying bar chart.

Year | Player | Position | Amateur Type | Team Rank | Rank in Organization | Highest Level Played | WAR
2001 | A.J. Pierzynski | Position Player | HS | 15 | 18 | MLB | 11.4
2001 | Aaron Harang | Pitcher | 4Yr | 11 | 27 | A+ | 12.2
2002 | Aaron Harang | Pitcher | 4Yr | 19 | 16 | AA | 17.0
2011 | Adam Eaton | Position Player | 4Yr | 22 | 30 | Rk | 14.6
2012 | Adam Eaton | Position Player | 4Yr | 4 | 12 | AA | 15.1
2009 | Alex Avila | Position Player | 4Yr | 28 | 20 | A | 10.8
2005 | Andre Ethier | Position Player | 4Yr | 8 | 16 | A+ | 11.2
2011 | Andrelton Simmons | Position Player | JC | 2 | 15 | Rk | 12.8
2007 | Asdrubal Cabrera | Position Player | INTL | 10 | 15 | AAA | 12.2
2007 | Austin Jackson | Position Player | HS | 7 | 18 | A | 11.7
2005 | Ben Zobrist | Position Player | 4Yr | 22 | 16 | A- | 12.1
2006 | Ben Zobrist | Position Player | 4Yr | 20 | 16 | A+ | 18.5
2003 | Bill Hall | Position Player | HS | 16 | 18 | MLB | 10.7
2014 | Blake Snell | Pitcher | HS | 20 | 14 | A | 11.1
2001 | Brandon Webb | Pitcher | 4Yr | 29 | 27 | A | 16.8
2002 | Brandon Webb | Pitcher | 4Yr | 23 | 26 | A+ | 22.4
2006 | Brett Gardner | Position Player | 4Yr | 17 | 13 | A- | 14.7
2009 | Brett Gardner | Position Player | 4Yr | 15 | 13 | MLB | 20.6
2011 | Brian Dozier | Position Player | 4Yr | 13 | 30 | A+ | 16.4
2001 | Brian Lawrence | Pitcher | 4Yr | 8 | 11 | AAA | 11.3
2003 | Brian McCann | Position Player | HS | 2 | 28 | Rk | 14.9
2006 | C.J. Wilson | Pitcher | 4Yr | 16 | 14 | MLB | 11.2
2007 | Carlos Ruiz | Position Player | INTL | 21 | 13 | MLB | 14.4
2012 | Charlie Blackmon | Position Player | 4Yr | 16 | 11 | MLB | 16.9
2004 | Chien-Ming Wang | Pitcher | INTL | 27 | 12 | AA | 10.4
2003 | Chone Figgins | Position Player | HS | 5 | 28 | MLB | 15.8
2004 | Chris Young | Pitcher | 4Yr | 30 | 19 | AA | 12.1
2014 | Cody Bellinger | Position Player | HS | 14 | 14 | Rk | 15.4
2015 | Cody Bellinger | Position Player | HS | 3 | 20 | Rk | 16.6
2013 | Collin McHugh | Pitcher | 4Yr | 26 | 24 | MLB | 11.4
2013 | Corey Dickerson | Position Player | JC | 20 | 13 | AA | 10.5
2011 | Corey Kluber | Pitcher | 4Yr | 7 | 26 | AAA | 21.4
2002 | Covelli Crisp | Position Player | JC | 30 | 18 | A+ | 15.2
2003 | Covelli Crisp | Position Player | JC | 1 | 26 | MLB | 16.8
2003 | Curtis Granderson | Position Player | 4Yr | 12 | 18 | A- | 17.2
2011 | Dallas Keuchel | Pitcher | 4Yr | 26 | 23 | AA | 12.0
2012 | Dallas Keuchel | Pitcher | 4Yr | 17 | 21 | AAA | 14.2
2006 | Dan Uggla | Position Player | 4Yr | 2 | 29 | AA | 20.3
2003 | David Bush | Pitcher | 4Yr | 6 | 14 | A+ | 10.1
2003 | David DeJesus | Position Player | 4Yr | 26 | 19 | AA | 13.6
2013 | Dellin Betances | Pitcher | HS | 11 | 19 | MLB | 11.2
2014 | Dellin Betances | Pitcher | HS | 18 | 26 | MLB | 11.3
2005 | Denard Span | Position Player | HS | 4 | 14 | A | 10.0
2007 | Denard Span | Position Player | HS | 8 | 13 | AA | 15.1
2008 | Denard Span | Position Player | HS | 18 | 20 | AAA | 18.3
2002 | Dontrelle Willis | Pitcher | HS | 1 | 21 | A- | 20.8
2001 | Erik Bedard | Pitcher | JC | 27 | 19 | A | 10.5
2005 | Freddy Sanchez | Position Player | 4Yr | 18 | 13 | MLB | 14.9
2006 | Geovany Soto | Position Player | HS | 15 | 16 | MLB | 12.6
2007 | Geovany Soto | Position Player | HS | 18 | 17 | MLB | 13.8
2015 | German Marquez | Pitcher | INTL | 17 | 25 | A | 13.3
2008 | Ian Desmond | Position Player | HS | 9 | 14 | AA | 10.7
2009 | Ian Desmond | Position Player | HS | 21 | 19 | AA | 14.8
2013 | Jake deGrom | Pitcher | 4Yr | 26 | 11 | A+ | 26.2
2006 | Jamie Shields | Pitcher | HS | 10 | 12 | AAA | 20.3
2003 | Jason Bay | Position Player | 4Yr | 20 | 12 | AA | 15.9
2004 | Jayson Werth | Position Player | HS | 8 | 17 | MLB | 16.6
2008 | Jonathan Lucroy | Position Player | 4Yr | 21 | 16 | Rk | 22.4
2004 | Jonathan Papelbon | Pitcher | 4Yr | 23 | 14 | A- | 10.2
2011 | Jose Altuve | Position Player | INTL | 26 | 28 | A+ | 19.0
2013 | Jose Ramirez | Position Player | INTL | 24 | 23 | A | 21.0
2009 | Josh Donaldson | Position Player | 4Yr | 3 | 13 | A+ | 13.9
2010 | Josh Donaldson | Position Player | 4Yr | 12 | 14 | AA | 22.6
2011 | Josh Donaldson | Position Player | 4Yr | 28 | 12 | MLB | 30.5
2012 | Josh Donaldson | Position Player | 4Yr | 26 | 20 | MLB | 35.6
2007 | Josh Hamilton | Position Player | HS | 12 | 30 | AA | 25.2
2004 | Josh Johnson | Pitcher | HS | 14 | 24 | A | 10.8
2005 | Josh Johnson | Pitcher | HS | 14 | 11 | A+ | 16.4
2006 | Josh Willingham | Position Player | 4Yr | 2 | 11 | MLB | 13.3
2007 | Justin Masterson | Pitcher | 4Yr | 9 | 13 | A- | 12.2
2010 | Kenley Jansen | Pitcher | INTL | 24 | 14 | AAA | 10.9
2014 | Ketel Marte | Position Player | INTL | 25 | 20 | A+ | 11.6
2011 | Kevin Kiermaier | Position Player | JC | 3 | 26 | Rk | 11.1
2013 | Kevin Pillar | Position Player | 4Yr | 12 | 21 | A+ | 10.3
2014 | Kevin Pillar | Position Player | 4Yr | 15 | 20 | MLB | 11.9
2013 | Khris Davis | Position Player | 4Yr | 22 | 16 | AAA | 11.5
2012 | Kole Calhoun | Position Player | 4Yr | 18 | 20 | A+ | 13.1
2013 | Kole Calhoun | Position Player | 4Yr | 30 | 11 | MLB | 13.1
2014 | Kyle Hendricks | Pitcher | 4Yr | 4 | 11 | AAA | 18.0
2010 | Kyle Seager | Position Player | 4Yr | 11 | 30 | A+ | 17.6
2015 | Lance McCullers | Pitcher | HS | 10 | 11 | A+ | 10.7
2005 | Luke Scott | Position Player | 4Yr | 22 | 17 | AA | 11.1
2006 | Luke Scott | Position Player | 4Yr | 20 | 15 | MLB | 11.3
2001 | Mark Ellis | Position Player | 4Yr | 11 | 17 | AA | 10.7
2003 | Mark Hendrickson | Pitcher | 4Yr | 6 | 13 | MLB | 10.5
2003 | Matt Cain | Pitcher | HS | 11 | 11 | Rk | 11.8
2011 | Matt Carpenter | Position Player | 4Yr | 24 | 11 | AA | 20.5
2012 | Matt Carpenter | Position Player | 4Yr | 12 | 12 | MLB | 23.8
2002 | Matt Holliday | Position Player | HS | 24 | 11 | A+ | 14.3
2003 | Matt Holliday | Position Player | HS | 25 | 16 | AA | 20.2
2015 | Max Kepler | Position Player | INTL | 2 | 12 | A+ | 10.8
2014 | Mike Clevinger | Pitcher | JC | 30 | 17 | A | 10.8
2015 | Mike Clevinger | Pitcher | JC | 23 | 22 | A+ | 11.5
2005 | Mike Napoli | Position Player | HS | 1 | 29 | A+ | 10.0
2006 | Mike Napoli | Position Player | HS | 3 | 11 | AA | 14.3
2008 | Mike Stanton | Position Player | HS | 14 | 11 | A- | 14.7
2001 | Morgan Ensberg | Position Player | 4Yr | 10 | 15 | MLB | 15.4
2010 | Neil Walker | Position Player | HS | 16 | 26 | MLB | 15.9
2003 | Nick Swisher | Position Player | 4Yr | 22 | 11 | A+ | 11.2
2011 | Noah Syndergaard | Pitcher | HS | 4 | 24 | Rk | 10.0
2012 | Odubel Herrera | Position Player | INTL | 2 | 27 | A | 10.4
2015 | Odubel Herrera | Position Player | INTL | 22 | 12 | AA | 10.9
2006 | Pablo Sandoval | Position Player | INTL | 18 | 15 | A- | 12.6
2010 | Paul Goldschmidt | Position Player | 4Yr | 27 | 13 | Rk | 20.9
2011 | Paul Goldschmidt | Position Player | 4Yr | 22 | 11 | A+ | 25.8
2002 | Rich Harden | Pitcher | JC | 19 | 21 | A- | 10.2
2005 | Ricky Nolasco | Pitcher | HS | 10 | 19 | AAA | 11.9
2013 | Robbie Ray | Pitcher | HS | 16 | 18 | A+ | 10.2
2015 | Robbie Ray | Pitcher | HS | 6 | 11 | MLB | 11.8
2004 | Russell Martin | Position Player | JC | 2 | 18 | A | 18.9
2009 | Ryan Hanigan | Position Player | 4Yr | 14 | 16 | MLB | 16.5
2002 | Ryan Howard | Position Player | 4Yr | 11 | 15 | A- | 11.6
2004 | Scott Baker | Pitcher | 4Yr | 5 | 19 | A | 10.7
2005 | Shane Victorino | Position Player | HS | 20 | 19 | MLB | 16.4
2006 | Shane Victorino | Position Player | HS | 22 | 14 | MLB | 21.9
2001 | Ted Lilly | Pitcher | JC | 7 | 18 | MLB | 10.4
2014 | Tommy Pham | Position Player | HS | 7 | 23 | AAA | 15.6
2015 | Tommy Pham | Position Player | HS | 15 | 15 | MLB | 15.5
2001 | Travis Hafner | Position Player | JC | 13 | 12 | A+ | 16.1
2002 | Travis Hafner | Position Player | JC | 8 | 16 | AA | 18.3
2013 | Travis Shaw | Position Player | 4Yr | 6 | 23 | AA | 10.0
2015 | Trevor Story | Position Player | HS | 8 | 12 | AA | 17.9
2011 | Tyler Flowers | Position Player | JC | 27 | 17 | MLB | 11.0
2008 | Will Venable | Position Player | 4Yr | 12 | 15 | AA | 11.5
2013 | Yan Gomes | Position Player | 4Yr | 24 | 27 | MLB | 13.6
2005 | Yovani Gallardo | Pitcher | HS | 3 | 16 | A | 11.5
2004 | Zach Duke | Pitcher | HS | 11 | 15 | A | 12.6

The first thing that stands out is that there are far more position players than pitchers. I do not know for certain why this is the case, but I have several theories. The first is that pitchers are fragile by nature; they are more likely to get injured and therefore unable to generate as much WAR as position players. Another theory is that pitchers may be easier to scout than position players, so the more successful pitchers are ranked higher and excluded from my dataset.

Another observation is that almost half of the players attended college and relatively few international players achieve stardom. Of the 97 players in my dataset, forty-five attended a four-year university and only eleven are international players. However, this does not necessarily mean that college players are more likely to exceed their prospect ranking.

From 2001 through 2015, over 40% of players ranked between eleventh and thirtieth in their organization attended college. This is by far the most prevalent type of amateur experience, so it stands to reason that this group would have more players accrue 10 or more WAR.

It appears that international players are less likely to reach 10 WAR. There are roughly the same number of ranked high school and international players, but 30 high schoolers reached 10 WAR compared to only 11 international players. I do not really have a good explanation for this phenomenon, but I do find it interesting.

The final observation comes from the position player bar chart. I find it interesting that players whose highest level was AAA are the least represented group. The only explanation I can offer is that players who reach AAA but not the Majors are perhaps perceived as having lower ceilings, and their teams decided they were not worth a call-up when rosters expanded in September. This would artificially shrink the AAA group and inflate the MLB group. If rosters did not expand in September, I think there would be far more players in the AAA group and fewer in the MLB group.

Conclusions

  • Most prospect value is concentrated in the top of a team’s farm system, but value can come from any ranking position.
  • Position players are more likely to outperform their ranking than pitchers.
  • College position players are the most prevalent lower ranked prospects to accrue 10 or more WAR.
  • International players are the least likely to generate 10 or more WAR.

Evaluating prospects is a difficult endeavor and I hope that this article helps to illuminate the types of players that are typically overlooked by prospect evaluators.

Click here for GitHub code

An Examination of Rebuilding Team Timelines

Rebuilding has become the popular way for MLB franchises to build a perennial World Series contender. With the league's structure of compensating the worst teams with the best draft picks, maximizing your losses in order to obtain the best amateur talent available seems like a viable strategy. The Astros and Cubs are two of the more recent franchises to successfully cap an extensive rebuilding process with a World Series victory. Both franchises acquired top ten draft picks for several years before they turned the corner and became World Series contenders, but how often does this strategy work, and how long does a rebuild take?

If an organization’s strategy is to not win games right away, when do the fans and ownership realize that the rebuilding process has failed and that their team is in the middle of a downward spiral of ineptitude? I am sure there are fans of the Pittsburgh Pirates and Kansas City Royals from the 1990s and 2000s that know how difficult it is to build a contender and cringe whenever they hear the term rebuild. Hopefully, this article can provide a reasonable timeline for contention and an objective overview on how a franchise’s rebuilding effort should be progressing.

For my dataset, I gathered the GM or President of Baseball Operations for each organization since 1998. I chose 1998 because it was the first year the league consisted of 30 teams, and it also happened to be the first full season for the current longest-tenured executives, Billy Beane and Brian Cashman. If an executive's tenure with the team started before the 1998 season, his entire tenure was included in the dataset. So, Braves GM John Schuerholz's regime is measured in its entirety from 1991-2007 and not just from 1998-2007.

For executives that took over an organization during the regular season, I credited the team's record starting with the executive's first full season of running baseball operations, not the partial season in which he assumed his duties. For example, the 2002 Detroit Tigers record goes on Randy Smith's ledger instead of Dave Dombrowski's, even though Dombrowski took over one week into the regular season.

To determine which front offices inherited a rebuilding situation, I limited the dataset to teams that had a winning percentage below .432 the year before the new executive assumed leadership. I chose .432 because it corresponds to a team that failed to win 70 games in a 162-game season. Most organizations that lose this many games in a season realize that they are likely a long way from contention and that a rebuild is necessary. By choosing such a low win threshold, I eliminate several rebuilds, like the Theo Epstein led Cubs, but I would rather exclude several rebuilds than include 70-win teams that just had an off year and returned to prominence using the same core players.

I decided to include the new expansion team front offices; since they are starting an organization from scratch, I feel they firmly belong in the rebuilding category. This leaves me with 40 different rebuilds to analyze, nine of which are current regimes. Below is a summary of each rebuild with the current front office leaders highlighted.

Seven different regimes won at least one World Series and 23 of the 40 front offices had at least one postseason appearance. The longest tenure without a playoff appearance is a tie at eight seasons between the Chuck LaMar led Tampa Bay Devil Rays and the Ed Wade led Philadelphia Phillies.

Next, I wanted to inspect how each administration’s winning percentage progressed by season. I created a boxplot of each team’s winning percentage by the executive’s season in the organization. I decided to stop the x-axis at eight seasons, because that is the longest time that an executive kept his job without making the postseason. It is also the first season where over 50% of the front offices had been replaced, so it seemed like a reasonable stopping point. I also added a summary table of when each front office accomplished certain goals by season.

The median winning percentage increases each season. There are likely two reasons for this trend. The first is that any team that initiates a rebuild is at or near the nadir of its suffering and there is nowhere to go but up. The second is survivorship bias: a front office that is underperforming expectations is more likely to be replaced, and those teams no longer appear in the boxplot to drag down the winning percentage.

I find it interesting that the median winning percentage in season four is exactly .500. This appears to be the make-or-break season for whether a rebuild is deemed a success or a failure. Of the 23 administrations that eventually made the playoffs, only five made the postseason for the first time after their fourth full season running baseball operations. However, it is worth mentioning that two of these regimes did eventually win the World Series. The Dave Dombrowski led Florida Marlins won in his fifth full season at the helm, and the Dayton Moore led Kansas City Royals won the Pennant in his eighth full season and the World Series in his ninth. These two cases provide hope for slower developing rebuilds, but most rebuilds can be considered a failure after four seasons if there has been no significant progress at the Major League level.

Of the nine current administrations that inherited a rebuilding team, only two have not made the postseason: the Pittsburgh Pirates and the Baltimore Orioles. They also happen to be the only two that have not reached the critical fourth season. Only time will tell whether these rebuilds will be considered a success, but they both still have a long way to go, and the clock is ticking.

Click here for GitHub code.

Is Consistent Contact More Important than Raw Power?

There are many types of hitters that have had success in baseball history. There are hitters with light-tower power like Giancarlo Stanton and players with exceptional bat control like Tony Gwynn or Ichiro Suzuki. Not many people can hit the ball as hard as Giancarlo Stanton, so many coaches and advisors instruct their players to focus on hitting the ball hard consistently instead of maximizing their power output. Is this good advice? Is it possible that consistent hard contact can overcome a player’s power deficiency?

To answer this question, I collected batted ball data from the 2019 and 2020 seasons from Baseball Savant and found each batter's maximum exit velocity. I am working under the assumption that a player's maximum exit velocity is a suitable facsimile for raw power. I then tabulated how many of a player's batted balls were hit at 90% or more of his maximum exit velocity and divided that count by his batted balls that registered an exit velocity reading. I call this new stat exit velocity efficiency. I chose the 90% threshold because the lowest maximum exit velocities in the Majors for qualified hitters are around 100 MPH, and I believe that 90 MPH is the lowest reasonable threshold for what qualifies as a hard-hit baseball. Next, I limited the dataset to players with over 100 batted ball events combined between the 2019 and 2020 seasons. This left me with 449 Major Leaguers to analyze.
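
A sketch of the exit velocity efficiency calculation is below. The launch_speed and batter fields are the usual Baseball Savant column names, but treat the file and column names here as assumptions rather than the article's actual code.

```python
import pandas as pd

# Hypothetical Savant export: one row per batted ball with an exit velocity reading.
bbe = pd.read_csv("savant_batted_balls_2019_2020.csv")  # assumed file and columns
bbe = bbe.dropna(subset=["launch_speed"])

per_batter = bbe.groupby("batter")["launch_speed"]
max_ev = per_batter.max()

# Share of each batter's tracked batted balls hit at 90% or more of his own max EV.
ev_efficiency = per_batter.apply(lambda ev: (ev >= 0.9 * ev.max()).mean())

summary = pd.DataFrame({"max_ev": max_ev, "ev_efficiency": ev_efficiency})
summary = summary[per_batter.size() > 100]  # keep hitters with over 100 tracked batted balls
```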

The first thing I wanted to do was compare a player’s exit velocity efficiency and his maximum exit velocity to a performance metric to help determine which statistic has a stronger relationship with hitting performance. I believe that wOBACON is the best option for a performance metric, because it focuses solely on balls in play. I am choosing to ignore other factors like plate discipline and contact rate for simplicity, but these are important aspects of hitting that I will explore later.

These scatter plots show that there is a far stronger relationship between maximum exit velocity and wOBACON than there is between exit velocity efficiency and wOBACON. The former has a Pearson correlation coefficient of 0.55, while the latter's is 0.03, indicating almost no relationship at all.

Why is the relationship so weak for exit velocity efficiency? It is because players with exceptional raw power do not need to consistently hit balls at over 90% of their maximum exit velocity to be effective. Someone who can hit the ball over 118 MPH, like Aaron Judge, has a far greater margin for error on his batted balls than someone like Billy Hamilton, who has difficulty reaching 100 MPH. Since exit velocity efficiency is based on each player's individual maximum exit velocity, it makes sense that players with different maximum exit velocities and similar exit velocity efficiencies would have vastly different batted ball results.

For example, Mookie Betts and Tony Kemp both have an exit velocity efficiency of 33%, but Mookie Betts has a maximum exit velocity of 109.3 MPH compared to Tony Kemp’s 101.4 MPH. Kemp’s wOBACON registers a .299 while Betts has a more robust .428. This is all just a long way of saying that exit velocity efficiency on its own does not have a strong correlation with batted ball success.

Even though maximum exit velocity has a strong relationship with wOBACON, there are still several players that do not possess elite power that are highly effective hitters and there are some hitters who have power but are not the most productive hitters. A great hitter needs to have both power and efficiency, but one usually comes at the expense of the other. What is the best ratio to maximize a player’s performance? Is there a way to identify and classify these players to help them realize what style of hitting works best for them?

K-Means Clustering

I decided to use a machine learning technique called k-means clustering to group the players by their maximum exit velocity and exit velocity efficiency readings to determine which hitting approach produces the best results on average. I found five distinct groups of hitters; each group's mean performance can be found in the table below, along with the cluster plot and a review of each group's results.
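
Below is a minimal scikit-learn sketch of that clustering step. Standardizing the two features and fixing k at five are my assumptions; the article does not say how the features were scaled or how the number of groups was chosen, and the file and column names are hypothetical.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical per-hitter table with the two features plus performance metrics.
summary = pd.read_csv("ev_summary_2019_2020.csv")  # assumed file and columns

X = StandardScaler().fit_transform(summary[["max_ev", "ev_efficiency"]])
summary["group"] = KMeans(n_clusters=5, n_init=10, random_state=42).fit_predict(X)

# Mean performance by cluster (wOBACON / wRC+ would be joined in from FanGraphs).
print(summary.groupby("group").mean(numeric_only=True))
```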

Group 5 – Great Power, Below Average Efficiency

This is the most successful group on average, and it represents the players with the most power. They trade some exit velocity efficiency for power, but their power generally helps overcome that lack of efficiency. This group has the highest wOBACON, wOBA, and wRC+ on average, but a density plot provides more information about the distribution of talent. Since the ordering of the groups is the same regardless of which offensive metric is chosen, I will use wRC+ for each density plot moving forward. The metric is easier to interpret and it incorporates more aspects of hitting than batted ball outcomes alone.

There appears to be some risk in this group, with many players performing below the mean. The extraordinary hitters are propping up the group's wRC+ average, while plenty of players with issues making consistent contact drag down its overall value. Players like Mike Zunino and Gregory Polanco have good wOBACON numbers, but their swing-and-miss issues are too much to overcome for them to be considered good hitters.

Group 2 – Good Power, Good Efficiency

This group represents the players that do not have top of the scale power but make up for it by being more efficient in their contact quality. The top performers in this cohort are players that can combine their contact abilities with exceptional plate discipline like Juan Soto and Freddie Freeman.

This group has a higher concentration of good hitters, but it lacks the top tier talents that raised the previous group's mean wRC+. This group may not have as much upside, but the floor appears to be a bit higher than for the Group 5 hitters.

Group 4 – Great Efficiency, Just Enough Power

Group 4 has a much lower maximum exit velocity than the previous two groups, but they are by far the most efficient group. The hitters that have success in this group are players like Alex Bregman and Anthony Rendon. Bregman and Rendon consistently elevate the ball and limit their swings and misses. Therefore, their profile is effective even though they lack raw power.

This is the first distribution chart where the peak is below 100 wRC+ and it shows how difficult it is to be a great hitter without big raw power. It is still possible, but the margin for error is much finer.

Group 1 – Good Power, Below Average Efficiency

This group is the first to have a below average wRC+ collectively. They also may be the most frustrating. Group 1 is full of players that have comparable raw power to Group 2, but not Group 2’s efficiency.

Group 1 may be the most frustrating, but they are also the most tantalizing. Group 1 hitters have the raw tools necessary for success and it is possible that with more seasoning and a refined approach they could improve their efficiency and become more like the hitters in Group 2.

Group 3 – Good Efficiency, Not Enough Power

This final group is littered with players that rely more on their defense and versatility than their bat to provide value to their club: catchers, middle infielders, and fourth outfielders. Group 3 has the lowest wRC+, and it seems that most of these players lack the physical strength needed to be considered an elite hitter.

This group may have the worst results overall, but all hope is not lost for this type of hitter. Jeff McNeil, Tim Anderson, and Marcus Semien have all recorded a wRC+ over 120 over the last two seasons, so it is possible to succeed with this profile.

Conclusion

My cluster analysis shows that it is possible to be a successful Major League hitter as a member of any group, but it is clearly beneficial to have the ability to hit the ball hard. Both raw power and efficiency are essential to hitters, but if I had to choose which is more important to a hitter’s success, I would choose raw power. Consistent contact is useful, but if a player’s raw power does not meet a certain threshold there is little chance that he will be a productive Major League hitter.

Click here for GitHub code.

Using Bayesian Models to Predict MLB Free Agent Salaries

Executive Summary

I created a linear regression model to predict how much salary a Major League player will make in free agency. This model will help determine which free agents are affordable and fit into a team's yearly budget. I used a player's WAR values from the previous three seasons, All-Star and MVP status, position, and contract length to predict his salary. I found that players who sign one-year deals or deals of seven years or longer have lower intercept values than players who sign for two to six seasons. Pitchers also command higher salaries than their position player counterparts for the same performance.

Introduction

The purpose of this model is to predict the salary a Major League baseball player will earn in free agency. Major League teams work under a budget built on player salary and this model will help to identify which players can be signed to improve the team’s chances of winning without going over their yearly budget. I will create a model that uses on-field performance as well as career awards and honors to predict the player’s salary.

Data

The data were collected from ESPN's MLB Free Agent Tracker, FanGraphs.com and Baseball-Reference.com and include any player that signed a Major League contract from the 2006 to 2019 off-seasons. The model predicts the dollar amount the player signed for in present value divided by the length of the contract. Major League Baseball salaries have continued to increase every year, and I believe it is easier to model the present value salary than to introduce another variable to control for the year the player signed his contract. I settled on a 3% inflation rate, which seems to be a good approximation of annual salary in 2020 terms.
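
A sketch of that present value adjustment is below, using a hypothetical contract table whose file and column names are my own placeholders.

```python
import pandas as pd

# Hypothetical table: one row per free agent signing.
contracts = pd.read_csv("free_agent_contracts_2006_2019.csv")  # assumed file and columns

# Average annual value expressed in 2020 terms using the 3% rate described above.
contracts["aav"] = contracts["total_value"] / contracts["years"]
contracts["aav_2020"] = contracts["aav"] * 1.03 ** (2020 - contracts["signing_year"])
```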

I obtained the WAR values from the player’s previous season and the two years prior from both FanGraphs and Baseball-Reference and averaged them together to create two new columns called MixedWAR_1 and MixedWAR_2. WAR is a comprehensive metric used to gauge the overall on-field value a player provided to his team. FanGraphs and Baseball-Reference have similar methodologies, but there are slight differences in the way the stat is calculated that can create a discrepancy for the overall value a player generated for his team. Therefore, I decided to average the two values together into one column. I decided to separate the most recent season from the other two seasons, because the most recent season is more predictive of a player’s future performance than what a player produced three seasons ago. This could be due to either injury, aging or a change in skill level from one season to the next.
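
One plausible way to construct the two blended WAR columns is sketched below. The fWAR/bWAR column names are hypothetical, and averaging the two prior seasons together for MixedWAR_2 is my reading of the description above, not a confirmed detail.

```python
import pandas as pd

contracts = pd.read_csv("free_agent_contracts_2006_2019.csv")  # assumed file and columns

# "y1" is the season immediately before free agency; "y2" and "y3" are the two before that.
contracts["MixedWAR_1"] = contracts[["fwar_y1", "bwar_y1"]].mean(axis=1)
contracts["MixedWAR_2"] = contracts[["fwar_y2", "bwar_y2", "fwar_y3", "bwar_y3"]].mean(axis=1)
```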

The next thing I wanted to account for other than on-field performance was a player’s perceived value. A team will sometimes pay a player more for their past accomplishments and perceived upside than a player who has produced similar value over the last several years.  I attempted to simulate this effect with two categorical variables called MVP_Candidate and All_Star. If a player ever played in an All-Star game or received an MVP vote, they were counted as a yes in these categories.

The last variable I decided to account for was if the player was a hitter or a pitcher. These two positions have different jobs, and it is quite likely that they are compensated differently. Pitchers are more prone to injury, but they are also in higher demand, because they are easier to find playing time for than position players.

There were not too many challenges in collecting the data for this project, since Major League Baseball's data is well maintained. The only thing I needed to do was omit any player that had signed to play in a different league, or anyone that had played in a different league and had not played in the Majors in the previous season. I excluded these players because I do not think it would be fair to count a player's statistics from another league as Major League statistics, or to completely ignore his production in a less competitive league and give him a zero for the previous year's WAR value. If a team wants to sign a player coming from a different country's league, this model will not apply to him.

I plan on using a linear regression model, but first I need to see what distribution would be appropriate for my predicted values. Below is the histogram of a player’s present value salary.

Clearly the data is right skewed, so a normal distribution would not be an appropriate choice. Perhaps transforming the values with a logarithm will give a more normal distribution.

This is much better. Going forward I will be using this transformed column as my dependent variable.

Model

In the first model I decided to use the same non-informative priors that we used in the class example, with each beta following a normal distribution with mean 0 and a very large variance (1,000,000), a precision that follows a Gamma(2.5, 25) distribution, and a likelihood function of

Yᵢ = β₁ + β₂·MixedWAR_1ᵢ + β₃·MixedWAR_2ᵢ + β₄·MVP_Candidateᵢ + β₅·All_Starᵢ + β₆·Pitcherᵢ

with normally distributed errors. I chose non-informative priors because I did not have any preconceived notions of what the distributions should look like.

This model should be appropriate now that I have transformed my response variable to better resemble a normal distribution; the WAR predictors are roughly normally distributed and the remaining variables are categorical indicators. Each of these parameters should help identify how much of a raise in salary to expect when WAR increases, as well as how much making an All-Star game or appearing on the MVP ballot is worth. This model accounts for on-field performance as well as career accomplishments and should be able to reasonably predict a player's salary for the upcoming season.

All the parameters in the model converge and there is minimal autocorrelation. The residual plot looks random and the normal QQ plot is a reasonably straight line, although the model does have a little trouble with extreme outliers. In particular, it tends to overestimate the salary of high performing players. This may be because many top performers take longer term deals that lower their annual salary but guarantee more money overall. The industry makes these deals because the lower annual salary gives the signing team more flexibility to stay under the luxury tax threshold, while the player gets greater security even if his performance starts to decline. The DIC for the first model is 2461.

I attempt to account for these longer-term deals by creating a hierarchical model that groups contracts by length. Group 1 is for one-year deals, Group 2 for two-year deals, Group 3 for three to four-year deals, Group 4 for five to six-year deals and Group 5 for deals of seven years or longer. All the parameters in this model also converge and there is minimal autocorrelation. The residual plot looks random and the normal QQ plot is a reasonably straight line. The model still has some trouble with extreme outliers, but the DIC has decreased to 2219, which suggests the newer model is superior.
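
The DIC values suggest the original models were fit with a BUGS/JAGS-style tool; below is a rough PyMC sketch of the grouped-intercept version as I understand it. The file, column, and variable names are assumptions, not the author's code.

```python
import numpy as np
import pandas as pd
import pymc as pm

df = pd.read_csv("free_agents_modeling.csv")  # assumed file and columns

X = df[["MixedWAR_1", "MixedWAR_2", "MVP_Candidate", "All_Star", "Pitcher"]].to_numpy()
group = df["length_group"].to_numpy()   # assumed coding: 0 = 1-yr, 1 = 2-yr, 2 = 3-4 yr, 3 = 5-6 yr, 4 = 7+ yr
y = np.log(df["aav_2020"].to_numpy())   # log of present-value salary

with pm.Model() as salary_model:
    # Vague priors mirroring the Normal(0, variance 1e6) / Gamma(2.5, 25) setup above.
    alpha = pm.Normal("alpha", mu=0, sigma=1000, shape=5)           # one intercept per contract-length group
    beta = pm.Normal("beta", mu=0, sigma=1000, shape=X.shape[1])
    tau = pm.Gamma("tau", alpha=2.5, beta=25)

    mu = alpha[group] + pm.math.dot(X, beta)
    pm.Normal("log_salary", mu=mu, tau=tau, observed=y)

    trace = pm.sample(2000, tune=1000, target_accept=0.9)
```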

Results

The mean coefficients for the model are as follows:

Intercept for 1-year deals = .45

Intercept for 2-year deals = .90

Intercept for 3-4-year deals = 1.12

Intercept for 5-6-year deals = .94

Intercept for 7 or more-year deals = .35

Coefficient for previous season’s WAR = .20

Coefficient for previous two season’s WAR = .09

Coefficient for MVP candidate = .03

Coefficient for All-Star appearance = .13

Coefficient for being a pitcher = .23

These coefficients show that pitchers receive a sizable bump in salary compared to position players and, surprisingly, that an All-Star appearance is worth more than being an MVP candidate. The model also shows that the length of the deal has a huge impact on a player's salary. As expected, longer term deals have a lower intercept, but the one-year intercept is quite low as well. This is probably because many players who secure one-year deals are bench players: they do not command much salary, and teams are not willing to commit to them for multiple years. It also means the model will likely underestimate prominent older players who sign one-year deals not because of a decline in performance, but because they are close to retirement and do not want a long-term commitment. If I could improve the model, I would probably exclude any deal of seven years or longer to reduce some of the skew in the response variable.
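
To make the coefficients concrete, here is an illustrative calculation for a hypothetical signing. It assumes the response variable is the natural log of salary in millions of 2020 dollars, which the article does not state explicitly, so treat the dollar figure as a rough reading of the coefficients rather than an exact prediction.

```python
import math

# Hypothetical free agent: a pitcher with one All-Star appearance, no MVP votes,
# 4.0 MixedWAR_1, 3.0 MixedWAR_2, signing a 3-4 year deal.
log_salary = (
    1.12          # intercept for 3-4-year deals
    + 0.20 * 4.0  # previous season's WAR
    + 0.09 * 3.0  # WAR from the two seasons before that
    + 0.03 * 0    # MVP candidate: no
    + 0.13 * 1    # All-Star appearance: yes
    + 0.23 * 1    # pitcher: yes
)
print(math.exp(log_salary))  # ~12.8, i.e. roughly $12.8M per year under the units assumption
```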

Click here for GitHub code.

How Much Value is Really in the Farm System?

Everyone knows that a strong farm system is the key to the long-term success of a Major League organization. What team would not want all-star players at below market rates? These players make it possible for organizations to field competitive teams and stay beneath the luxury tax threshold, but how much value can an organization expect from their farm system? How much more value do the best farm systems generate compared to the worst farm systems? These are some of the questions I attempted to answer with this article.

Methodology

The first thing I did was gather the player information and rankings from the Baseball America Prospect Handbooks published from 2001 to 2014 and enter them into a database. I then found each player's total WAR (from FanGraphs.com) over the next six seasons and added those values together to find the value the farm system produced. I chose six seasons to ensure that no team would get credit for a player's non-team-controlled seasons, since value produced after that window is not guaranteed to the player's current organization. This method reduces the total value credited to players that are further away from the Majors, but the purpose of this analysis is to focus on the value of the entire farm system, not an individual player's value over the course of his career.
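
A minimal sketch of that aggregation, assuming a hypothetical long table with one row per ranked prospect per season (all names here are placeholders):

```python
import pandas as pd

war = pd.read_csv("prospect_war_by_season.csv")  # assumed file and columns

# Keep the listing season plus the next five, then sum by farm system and listing year.
window = war[(war["season"] >= war["list_year"]) & (war["season"] <= war["list_year"] + 5)]
system_war = window.groupby(["org", "list_year"])["war"].sum().rename("war_next6")
print(system_war.sort_values(ascending=False).head())
```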

I believe that an example will help to understand my methodology, so let us look at the 2014 Minnesota Twins farm system. Below is a list of the thirty players that were ranked and the amount of WAR that each player produced by season.




In this table, we can see that the 2014 Minnesota Twins farm system produced 70.1 WAR over six seasons with 26.9 WAR coming in 2019. I repeated this process for every team to create my dataset.

Total WAR Produced by Farm System

The first thing I wanted to examine was how much total WAR an organization could expect from their farm system. To do this, I calculated the mean of WAR produced for each farm system and found that the average was 45.83 WAR produced over six seasons. I also discovered that the maximum WAR for a farm system was produced by the 2003 Cleveland Indians with a total of 136.0 WAR and the minimum value came from the 2008 Seattle Mariners with a total of -1.7 WAR.

The next thing I wanted to examine was the distribution of WAR values. Do they follow a normal distribution or is something else going on? Below is a density plot of every farm system's WAR over six seasons.




There is nothing too crazy going on here, but it looks like the distribution is positively skewed with the 2003 Cleveland Indians as an outlier with 23.6 WAR more than the next highest organization. I am not surprised that the data is skewed, because if a player is doing well, he will accumulate more playing time and WAR. However, if a player is not performing well, he is in danger of being sent down to the Minor Leagues. This makes it difficult for the left tail to mirror the right tail distribution.

WAR Produced by Farm System in a Single Season

The amount of WAR generated over six seasons is a good way to show the overall production and general well-being of an organization's farm system, but I believe it is just as important to see how much WAR an organization can generate in a given season. Producing 45.0 WAR over six years is the average, but how is that WAR distributed? If your farm system does not produce any WAR for the first five years but produces 45.0 WAR in year six, is that better than producing 7.5 WAR in each of six consecutive seasons? If you are the GM of a 100-loss club, you may prefer the 45.0 WAR in a single season, since 7.5 WAR in a season will probably not get you into the postseason. If you are the GM of a contending team, you may want the 7.5 WAR per year instead, since your team is already competing for a playoff spot every year. So how much WAR can you expect from a farm system in a single season?

Once again, I calculated the mean WAR value, but this time I found it for each individual season. The mean value was 7.64 WAR with a maximum of 31.7 WAR by the 2003 Cleveland Indians farm system in 2005. The minimum was the 2008 Seattle Mariners farm system which produced -7.4 WAR in 2010. The density plot is shown below.




Once again, we see a positively skewed distribution, but there is something else that I would like to account for before moving forward. To create this visual I compiled seasons one through six for each farm system and treated all seasons equally, but that may not be the best way to interpret the data. The next visual shows the distributions for each individual season in relation to the year of the organization’s ranking. So, in our 2014 Minnesota Twins example the WAR produced in 2014 would be in the distribution labeled “Same Season”. The 2015 season would be in the distribution labeled “Second Season” and so on and so forth to the “Sixth Season” distribution.




The visual shows us that we should not treat all seasons as equal. The first season distribution is vastly different from the others, with almost 20% of players hovering around 0.0 WAR produced in a season. The second and third season distributions are not as extreme as the same-season distribution, but they still differ from the distributions for seasons four through six. This makes sense, since many of the prospects appearing in the Prospect Handbook are not perceived to be ready for the Majors any time soon. I believe it is more useful for my individual season analysis to only look at seasons four through six to determine how much WAR a team can expect from its farm system.

WAR Produced by Farm System in a Single Season (Years 4-6)




Without Seasons 1-3 the mean moves up to 10.3 WAR and the distribution appears a little less skewed. This is a decent way to look at the data, but it is only looking at one variable. How can we account for the different quality of farm systems? This is where Baseball America’s team rankings come into play.

Total WAR Produced by Farm System Ranking

Every season Baseball America ranks MLB’s farm systems from one to thirty with one being the most talented farm system and thirty being the least talented farm system. It is obviously beneficial to be closer to the top ranking, but how much more valuable is a top-tier farm system compared to the bottom or an average farm system? The graphic below is an attempt to answer this question.



This graph was created by looking at the total WAR produced by a farm system in six years according to their team ranking in Baseball America. That means that there are fourteen data points for each box plot. I decided to use box plots for each ranking instead of looking at the mean or median, because I wanted to highlight how much variance there is for each ranking.

I also added a blue trend line to show the general relationship between team ranking and WAR. The blue line shows that WAR goes down as team ranking gets closer to thirty. It also looks like the trend line for rankings one through ten is steeper than the rest of the rankings. This means that moving from the tenth ranked system to the first ranked system has a much larger impact on WAR than moving from the twentieth ranked system to the tenth ranked system.

Single Season WAR Produced by Farm System Ranking

Once again, I thought it would be interesting to look at single season WAR produced by a farm system. I excluded seasons one through three for the same reasons mentioned earlier. Below is the same box plot chart, but for individual season WAR instead of WAR produced over six seasons.




The blue line shows that WAR goes down as team ranking gets closer to thirty, but it also appears that the trend line is more linear than the trend line in the previous chart. I am not sure why this is happening, but I do find it interesting that a single season of WAR is linear, but six seasons of WAR is not.

Conclusions

  • The most WAR produced by a farm system over six years was the 2003 Indians with 136.0 WAR. The second most was the 2006 Marlins with 112.4 WAR. The Indians were such an outlier that teams and fans should probably not expect more than 110.0 WAR from their farm system no matter how good their farm system appears.
  • The average WAR produced by a farm system over six years is 45.83 WAR.
  • The most WAR produced in a single season was in 2005 with 31.7 WAR from the 2003 Cleveland Indians farm system. This means that teams and fans should not expect much more than 30.0 WAR from their farm system in any given year.
  • Seasons one through three for a farm system have vastly different distributions than seasons four through six. This means that teams and fans should not expect their farm system to be productive right away and that they should not be judged too harshly in the first three seasons.

Data Acknowledgments

All the player information was obtained from Baseball America Prospect Handbooks and all the WAR figures were obtained from FanGraphs.com.

Click here for GitHub code.

Analyzing the Draft

Ever since the Major League Draft was created in 1965, teams have been searching for any competitive edge possible to separate themselves from the rest of the league. It is after all one of the best ways to acquire young affordable talent for your organization. Not picking the best player available is a huge missed opportunity for any club and it can set the organization back for years. It can also exasperate even the most devoted fans. Therefore, it is imperative to have successful drafts every year, but what constitutes a successful draft? How many Major Leaguers are available in a draft and where can you find these players? These are some of the questions I hope to answer in this article.

Methodology

Much of my analysis in this article will include references to team-controlled WAR. I calculated each draftee’s WAR total by summing their pitching and hitting WAR totals for the first seven years of their career to estimate the amount of value the players provided their clubs before the players were eligible for free agency. This method is not perfect, because it does not consider demotions to the minor leagues and it incorrectly assumes that every team would keep their prospects down in the minors to gain an extra year of control. However, I believe that the first seven years of WAR in a player’s career is a valid estimation of the value a player provides his organization before he exhausts his team-controlled seasons.
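
A sketch of that team-controlled WAR estimate, assuming a hypothetical per-season WAR log with pitching and hitting WAR already combined into one column (all names are placeholders):

```python
import pandas as pd

seasons = pd.read_csv("draftee_war_by_season.csv")  # assumed file and columns

# Sum each draftee's first seven Major League seasons.
tc_war = (
    seasons.sort_values(["player_id", "season"])
    .groupby("player_id")
    .head(7)                          # first seven seasons of the career
    .groupby("player_id")["war"]
    .sum()
)
```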

The drafts being examined are the drafts that took place from 1965 to 2004. I chose to stop at 2004, because that was the last year that had every player in its draft class exhaust his team-controlled seasons. If I were to include more recent drafts that still have active players, I could draw erroneous conclusions, since these players still have time to make their Major League debuts and accumulate more WAR in their team-controlled seasons.

MLB Players Drafted by Year

The graph below shows how many eventual Major Leaguers were drafted each year. This dataset includes all players that were drafted, whether they signed with their team or not, so it does include duplicate entries for players that were selected more than once. I kept these duplicates because I wanted this graph to show the total number of Major Leaguers available in each draft class, and a player who did not sign with his team would still be eligible for a future draft.


The main thing that jumps out is the increase in Major League players starting in 1987. Major League Baseball used to hold a January draft along with secondary drafts in June and January, and 1987 was the first year the league consolidated all the drafts into one. This is likely the main reason for the dramatic increase in Major Leaguers selected. I also believe the number of Major Leaguers increased due to expansion. Major League Baseball grew from 26 teams in 1987 to 30 teams in 1998, which created more roster spots for drafted players to occupy.

Available WAR by Draft Year

The next thing I chose to investigate was the amount of WAR available every year. It is nice to know how many Major Leaguers are drafted each year, but it is more important to know how much value is available in each draft. What would you rather have your team do: draft one All-Star talent or three up-and-down relievers? The previous chart ignored the value created by these Major Leaguers, while the chart below shows how much total WAR was available in each draft class. I once again included players that were drafted more than once, for the same reason listed above.



The draft year with the most available WAR was 2002 with 737.1 followed by 695.6 WAR in 1982 and 665.1 WAR in 1981. The draft year with the lowest WAR total was 1975 with 317 WAR followed by 1970 with 332.1 WAR and 1971 with 339.1 WAR.

The 2002 draft had plenty of high profile first round picks like Cole Hamels, Zack Greinke and Prince Fielder, but even more impressive was the overall depth of Major League All-Stars available that year. Brian McCann, Joey Votto and Jon Lester were all selected in the second round, and each contributed over 20 WAR in their team-controlled seasons. Russell Martin was drafted in the 17th round, and Jacoby Ellsbury and Matt Garza were both selected out of high school but chose not to sign with their teams.

The 1982 draft featured Barry Bonds, Dwight Gooden, Bret Saberhagen, Will Clark, Barry Larkin, and Jose Canseco and each of them produced over 25 WAR in their team-controlled seasons with Barry Bonds leading the way with 48.4 WAR.

The 1981 draft had eight players produce over 25 WAR and were headlined by Roger Clemens with 43.7 WAR out of San Jacinto College. He was selected by the New York Mets in the 12th round, but he did not sign and went to the Red Sox in the first round two years later. Can you imagine the 1986 Mets rotation if they had signed both Clemens and Gooden?

The 1975 draft was not devoid of talent, with Andre Dawson, Lou Whitaker and Lee Smith all being drafted, but Dawson was the only player who produced over 25 WAR, with Whitaker and Jason Thompson coming in second and third at 22.4 and 22.3 WAR. There were a few other Major League contributors in this draft class, but the depth just was not there.

Available WAR Bucketed for each Draft Year

The information above only shows the total amount of WAR, but the distribution of WAR matters as well. For example, imagine two draft classes with 300 total WAR each, one produced by 10 players and the other by 30 players. A team that selects later in the draft would probably prefer to pick in the class with 30 players, since its odds of selecting some type of Major League contributor are higher than in the year with only 10 players.

The table below groups drafted players into 5 WAR increments to show the number of players available in each grouping by draft year. I also replicated the table with the percentage breakdown for each year to account for the different sizes of draft classes.
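
A sketch of that bucketing is below; the bin edges are my assumption about how the 5 WAR increments were defined, and the file and column names are placeholders.

```python
import pandas as pd

draftees = pd.read_csv("draftees_1965_2004.csv")  # assumed file and columns

bins = [float("-inf"), 5, 10, 15, 20, 25, float("inf")]  # <=5, 5-10, ..., 25+
draftees["war_bucket"] = pd.cut(draftees["tc_war"], bins=bins)

counts = pd.crosstab(draftees["draft_year"], draftees["war_bucket"])
shares = pd.crosstab(draftees["draft_year"], draftees["war_bucket"], normalize="index")
```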




There is a lot to unpack here. The first thing we will examine is the overall success rate of MLB draft picks. There were 45,694 total draft picks through 2004 and 6,459 of them eventually made it to the Majors. That means 85.86% of draft picks never appeared in a Major League game. Of all the draft picks made, 44,325 did not produce more than 5 WAR in their team-controlled seasons, meaning 97% of all draft picks produced no more than 5 WAR.

The next thing that stands out is the minimal number of star-level players available in each draft. On average, only 3.15 players per year produce 25 or more WAR in the Majors. The year with the most such players was 1981 with eight, and 1974 was the only year in which no player produced 25 or more WAR.

The main takeaway from these tables is that it is extremely difficult to make it to the Major Leagues and even more difficult to produce at a high level. It is no surprise that teams struggle to build a championship caliber roster through the draft alone, which brings me to the next part of my analysis.

Best Draft Classes of All-time

The draft is paramount to any organization's long-term success. Being able to consistently select future Major Leaguers to replenish an organization's depth ensures that a team can compete for championships year in and year out. A successful draft can also launch a losing team into contention, but what constitutes a successful draft? The chart below answers this question by showing the teams that drafted the most WAR in a single year. The dataset used for the chart excludes any player who did not sign with his drafting team; including those players would reward teams that drafted them with no intention of meeting the asking price required for them to forgo their amateur careers.



The team with the most WAR was the 1976 Boston Red Sox with 79.1 WAR followed closely by the 1999 Cardinals with 76 WAR and the 1989 Twins with 75.4 WAR. These teams obliterated the Major League average draft class of 14.26 WAR and the median value of 10.9 WAR.

The 1976 Red Sox draft class had a total of seven Major Leaguers with three players producing over 10 WAR each. Bruce Hurst produced 12.5 WAR and Mike Smithson produced 12.3 WAR, but the heavy lifting was done by seventh round draft pick Wade Boggs. Boggs produced a whopping 51.7 WAR in his first seven seasons which is the second most ever by a player drafted from 1965 to 2004. The only player to produce more team-controlled WAR over this period is mentioned in the next paragraph.

The 1999 Cardinals draft class had a total of eight Major Leaguers with Albert Pujols and Coco Crisp leading the charge. Coco Crisp produced a solid 16.9 WAR, but Albert Pujols accumulated an astounding 53.5 WAR in his first seven seasons. The Cardinals had three first round picks in this draft and almost all the value came from a seventh and thirteenth round selection. This goes to show how complicated it is to scout and develop Major League talent.

The 1989 Twins drafted several important contributors to their 1991 championship team in Chuck Knoblauch with 33.6 WAR and Scott Erickson with 18.4 WAR. They also drafted future Rookie of the Year winner, Marty Cordova, in the tenth round and he amassed a total of 5.7 WAR. The Twins drafted four other eventual Major Leaguers led by University of Minnesota Left-Handed Pitcher, Denny Neagle, with 14.8 WAR.

MLB Success Rate by School Type and Position Group

With the margin for a successful draft being so thin, it would be beneficial for an organization to know which type of player has the best chance of cracking the Major League roster. The chart below is broken down by the school type the player was drafted from and their primary position at the time of the draft. The school types are listed as follows: 4Yr represents college, HS represents high school and JC represents a junior college.
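
The success rates in that chart boil down to a grouped mean of a made-the-Majors flag; here is a sketch with hypothetical column names:

```python
import pandas as pd

draftees = pd.read_csv("draftees_1965_2004.csv")  # assumed file and columns

# Share of draftees who reached the Majors, by school type and draft-day position.
success_rate = (
    draftees.groupby(["draft_position", "school_type"])["reached_mlb"]  # reached_mlb is a 0/1 flag
    .mean()
    .unstack("school_type")
)
print((success_rate * 100).round(1))
```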



College pitchers have the highest success rate, with 20% of left-handed pitchers and over 17% of right-handed pitchers making the Major Leagues. This is likely due to the high churn rate of Major League pitchers as well as the perceived Major League readiness of college pitchers. Shortstops also have a relatively high success rate across all school types, with every shortstop group at or above a 15% chance of making a Major League roster. I believe there are two main reasons for this. The first is that most amateur teams play their best player at shortstop due to the difficulty of the position and the number of plays it is involved in. The second is that shortstops are athletic enough to play many different positions, and this versatility gives them more opportunities to eventually crack a Major League roster.

Where to Find Major League Talent?

Organizations are always on the lookout for Major League talent, but where does most of it come from? If a team could identify the country’s talent hot spots, it could allocate its scouting personnel and travel budget more effectively and efficiently to gain an advantage over other teams. The heat map below shows where every drafted Major Leaguer came from within the contiguous United States, based on the coordinates of the school he was drafted out of.
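A map like this can be approximated by binning school coordinates in two dimensions. The sketch below assumes a hypothetical table of drafted Major Leaguers with school_lat and school_lon columns; both the column names and the hexbin approach are my own choices, not necessarily how the original graphic was produced.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical input: one row per drafted Major Leaguer with the coordinates
# of the school he was drafted out of.
mlb_draftees = pd.read_csv("mlb_draftees_with_school_coords.csv")

# Restrict to the contiguous United States before binning.
contiguous = mlb_draftees[
    mlb_draftees["school_lat"].between(24, 50)
    & mlb_draftees["school_lon"].between(-125, -66)
]

# A 2D hexagonal binning of school locations approximates the heat map.
plt.hexbin(contiguous["school_lon"], contiguous["school_lat"],
           gridsize=40, cmap="Reds")
plt.colorbar(label="Drafted Major Leaguers")
plt.xlabel("Longitude")
plt.ylabel("Latitude")
plt.show()
```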



The state of California clearly produces the most Major Leaguers. This is no surprise given the prestigious baseball colleges in the state as well as the high schools that allow their players to play year-round. The usual suspects, such as Florida and East Texas, are here as well, but a few things surprise me. The first is the number of Major Leaguers that come out of southern Kansas and Oklahoma; the University of Oklahoma, Oklahoma State University and Wichita State University have done a good job of producing Major Leaguers over the years. I am also surprised by the dearth of Major Leaguers from the Mountain West. This is most likely due to the region’s sparse population, but it is still surprising to see such a large gap in baseball talent across that part of the country.

Conclusions

There are several conclusions that can be drawn from this study.

  • Finding Major League talent is not easy. Over 85% of drafted players never make it to the Majors and only 3% of drafted players contribute more than 5 WAR in their team-controlled seasons.
  • Appreciate home-grown stars because they are few and far between. On average there are only three players selected every year that produce over 25 WAR in their team-controlled seasons.
  • College pitchers and shortstops have the best chance to reach the Majors at some point in their career.
  • California is a hotbed of Major League talent.

Data Acknowledgments

All the draft data was obtained from Baseball-Reference.com and all the WAR figures were obtained from FanGraphs.com.

Click here to view code on GitHub.

An Analysis of Minor League Development Paths

For years, Major League Baseball organizations and their fans have focused on their prospects. They wonder who will make it to the Majors, how productive they will be and when they can be expected to contribute at the Major League level. I attempted to answer these questions by looking at drafted players from prior years and examining the level each one reached in each year of his professional experience. Based on historical data, what can we expect from Minor League players going forward? I believe the data below provides useful information about league-wide player development trends.

Methodology

My population consisted of 8,748 players that were drafted and signed with their teams from 2000 to 2009. I chose this period because it is one of the most recent that lets me view almost every player’s full set of team-controllable years through the 2019 season. I then split the population into position players and pitchers according to the position listed in Baseball-Reference’s draft data.

Next, for each season I took the level at which a position player had the most plate appearances and assigned that level to the corresponding year of his professional career. I did the same thing for pitchers, but used batters faced instead of plate appearances. I repeated this process for the first seven seasons of each player’s professional career.
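Here is a minimal sketch of that level-assignment step, assuming a hypothetical playing-time table with one row per player, season and level; the column names are illustrative, not the actual Baseball-Reference fields.

```python
import pandas as pd

# Hypothetical input: one row per player/season/level with playing time
# (plate appearances for position players, batters faced for pitchers)
# plus the player's draft year. All column names are assumptions.
pt = pd.read_csv("minor_league_playing_time.csv")

# For each player-season, keep the level where he logged the most playing time.
idx = pt.groupby(["player_id", "season"])["playing_time"].idxmax()
season_levels = pt.loc[idx, ["player_id", "draft_year", "season", "level"]]

# Convert calendar seasons into professional years (Year 1 = draft year) and
# keep the first seven; seasons with no playing time at all get labeled DNP later.
season_levels["pro_year"] = season_levels["season"] - season_levels["draft_year"] + 1
season_levels = season_levels[season_levels["pro_year"].between(1, 7)]
```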

I chose the level with the most plate appearances or batters faced instead of the highest level reached because there are times during the season when a player spends a week or two at another affiliate to fill a roster spot created by an injury elsewhere in the organization. Once the injured player returns, the promoted player is sent back down to his original level. For this reason, I believe that spending most of the season at a level is a more accurate depiction of a player’s development level than the highest level reached in a season. I decided to analyze the first seven seasons because that is how long a drafted player must wait before he can reach minor league free agency and his team no longer owns his rights. Below is an example for 2002 draftee Denard Span to illustrate the methodology.

Denard Span was drafted by the Minnesota Twins in 2002, so his first professional season takes place in 2002 and his seventh in 2008. Span did not play any games in 2002, so his level is listed as DNP; in this study, DNP signifies any season in which a player did not play for an affiliated team in the United States. He then spent the entire 2003 season in Rookie ball. In 2004, he had 19 plate appearances in Rookie ball and 282 in A-ball, so we put an A in the Year 3 column. Span had 212 plate appearances at High-A and 304 at the AA level in 2005, and he spent the entire 2006 season in AA as well, so we enter AA in the Year 4 and Year 5 columns. He then spent the entire 2007 season in AAA with 548 plate appearances. In 2008, Span had 184 AAA plate appearances and 411 Major League plate appearances, so we enter MLB in Year 7.



After finding the development path for each drafted player, I found the year each player debuted in the Major Leagues and calculated his WAR total for his debut season plus the six full years that followed to simulate the amount of team-control WAR. This assumption is not perfect because it does not consider a player’s actual service time, and it may underrate players who were optioned back to the minors for long periods. However, I believe this method is a decent proxy for the value a player brings to an organization in his team-controlled years. Let us return to our Denard Span example: Span debuted in 2008, so we sum his WAR totals from 2008 to 2014 and find that he produced 22.5 WAR in those seven seasons.
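A rough sketch of that calculation, assuming a hypothetical table of yearly Major League WAR totals (the column names are mine), might look like this:

```python
import pandas as pd

# Hypothetical input: one row per player per MLB season with his WAR total.
war = pd.read_csv("mlb_war_by_season.csv")  # columns: player_id, season, war

# Each player's debut season is his first Major League season in the table.
debut = war.groupby("player_id")["season"].min().rename("debut_season").reset_index()
war = war.merge(debut, on="player_id")

# Sum WAR from the debut season through the six seasons that follow it, e.g.
# Denard Span debuted in 2008, so his 2008-2014 totals sum to 22.5 WAR.
window = war[war["season"] <= war["debut_season"] + 6]
team_control_war = window.groupby("player_id")["war"].sum()
```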

Overall WAR and Chance of Reaching the Majors

I have two goals for the tables in this article. The first is to show the average amount of WAR a player generates during his team-controlled years given his level assignment and experience. This information is color-coded by quantity and shown on the left side of each cell; green represents a high amount of WAR and red represents a low amount. The second is to show the percentage of players in each cohort that have reached the Major Leagues at some point in their careers. This figure is shown on the right side of the cell with a data bar. If a level and year combination did not have at least fifty players, I omitted it from the chart to avoid having small sample sizes misrepresent league-wide tendencies. To find the average amount of WAR, I divided each cohort’s total WAR by the number of its players that have reached the Major Leagues.
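To make the cell calculations concrete, here is a minimal sketch that assumes a hypothetical table with one row per player and professional year, a made_mlb flag and a team_control_war column that is zero for players who never debuted; none of these names come from the original code.

```python
import pandas as pd

# Hypothetical input described above; all column names are assumptions.
cohorts = pd.read_csv("development_paths.csv")

summary = (
    cohorts.groupby(["pro_year", "level"])
    .agg(
        players=("player_id", "nunique"),
        mlb_players=("made_mlb", "sum"),
        total_war=("team_control_war", "sum"),
    )
    .reset_index()
)

# Drop level/year combinations with fewer than fifty players.
summary = summary[summary["players"] >= 50]

# Average WAR divides total WAR by the number of players who reached MLB
# (left blank when no one in the cohort debuted); the MLB rate divides
# those Major Leaguers by the full cohort.
summary["avg_war"] = summary["total_war"] / summary["mlb_players"].replace(0, float("nan"))
summary["mlb_rate"] = summary["mlb_players"] / summary["players"]
```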



As expected, the higher a player’s level and the less experience he has when he gets there, the more WAR he produces and the better his chance of reaching the Majors. This makes sense: a player performing well at his assigned level will be promoted to the Major Leagues sooner than the rest of the sample. The exceptions appear in Years 3 and 4, where the average WAR is higher for players who sat out the season or played in Rookie ball than for players in Short-season ball. This is most likely due to the small number of players on rehab assignments who eventually made their way to the Majors. Because I divided by Major Leaguers instead of total players, one or two players can skew the WAR figure when very few players in a cohort reach the Majors.

This table gives us a good deal of information. However, we can learn more by splitting the population into different groups of draftees and analyzing the different development paths taken by drafted players.

WAR and Chance of Reaching the Majors for College Players vs. High School Players

The first difference I decided to look at was between college and high school draftees. College players are three to four years older than their high school counterparts, so it stands to reason that their development paths should be quite different.



The first thing that jumps out is that high school players take longer to reach the Majors than college players: fewer than 50 high schoolers were at the MLB level in any single season until Year 5, while 50 or more college players were there as early as Year 3.

High school players also seem to have a more uniform start to their careers than college players. The only level that appears in our table for high school players’ first season is Rookie ball, and the highest level in their Year 2 sample is Low-A. Meanwhile, a college player’s first professional season can range anywhere from Rookie ball to A+, and his second from Rookie ball to AA.

I also find it interesting that high school players have a higher average WAR than college players. This could be because many of the best amateur players forgo college altogether to get a head start on their professional careers, which would create a selection bias in our data that skews toward high school players.

WAR and Chance of Reaching the Majors for Position Players vs. Pitchers

The next thing I wanted to investigate was the difference between position players and pitchers. Both groups have quite different jobs and it is possible that they could have radically different development paths.



The main difference between these two tables is that the average WAR for position players at the Major League level is much higher than it is for pitchers. This is probably because many drafted pitchers eventually end up in the bullpen, where their limited playing time keeps them from accumulating as much WAR as a position player.

WAR and Chance of Reaching the Majors by Position and School Type

We have compared the differences between school type and position separately; now it is time to analyze each group of draftees by school type and position together. The four tables below are listed in this order: College Position Players, High School Position Players, College Pitchers and High School Pitchers. We should be able to use these four tables to determine which minor league players tend to have the best chance of making the Majors and how productive they will be if they get there. Teams may also be able to use this data to guide level assignments for the upcoming season, or even to decide when to release a player who has a higher perceived ceiling than his teammates but has not lived up to expectations.





Many of our observations from earlier still hold. High school players produce more WAR than their college counterparts but take several years longer to reach the Majors, and players who get promoted earlier in their careers produce more WAR.

The last thing I want to make clear is that these figures are all aggregates, and they cannot be used to predict an individual player’s success. Just because a college pitcher makes it to AA in his second season does not mean that he has a 62.61% chance of making the Majors or that he is expected to produce 3.7 WAR; it just means that similar players have had that amount of success in aggregate. However, I do believe these tables set realistic expectations for the career development of drafted players, and they can help teams make informed decisions about where to place players in their organizations and create timelines for realistic windows of contention.

Conclusions

There are several conclusions that can be drawn from this study.

  • On average, high school amateurs that make the Majors produce more WAR than college players that make the Majors.
  • Position players produce more WAR than pitchers.
  • College players reach the Majors faster than high school players.
  • If a player reaches AAA, it is highly likely they will play in the Majors at some point. Almost every AAA cohort had at least 70% of their population make a Major League appearance.
  • The less time a player spends in the minors, the more WAR they produce.

Data Acknowledgments

All the minor league playing time data was obtained from Baseball-Reference.com and all the WAR figures were obtained from FanGraphs.com.