For reference, only about 30% of Minor League hitters and 35% of Minor League pitchers at these levels saw any Major League playing time at all, and only about 9% of hitters and 8% of pitchers went on to perform at anything that could be considered an above-replacement level.
IP/G – 0.21
K/BFP – 0.19
K/IP – 0.18
WHIP – (0.14)
CERA – (0.14)
ERA – (0.13)
K:BB – 0.13
HR/IP – (0.08)
BB/IP – (0.05)
Hitter Statistical Correlation Coefficients
RC/27 – 0.29
OPS – 0.28
2*OBA + SLG – 0.27
wOBA – 0.27
SLG – 0.26
AVG – 0.25
OBP – 0.23
H/PA – 0.23
H/BIP – 0.17
wPOW – 0.21
LWP – 0.21
ISO – 0.21
XBH/AB – 0.21
HR/AB – 0.19
K/PA – (0.15)
2B/AB – 0.14
First Base Rate (FBR) – 0.12
Speed (Diamond Futures) – 0.10
Speed (James) – 0.10
1B/AB – 0.09
BB/PA – 0.08
3B/AB – 0.05
A couple of interesting observations:
The results tend to support the large body of research compiled over the last few years on the randomness of Balls in Play (BIP): H/BIP has a correlation of only 0.17, well below AVG (0.25) and H/PA (0.23).
CERA is the standard calculation for Component ERA.
wOBA is the Tom Tango weighted OBA formula.
wPOW is a weighted Power calculation that we use at Diamond Futures. The formula is: wPOW = (doubles*3.0 + triples*1.2 + home runs*10.0)/AB. It is scaled to approximate SLG, but focuses only on the extra-base components.
First Base Rate, defined as FBR=((1B/AB + BB/PA)*1.2), is an on-base calculation that we use at Diamond Futures that is scaled to approximate OBP, but is only focused on singles and walks.
Speed (Diamond Futures) is a calculation we use at Diamond Futures that weights three of the components (SpdSB%, SpdSBA, SpdR) from the Bill James Speed Calculation. It is scaled to produce a speed rating that ranges from 0 to 10.
Speed (James) is the standard Bill James 4-component calculation (not using fielding range).
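The two Diamond Futures formulas given above, wPOW and First Base Rate, translate directly into code. What follows is only a minimal sketch: the function names and the sample season line are hypothetical and made up for illustration, while the weights (3.0, 1.2, 10.0 and the 1.2 scaling factor) are the ones quoted in the definitions.

```python
def wpow(doubles, triples, home_runs, at_bats):
    """Weighted power as quoted in the text: (2B*3.0 + 3B*1.2 + HR*10.0) / AB."""
    if at_bats <= 0:
        raise ValueError("at_bats must be positive")
    return (doubles * 3.0 + triples * 1.2 + home_runs * 10.0) / at_bats


def first_base_rate(singles, walks, at_bats, plate_appearances):
    """First Base Rate as quoted in the text: (1B/AB + BB/PA) * 1.2."""
    if at_bats <= 0 or plate_appearances <= 0:
        raise ValueError("at_bats and plate_appearances must be positive")
    return (singles / at_bats + walks / plate_appearances) * 1.2


# Hypothetical season line, invented for illustration only.
line = {"AB": 500, "PA": 560, "1B": 90, "2B": 30, "3B": 4, "HR": 20, "BB": 50}

print("wPOW:", round(wpow(line["2B"], line["3B"], line["HR"], line["AB"]), 3))
print("FBR: ", round(first_base_rate(line["1B"], line["BB"], line["AB"], line["PA"]), 3))
```

Note that FBR mixes two denominators (AB for singles, PA for walks), so both have to be supplied.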
Conclusions –
Age vs. Level of competition is still the single strongest indicator of future Major League success.
Triples have surprisingly little correlation, and little value as a predictive factor even when used in combination.
While everyone continually searches for a single statistical measure, the most useful predictive statistics remain weighted combinations of simple statistics. For example, Average, which is (1B+2B+3B+HR)/AB, has a correlation coefficient of 0.25, but we can take its components 1B/AB, 2B/AB, 3B/AB and HR/AB, weight them individually, and combine them to yield a correlation of 0.28. Our best correlations occur when we weight and combine multiple unrelated measures. By this I mean it is best to avoid using both OBP and SLG, because they share too many components (singles, doubles, home runs, etc.) to produce precise results. Instead, we want to use measures that isolate individual characteristics or variables.
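To make the Average example concrete: AVG is simply the four hit-type rates added together with equal weights, so any other set of weights gives a different composite whose correlation with later performance can be compared against it. The sketch below shows only the mechanics. The players, the outcome values and the "tilted" weights are all hypothetical; they are not the study's data or its fitted weights.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def weighted_hit_rate(player, weights):
    """Weighted sum of the four hit-type rates (1B, 2B, 3B, HR per AB)."""
    return sum(weights[k] * player[k] / player["AB"] for k in ("1B", "2B", "3B", "HR"))


# Hypothetical players and outcomes, invented for illustration only.
players = [
    {"AB": 400, "1B": 80, "2B": 20, "3B": 2, "HR": 8},
    {"AB": 450, "1B": 85, "2B": 28, "3B": 5, "HR": 18},
    {"AB": 380, "1B": 75, "2B": 15, "3B": 1, "HR": 4},
    {"AB": 500, "1B": 95, "2B": 32, "3B": 3, "HR": 25},
]
outcome = [0.5, 3.1, 0.0, 4.2]  # hypothetical measure of later Major League value

equal = {"1B": 1.0, "2B": 1.0, "3B": 1.0, "HR": 1.0}   # reproduces AVG exactly
tilted = {"1B": 1.0, "2B": 1.5, "3B": 0.5, "HR": 3.0}  # hypothetical weights

print("equal weights (AVG):", pearson([weighted_hit_rate(p, equal) for p in players], outcome))
print("tilted weights:     ", pearson([weighted_hit_rate(p, tilted) for p in players], outcome))
```

With real data, the weights would come from a regression rather than being picked by hand.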
While people like us continue to try to reinvent the wheel, traditional scouting has long described the five tools of a hitter: Hit for Average, Hit for Power, Speed, Defense and Arm Strength. The numbers tend to back up the traditional intuition, with slight modification. The results indicate that there are four significant offensive characteristics that predict Major League performance: 1) The ability to make contact/reach base {best defined by the formula for First Base Rate}; 2) The ability to hit for power {defined by the formula wPOW}; 3) The ability to judge/control the strike zone {defined by K/AB}; and 4) Speed {defined by using weighted components of the Bill James Speed formula}. When these are used in combination with 5) Age vs. Level of competition, a regression analysis yields a weighted formula that produces a correlation coefficient of 0.45. It is likely that at some future date we will look at incorporating some sort of fielding/zone rating, and we would then likely get correlations greater than 0.50.
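The regression step described here can be sketched as an ordinary least-squares fit of an outcome on the five factors. The post does not say which regression method or software was used, so the plain normal-equations solver below is an assumption, and every number in the sample data is hypothetical. The feature columns are, in order, First Base Rate, wPOW, strikeout rate, Speed and an age-vs-level score; how that last score is encoded is also an assumption.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system; features may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_least_squares(rows, targets):
    """Ordinary least squares with an intercept, via the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coeffs, row):
    """Apply fitted coefficients (intercept first) to one feature row."""
    return coeffs[0] + sum(c * v for c, v in zip(coeffs[1:], row))


# Hypothetical rows: (FBR, wPOW, K rate, Speed, age-vs-level score).
features = [
    (0.31, 0.45, 0.18, 5.5, 1.5),
    (0.28, 0.30, 0.22, 6.8, 0.0),
    (0.34, 0.52, 0.15, 4.1, 2.0),
    (0.26, 0.25, 0.27, 3.0, -1.0),
    (0.30, 0.38, 0.20, 7.2, 0.5),
    (0.33, 0.60, 0.24, 2.5, 1.0),
    (0.27, 0.33, 0.16, 5.0, -0.5),
    (0.29, 0.41, 0.19, 4.4, 0.5),
]
outcome = [3.0, 0.5, 4.5, 0.0, 1.8, 2.2, 0.2, 1.0]  # hypothetical later MLB value

coeffs = fit_least_squares(features, outcome)
print("intercept and weights:", [round(c, 3) for c in coeffs])
print("fitted score, first player:", round(predict(coeffs, features[0]), 3))
```

Ranking players by the fitted score and correlating that score with the actual outcome gives a single figure comparable to the 0.45 quoted above. A sample this small would badly overfit, so the data here serves only to show the shape of the calculation.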
The actual results produced are somewhat striking. Of the top 50 hitters ranked by the regression formula, 38 (76%) went on to have significant Major League careers. Of the top 25 names, only Ruben Mateo, Cal Pickering and Dernell Stenson could be classified as 'misses'. After deriving the formulas from the 1997 and 1998 data, we then tested them on data from the 1996 and 1999 seasons and produced equally strong correlations.
Pitching isn’t as easily evaluated, or at least doesn’t yield results that are as strong, but that doesn’t mean there aren’t predictive measures. Again, staying away from traditional cumulative statistical measures like ERA and WHIP, we can break pitching down into five significant characteristics: 1) Age; 2) Stamina {measured by IP/G}, since Major League pitchers strongly tend to come from the pool of Minor League starting pitchers; 3) Dominance {which we define using K/BFP}; 4) Ability to keep the ball down and not give up the HR {defined by HR/IP}; and 5) Ability to avoid the free pass {defined by BB/IP}. I would have liked to see the correlations of Ground Ball/Fly Ball ratios, as some recent work we have done shows some promise in this area, but we just don’t have the historical data available to test them. Once we run a regression analysis on these variables, we can weight them in a manner that produces a correlation coefficient of 0.35.
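The five pitching characteristics listed above are all simple rates that can be derived from a standard stat line. The helper below is a hypothetical sketch of that step. It assumes innings are supplied as a true decimal (so 150 and one-third innings is 150.333, not the box-score shorthand 150.1), and the sample line is invented.

```python
def pitcher_features(age, games, innings, batters_faced, strikeouts, home_runs, walks):
    """Return the five pitching inputs named in the text as a dict of rates."""
    if games <= 0 or innings <= 0 or batters_faced <= 0:
        raise ValueError("games, innings and batters_faced must be positive")
    return {
        "Age": age,
        "IP/G": innings / games,               # stamina
        "K/BFP": strikeouts / batters_faced,   # dominance
        "HR/IP": home_runs / innings,          # keeping the ball in the park
        "BB/IP": walks / innings,              # avoiding the free pass
    }


# Hypothetical season line, invented for illustration only.
sample = pitcher_features(age=21, games=25, innings=150.0, batters_faced=620,
                          strikeouts=140, home_runs=10, walks=45)
for name, value in sample.items():
    print(name, round(value, 3))
```

These five values would then be the inputs to the same kind of regression used for the hitters.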
All of these results are built on characteristics defined by one small data segment (minimum of 120 ABs or 50 IP). We have just started to look at how having multiple years/multiple data segments can be combined to produce even better results.
If you would like a slightly more detailed version of this study as a Word document, feel free to email me at baseballnumbers@ix.netcom.com.