Mathematical Relationships Between Winning Percentage Estimators (WPE's) in Baseball

In Sabermetrics, the mathematical study of baseball, there are many ways to estimate what a team's winning percentage (w) should be against its opponents, given the total runs (y) it has scored and the total runs (x) it has allowed in a certain number of games (G). Bill James's Pythagorean Formula w = y^2/(y^2 + x^2), which I call "Pyth2", was the most famous early example--simple and pretty accurate--but many other wpe's have been found, though they only improve average accuracy by a small amount. My purpose here
is to analyze mathematical relationships between them, and to show that they quite
often have a common model and structure,
which is related to that of Bill James's
Log5 Formula, though Log5 uses different input (not runs) to estimate the same
w.
My efforts were first motivated by trying to understand, over
many years, why Pyth2 is actually true, and from what basic principles
it can be derived. I finally figured that out, and it led to the
more general model.

Notation

Throughout, y = runs scored by team Y, and x = runs allowed by team Y. Four parameters derived from them recur below: the run ratio u = y/x, its reciprocal q = x/y, the run fraction t = y/(y + x), and v = (y - x)/(y + x). Knowing any one of u, q, t, or v determines the other three. It is also helpful to write v as s/r, where s = (y - x) is the surplus runs (of the team as compared to its opponents, with s being negative if it is outscored by its opponents), and r = (y + x) is the total runs scored by both teams together. This interchangeability is not true of r or s--neither uniquely determines the other, nor any of u, q, t, or v. And vice versa--knowing u, q, t, or v does not determine r or s.

Once one
has a formula for a potential Winning Percentage Estimator (wpe), one wonders
how accurately it applies to real teams in Major League Baseball. I measure wpe’s against a roughly 1260
team-season sample of MLB from 1903 to 2010, with an overall w of .50000 in the
sample.
My sample uses only team-seasons with 149 games or
more, which improves the accuracy of all wpe's over what they
would be if used on shorter seasons. I use Excel to measure the root
mean square error (rmse) in the predicted winning percentages as compared to actual
results—but I don’t then convert to “Wins”, as many analysts do. Since
my sample isn’t all MLB team-seasons, and leaves out all shorter
ones, I
can’t tell which wpe's are “really” most
accurate—but the general relationships I find agree pretty much
with accuracy info about wpe's I've found online from other
sources who do use all data. I
just want (and use here) a rough estimate of which measures (including my newfound
ones) are substantially different from others.
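As a sketch of what this measurement looks like (my actual computations are in Excel; the Python below and its three team-seasons are purely illustrative, not real data):

```python
import math

def pyth2(y, x):
    # Pyth2: w = y^2 / (y^2 + x^2), from runs scored y and runs allowed x.
    return y**2 / (y**2 + x**2)

def rmse(estimator, seasons):
    # Root mean square error of a wpe over (y, x, actual_w) team-seasons.
    errors = [(estimator(y, x) - w)**2 for (y, x, w) in seasons]
    return math.sqrt(sum(errors) / len(errors))

# Hypothetical team-seasons (runs scored, runs allowed, actual winning pct):
sample = [(875, 737, .617), (693, 792, .420), (731, 730, .494)]
print(round(rmse(pyth2, sample), 4))
```

The same loop, run over the full 1260 team-season sample, produces the rmse figures quoted below.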
As a benchmark, BJ's Pyth2 has for my sample an rmse of .0264 (the difference between a w of .500 and a w of .5264, or around 4 wins in a 162- or 154-game season), and the lowest rmse's for other formulas are a couple at .0256, and one at .0255. The "worst" (still halfway-decent, and often quite accurate for most "normal" teams) wpe's I consider can get up to an rmse of about .031.
The .026 or .025 rmse level is almost certainly the best possible, since
there is an ineradicable element of random chance in baseball.

More Notation:
y = runs scored by your team
x = runs scored against your team by its opponents, i.e., runs allowed by your pitchers

Contents: Note that the penultimate section is to be a concise list of all the various wpe's I consider.

1. Quality, Wins, Runs, and Chance
Some baseball teams are
better than others. This difference in quality arises from the batting and pitching and
defensive skills of their players. These
skills lead to runs scored and allowed, and those lead to Wins and Losses. We will define a “measure
of quality” Qy for a team Y as an appropriately chosen, analytically plausible, statistical result of its play, involving bases, outs, runs, Wins, or other measures. [But we will assume it involves runs, unless otherwise stated.] Is it true that when two
baseball teams play each other over a long series of games, they will tend to
win in proportion to their respective quality measures Qy and Qx? That is, if one team has k times as much quality as another, will it tend to win k times as many games? The answer is (roughly) yes! This holds up for many different plausible
measures of quality, including u = y/x. However,
note that this is not true for all
sports! It is only true in baseball
because there is just enough random chance involved in the game to make
it true. This random chance comes from many factors,
but one major one is that a baseball team's "quality" is a (weighted)
average of that of all its players--but not all players play in any given
game. This is especially true of
pitchers--the team's 5th best pitcher may be pretty bad, even though
the overall average quality of the team may be better than any opponent's. When the 5th best pitcher pitches, the team
has a good chance of losing. Or, when
their best pitcher pitches, but happens to pitch by chance against the only
good (but very good!) pitcher the
opposing team has, the team may still lose. But in
arm-wrestling, there is little role of chance:
if I am twice as strong as my opponent, i.e., if my “quality” of
arm-strength enables me to lift 200 pounds while she can only lift 100, I will NOT
win twice as many of my matches with her as she will—instead, I will always
win, practically speaking, for an infinite ratio of my wins to her wins. This fact could be altered if a greater
element of chance were artificially introduced into arm-wrestling--say, if a
muscle relaxant were randomly administered to one contestant before each
match. Then whoever got the relaxant
might often lose, no matter how weak their opponent. But barring that sort of thing, there is not
enough chance in arm-wrestling to result in contestants winning in proportion
to their quality. And in
basketball, the Bulls team that went 72-10 did not win in proportion to its
relative excess of quality over its opponents.
It was nowhere near “7 times as good” as its average opponents—not by
any measure (shooting percentage, speed, strength, points, rebounds, steals,
free-throws, etc.), nor even by the total sum of its (small) excesses in each
category. But, as in arm-wrestling,
there is less “chance” in basketball than in baseball, so even a modest surplus
of basketball talent over one’s opponents much more often allows the team to
demonstrate that surplus by winning the game.
This is because (basically) the "good" players on the team
always play in every game, and especially near the end, when close games are
decided--Michael Jordan was rarely "by chance" not around to help
determine a game's outcome when needed. However,
the fact that this proportionality of wins to quality IS roughly true in
baseball, under many different plausible measures of quality, leads to a basic
model for many different sabermetric formulas, which may at first seem ad hoc
and unrelated. Note that “quality” is inherently a relative notion. If one team has quality 3 while another has quality 6, it should make no difference if instead we measure the first team’s quality as 10 and the second one’s as 20. But there is an obvious way to “normalize” the measures: make sure that the “quality” of an average team (which is predicted to win half its games, with w = .500, and with x = y) is equal to a constant: and 1 is certainly the best constant. I will do this normalization for some quality measures, particularly ones based on the run ratio y/x: if y = x, then Qy = 1 = Qx. However,
since this "normalization" is not at all necessary to make the model
work, and is merely an "aesthetic" feature, I will not always do it.
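The irrelevance of the normalization is a one-line check: a common scale factor cancels from the proportional-quality prediction w = Qy/(Qy + Qx) (the Basic Axiom formalized below). A sketch, with arbitrary quality values:

```python
def axiom_w(qy, qx):
    # Proportional-quality prediction: w = Qy / (Qy + Qx).
    return qy / (qy + qx)

# Rescaling both qualities by a common factor changes nothing:
# qualities (3, 6) and (10, 20) have the same ratio, hence the same w.
assert abs(axiom_w(3, 6) - axiom_w(10, 20)) < 1e-15
```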
One can always force any quality measure into a different,
normalized form that yields the same wpe via the Axiom, but when
the logical features and rationales of relative quality
measures are already apparent, even though it isn't "normalized" to 1,
I will often not bother to do so, since doing so can introduce
extra mathematical cumbersomeness, and never changes the final Quality
Model results of predicted "w". *************************************************************************** BASIC AXIOM of the Quality Model: A baseball
Team Y's winning percentage w
against Team X is well-predicted by the following model: For a given
(plausible) measure of quality, with team Y having quality called "Qy",
and its opponent team X having quality called "Qx", w
= Quality of Y / [Quality of Y +
Quality of X ] = Qy / (Qy + Qx) The Basic Axiom simply says that
teams are predicted to (and will, roughly, actually) win in proportion to their measured "Qualities") *************************************************************************** In
baseball, there are many quality measures for which, when team Y has “k” times
the quality of its opponent according to that quantitative measure, it will
indeed generally win “k” times as many games in their matches. That is, baseball
(roughly) obeys the Basic Axiom, and a plethora of roughly equally accurate wpe’s demonstrate this in their common
mathematical structure. For various
mathematical reasons, it may sometimes appear that implausible Quality
measures give an accurate wpe--in which case the goal is to find a different,
plausible quality indicator that yields the same formula.
For example: Bill James's (BJ's) Pythagorean Formula with N = 2 says that the winning percentage w for a team Y in a league is:

Pyth2: w = y^2 / (y^2 + x^2)

It is certainly a pretty good predictor, as wpe's go.
Here y is runs scored by Y, and x is runs allowed by Y, i.e., runs scored by its opponents (X). This result would trivially follow from the
Basic Axiom IF we chose runs^2 to be the quality measure of each team,
respectively. That results in y^2 being
the quality Qy of Team Y, and x^2 being Qx, the quality of the agglomerated
league opponent Team X when playing against Y. But that choice is NOT
intuitively plausible--I wondered for 28 years WHY the runs-squared were used! Why
aren't just runs themselves (without the squares) indicators of team quality, and hence predictors of winning percentage? One answer
would be that if Qy = y and Qx = x, the resulting formula via my Basic Axiom
would be w = y / (y
+ x), which has been shown to NOT predict real-life team winning percentages
very well. Pyth2 predicts w pretty accurately--refinements
and alternate formulas never do very much better than Pyth2. But, in fact, there is a much more plausible
indicator of team Quality than y^2 and x^2, one which leads (via my Axiom) to
the Pyth2 result, with the
squares. The squares are not a
natural way in which to measure the "quality" of a team—they simply
result from the way in which the Axiom applies to the “real” (more plausible)
quality measure. We will see this below.

1B. Bill James's Log5 Formula

First
developed in the late 70's, vaguely mentioned in his 1978 Baseball Abstract,
and explicitly developed in his 1981 Baseball Abstract, BJ's Log5 formula (apparently misnamed, as it seems to have nothing
to do with logarithms) is the
quintessential and original Quality model—he derived it with this concept explicitly in mind. He set an average team’s quality equal to ½,
instead of 1, as I do, but otherwise his approach is the same. In this sense it is the forerunner and
exemplar of most runs-based wpe’s—especially of Pythagorean results. (This is true even though it predicts
head-to-head w from league W/L ratios, NOT from runs—but the structure is the
same.) BJ tried in
his 1981 Abstract to “prove” this relationship between log5 and Pythagorean
results, but unsuccessfully—his claims in the 1981 Abstract about how the two
are related have a fundamental error.
But the formulas are nonetheless very related, as I will show soon. [A discussion of BJ’s original development
(with the error) in the 1981 Abstract is presented further down the webpage.] Log5 is an extremely useful formula
describing how often Team Y could be predicted to beat Team X in a series of
head-to-head games, if
the only info we have about them is their Wins/Losses odds ratios against their respective entire leagues.
That is to
say, if Team Y meets Team X in post-season play, and we know that Team Y went
60-40 in the regular season in its league, while Team X went 55-45 in its
league (perhaps the same league, perhaps not), what is the winning percentage w that
we should "expect" in the postseason series for Team Y as it plays
Team X? This is a
very plausible definition of quality—the better a team, the better
should be its odds of winning against the entire league overall. In fact, when two teams do play each other,
their actual W/L odds ratio in that series, after they finish playing each
other, is pretty much our best definition of their relative quality: if,
over a lengthy series, Y won twice as many games against X as it lost, we would
intuitively say: “Y is twice as good as X”. Again, this is true in baseball, but not
in arm-wrestling or basketball. However, such "intuitive" reasoning runs the risk of being circular: defining quality by win ratios is begging the question. But the potential circularity of the reasoning is avoided once we see that other intuitively plausible quality measures (particularly involving runs, not wins) also lead to similarly
good results using the Quality Model. Substituting in the Basic Axiom, we get:

w = Qy / (Qy + Qx) = Odds Ratio for Y / (Odds Ratio for Y + Odds Ratio for X)

Or:

Log5 (Odds Ratio Version): w = (Wy/Ly) / (Wy/Ly + Wx/Lx)

So, when a 60-40 team plays a 55-45 team, it should have predicted winning percentage w of:

w = (60/40) / [60/40 + 55/45] = .551.

This
discovery by BJ has long been shown to be a very useful and accurate result. (It was also immediately transformed
mathematically into an equivalent version using inputs of team Y’s and team X’s winning
percentages against their leagues, rather than their odds ratios. But the odds ratio version above is the
important one for other wpe’s.) BJ’s Pythagorean Formula for power 2 (Pyth2) says that winning percentage
w is given by (with y = runs scored by Y, and x = runs allowed) Define Qy, the Quality of Team Y , as Qy = y/x. That is, the quality of Y is measured by the "run ratio" of
its runs to its runs allowed. This is
certainly an intuitively plausible measure of the quality of a team. So the Quality of Team X is also defined by its runs ratio against Y: its runs scored against Y (x) to the runs it gives up (i.e., y, the runs Y scores against X.) That is, Qx = x/y Plugging the above Qy and Qx
into the Basic Axiom, we get

w = Qy / (Qy + Qx) = (y/x) / [ (y/x) + (x/y) ],

or, clearing fractions by multiplying by xy in top and bottom, we immediately get Pyth2. So
this is why the "squares" were in the Pyth2 formula--I was happy
to see it after 28 years of wondering! It is this sense in which
Log5 and
Pyth2 are structurally the “same” formula, as BJ tried unsuccessfully to show in 1981: they
both stem from plugging in a quality measure to the Basic Axiom, with log5
using W/L, the odds ratio, as the quality measure, while Pyth 2 uses y/x, the run ratio. Since runs scored tend to increase wins,
while runs given up tend to increase losses, the similarity of the results is
wholly natural and plausible—but note that the “squares” aren’t in the measure
of the quality; rather, they come from the mathematical structure of the Axiom,
and aren’t “ad hoc” (as they seemed to be for BJ in 1981, though it's hard to be sure.) I get Rmse
for Pyth2 as .0271, which will be the “pretty good” benchmark against which to
measure other wpe’s.
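Both instances of the Axiom can be checked numerically; a sketch (the run totals are hypothetical, chosen only for illustration):

```python
def axiom_w(qy, qx):
    # Basic Axiom: w = Qy / (Qy + Qx).
    return qy / (qy + qx)

# Log5: quality = league odds ratio W/L. The 60-40 vs. 55-45 example:
print(round(axiom_w(60/40, 55/45), 3))   # prints 0.551

# Pyth2: quality = run ratio. The Axiom with Qy = y/x and Qx = x/y
# reproduces y^2 / (y^2 + x^2) exactly (hypothetical run totals):
y, x = 875, 737
assert abs(axiom_w(y/x, x/y) - y**2 / (y**2 + x**2)) < 1e-9
```

The assert is just the "clearing fractions" step done by the machine: multiplying top and bottom by xy turns the run-ratio form into the squared form.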
It was rapidly realized, even by BJ in 1981, that Pyth2 could be made a little more accurate by using a different power n, instead of n = 2—he claimed that n = 1.83 was best, and that has held up as roughly correct since then, though it differs slightly depending on the data one tests against. I personally get the best fit at around 1.85, but my data is only a sample, so I'm going to stick with 1.82 as the "best" exponent.

PythN: w = y^n / (y^n + x^n) [Best fit to real data: n = 1.82] (Rmse = .0258)

We derive
this from the Quality model by using Qy = (y/x)^(n/2) and Qx = (x/y)^(n/2), and substituting into the Basic Axiom.

Conceptual Problem: A Digression into "Effective Quality"

But now we
ask, why is it plausible that Quality would be proportional to (y/x) raised to a weird
decimal power (n/2), rather than proportional to the intuitively plausible
y/x ? Well, it's not really plausible--baseball does NOT
in fact quite follow the quality model.
But it does follow it approximately,
and closely enough that the model still shows why many wpe's are structurally
related. Note that for the “best” results,
n/2 is about 0.91,
which is to say that it makes (y/x) a tiny bit smaller when y>x, and thus
makes (x/y) a bit larger in that situation.
That is, it moves each run ratio
back towards 1 by a tad, and hence moves the predicted w back towards .500 by a tad. BJ in
1981 grasped the import of this fact--historically, MLB baseball has had just a
little more chance in it than the amount required to produce a
"perfect" proportion between quality and winning. As we saw for arm-wrestling, introducing an extra
element of chance into the game alters the degree to which "real" quality
differences will manifest themselves in winning percentages. Thus, the more chance involved, the closer
the resulting winning percentages are to .500, no matter what the real quality
differences between the contestants. One might also phrase this by saying
that a small "extra" element of chance combines with a team's
"real" quality to create its “effective” quality, and that it then
wins in proportion to its effective quality.
But, though
this presents a handy way of discussing things, it is essentially circular
reasoning--then "effective quality" simply becomes "whatever
correlates best with winning", which is NOT how we want to envision a model that uses "real" quality and the Basic Axiom. Note, crucially, though, that
how many total
runs (or points) are scored in an average game (RPG) affects the
degree to which “chance” influences winning.
When lots of runs are scored, there is more opportunity for the real
“quality” (which is what produces runs) to manifest itself, and less
opportunity for chance to make an actual difference in the outcome of the
game. And vice versa. This fact
is well known in sabermetrics, and leads to wpe’s that include RPG = (y + x)/G in them, by which the greater the
RPG, the more the resulting w is due to a magnified
effect of the "real" quality ratio upon the winning percentages. Again, this is true in basketball, which has
a couple hundred points scored per game, and where w = (y/x) raised to a large
power. We will look at wpe’s involving
RPG below—but note that while the MLB RPG has varied significantly in different
eras, it has historically been in the 8 to 10-ish range, which is sufficiently
low that apparently chance plays a significant enough role to make the ideal
exponent in Pyth(n) a bit less than n = 2. Again,
this
effect of chance at historically average MLB RPG levels means that the
Basic Axiom
isn’t “really” true—rather, it’s true for
“effective” quality, as opposed to “real”
quality--but it’s close enough for horseshoes, and for MLB
baseball. Again, the difference in accuracy between
Pyth2 and Pyth1.82 is very small, on average—a small fraction of a win per team per
year. For almost all purposes of measuring MLB w’s,
the Basic Axiom is roughly true. And this is shown by the fact that almost all
wpe’s (of varying accuracy) can be shown to fit it quite plausibly. What it
does mean is that many Quality measures result in wpe’s that can be slightly improved in predictive value by
using some method of “moving them back to .500” just a tad—again, as
realized by BJ in 1981. That is, by
various means, one tries to establish the "effective" quality of a team,
including an effect of chance. This led him to:

w = [ y^2 + K ] / [ y^2 + x^2 + 2K ]    Best fit: K = 38,000 (Rmse = .0255)

Adding any constant in the numerator and twice the constant in the denominator moves the overall result a bit closer to ½. BJ used K = 60,000, but I find with my data that the best fit is around 38,000. It doesn't matter too much… This has the same effect on overall accuracy as changing the 2 to 1.82 and not using any K. Via the Basic Axiom, this corresponds to

Qy = (y/x) + K/(xy) , with Qx = (x/y) + K/(xy)

BUT, from now on, we will NOT measure or model "effective" quality any
more, as, again, the real "quality" measures are still good predictors, and it
is the mathematical relationships between the various fundamental wpe's
in which I'm interested, not ad hoc tweaks that produce minutely better accuracy.

3. Bill James's "Win Shares" Marginal Runs WPE

Define A = Average Total Runs Scored per team for a league in a given year.

3A) Bill James Marginal Winning Percentage:

BJMWP: w = [ (y - (1/2)A) + ((3/2)A - x) ] / [2A] = (y - x + A) / (2A). (Rmse = .0272 or better)

Note: s = (y - x) is often called Run Differential in
the sabermetric literature, though I prefer to call it the surplus.

Derivation of the formula BJMWP from Quality Measures:

Let the Quality of a team Y be: Qy = [ (y - x) + A ] / A. Similarly, Qx = [ (x - y) + A ] / A.

Intuitively,
this says that any team Y, no matter how many total runs it scores, that has a surplus
of runs scored over allowed of (y - x) has Quality, relative to that of its
opponents, in roughly the same ratio that
a near-average team with the same amount
of surplus runs s would have to its average opponent. That is, the
"near-average" Team would normally have A runs, but its excess run
differential would now give it (y - x) + A runs. An average opponent would have A runs. This is
fairly plausible--in a league where average total runs per team are 700, a team
that scores 760 and gives up 740 should be of roughly similar overall quality
to one which scores 720 and gives up 700, as should a team that scores 669 and
gives up 649. In BJ's formula, all are
assigned the quality of a team with 720 and 700 runs, for simplicity. [One also gets a similar result by assuming that Qy = A and Qx = A + (x – y), i.e., that the quality situation is as if Y scored an Average amount of runs A, and X scored A + (x – y), i.e., below average, if y > x.]

Putting the Quality measures above
into the Basic Axiom,
we cancel the common denominator of A that is in all terms, and get: w = (y -
x + A) / [ (y - x + A) + (x - y + A) ]. I don’t
actually have data on A handy, so I checked it against various constants in
place of A, and for A = 750 got “best fit” rmse of .0272. Thus, it is at least as good as Pyth2, which is
explicitly why BJ used it for Win Shares. But of course it is undoubtedly more accurate using A for each league/year separately.

3B) BJMWP (Symmetric)

I decided to modify BJMWP as follows, to get a new wpe. Assume that
a difference of runs (y – x) is similar to a situation where each team is a
distance of half of that difference above or below an Average Team. That is, assume that it is as if Y scored A +
(1/2)(y – x) runs, and X scored A + (1/2) (x – y) runs. Then apply
the same Quality measure as in the Pyth2 case, i.e., the run ratio between the two teams:

Qy = [ A + (1/2)(y – x) ] / [ A + (1/2)(x - y) ] ,
Qx = [ A + (1/2)(x - y) ] / [ A + (1/2)(y – x) ]

This, of course, leads to the Pyth2 result with the runs for each team squared (first, doubling each runs amount for simplicity):

w = (2A + y – x)^2 / [ (2A + y – x)^2 + (2A + x - y)^2 ]

or, expanding the squares and simplifying:

BJMWP (Symmetric): w = (1/2) + 2A(y – x) / [ 4A^2 + (y – x)^2 ]

This is another
wpe using the run differential s = (y –
x), and will be discussed further in that section. Not in spreadsheet. Nor is the next one. No RMSE yet.

3C. Bill James's 1960's Winning Percentage Model (3 runs per game per Team) (BJ1960)

BJ1960: w = (y/G – H) / (y/G + x/G – 2H) for some constant H. BJ used H = 1.5, with G = Games.

If for H
one used not a constant, but (1/2)(y + x)/(2G), i.e., half of the average single-team runs/game in Y's games, one actually gets Ben Vollmayr-Lee's Linear Prediction model "BVL2" (see next section). A very nice relationship!
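This reduction can be spot-checked numerically; a short sketch (the season totals are hypothetical, and the BVL2 form (3y - x)/(2(y + x)) is the one given in the next section):

```python
def bj1960(y, x, G, H):
    # BJ1960: w = (y/G - H) / (y/G + x/G - 2H).
    return (y/G - H) / (y/G + x/G - 2*H)

def bvl2(y, x):
    # BVL2 (next section): w = (3y - x) / (2(y + x)).
    return (3*y - x) / (2*(y + x))

# Hypothetical season totals; H set to half the average single-team
# runs per game in Y's games, i.e., (y + x)/(4G):
y, x, G = 875, 737, 162
H = (y + x) / (4*G)
assert abs(bj1960(y, x, G, H) - bvl2(y, x)) < 1e-9
```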
In general, BJ1960 is the quality model with Qy = y - GH and Qx = x - GH; with H = A/(2G), this gives w = (y - A/2) / (y + x - A). This anticipates (by decades!) the very plausible idea of Bill James's later Win Shares notion.

4. Ben Vollmayr-Lee's Linear Prediction Model

In many ways, the "BVL2" formula coming up below is the most elegant single wpe--it is derivable in many beautiful ways from many other, seemingly unrelated wpe's, and is extremely simple and pretty accurate, especially for a "linear" model. Ben Vollmayr-Lee (BVL) wants to measure predicted winning percentage w from Y's fraction "t" of total team runs (for and against). For an
"average" team, where y = x,
and thus t = 1/2, the winning percentage should be .500
= 1/2. Thus, a linear predictor: w(t) = mt +
b should go through (w, t) = (1/2,
1/2). Here, m = slope, as usual. It could
thus also be expressed (as BVL prefers) in point-slope form as w - 1/2
= m(t - 1/2) , which
intuitively relates the change in w from
.500 proportionally to the change
in fraction t from 1/2, i.e., to the change from the fraction for an
average team. The proportionality
constant would be the slope m. However, I
prefer to do the math with the form w = mt + b, as follows. Since the line passes through (t, w) = (1/2, 1/2), we have 1/2 = m/2 + b, and so b = (1 - m)/2. Writing N for the slope m:

w = Nt + (1 - N)/2, which with N = 2 yields w = 2t + (1 - 2)/2, or, simplifying:

BVL2: w = 2t - 1/2 (for m = N = 2)

Or, expressing in terms of runs y and x, with t = y / (y + x):

BVL2: w = (3y - x) / (2x + 2y)

This is not how BVL puts it, but it's an
equivalent and simple formula.

Let YQuality be y + (y - x)/2, with XQuality therefore x + (x - y)/2. Intuitively, this says a team's quality is proportional to the following sum: the runs y they score, plus half the "s" marginal runs by which they exceed (or fall below) their opponents' runs. This is certainly a plausible indicator for the relative quality of a team--it includes not just the runs they score, but also an additional term proportional to how much their runs exceeded their opponents' runs, where the proportionality constant is 1/2. Plugging in to the Basic Axiom, and noting that the two qualities sum to y + x, we get

w = [ y + (y - x)/2 ] / (y + x) = (3y - x) / (2x + 2y),

which is BVL2 again.

4B) Back to BVLN, as above:

Now let N be any arbitrary
slope. And replace t with y/(y + x):

BVLN: w
= N [y/(y + x)] + (1 - N)/2
or, simplifying, w =
Ny / (y + x) + (1 - N) / 2 which, using the LCD, gives us:
w
= (2Ny + y + x - Ny - Nx) / 2(y
+ x), or, simplifying for a couple steps... w =
[y + x + N(y - x) ] / 2(y + x) , or: BVLN: w =
1/2 +
(N/2) (y - x) / (y + x) = 1/2
+ (N/2) v Here is our
first natural occurrence of the parameter v = (y - x)/(y + x). Note that BVL2 in this form becomes: w = 1/2 + (y - x)/(y + x), or w = 1/2 + v. A very simple (linear) form, and pretty accurate!

Let Qy
= y + [ (N - 1)/2 ] (y - x) , with Qx
= x + [ (N - 1)/2 ] (x - y) Intuitively,
this again says that a team Y's Quality can be measured by the following
sum: the runs y they score, plus
a proportionality constant times the
"s" marginal runs (y - x) by which they exceed their opponents'
runs. Here, the "general" proportionality
constant is (N - 1)/2, instead of 1/2, as in part A), and thus N can be
adjusted here to "best fit" the real-life data. BVL also
shows that whatever N fits the data will also be the exponent in the Pyth. Run
Formula that also best fits the data for that model--via a calculus
tangent-line relationship, which will be discussed below. In fact, the "best
fit" to real life data says that N should equal around 1.8, not 2.

w = ( y + [(N - 1)/2](y - x) ) / [ ( y + [(N - 1)/2](y - x) ) + ( x + [(N - 1)/2](x - y) ) ]
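That this quality form collapses back to the 1/2 + (N/2)v version can be verified numerically; a sketch with hypothetical run totals:

```python
def bvln_quality_form(y, x, n):
    # w = Qy / (Qy + Qx), with Qy = y + ((n - 1)/2)(y - x).
    qy = y + ((n - 1)/2) * (y - x)
    qx = x + ((n - 1)/2) * (x - y)
    return qy / (qy + qx)

def bvln_v_form(y, x, n):
    # w = 1/2 + (n/2) v, with v = (y - x)/(y + x).
    return 0.5 + (n/2) * (y - x) / (y + x)

# Hypothetical run totals; the two forms agree for any slope n,
# because the two qualities always sum to y + x:
for n in (1.8, 2.0, 2.5):
    assert abs(bvln_quality_form(875, 737, n) - bvln_v_form(875, 737, n)) < 1e-9
```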
So we look for
wpe's of the form w = f(q). The smaller the value of q, the more games
the team should win. And secondly, since by definition the team's winning percentage equals (1 - its opponents' winning percentage), and since its opponents have the roles of y and x reversed for them (their ratio is 1/q), we must have f(1/q) = 1 - f(q).

I will call
any such function w =
f(q) = 1 / (1 + q^n ) a "Power Run Function", or PRF. Many baseball
analysts are very interested in the "best" exponent n, the most
accurately predictive one.
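A quick numerical check that any PRF satisfies the symmetry requirement above (using n = 1.82 as the representative exponent):

```python
def prf(q, n):
    # Power Run Function: w = 1 / (1 + q^n), with q = x/y.
    return 1 / (1 + q**n)

# The symmetry requirement f(1/q) = 1 - f(q): swapping the teams'
# roles must swap winning and losing percentages.
for q in (0.5, 0.8, 1.0, 1.3):
    assert abs(prf(1/q, 1.82) - (1 - prf(q, 1.82))) < 1e-9
```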
w = 1 - x/(2y). This is a mediocre winning percentage estimator and, I found, had been discovered as a formula by Bill James long ago (not sure how his conceptualization went), by whom it was named. The two pieces of the piecewise "Kross" estimator are:

Kross1 (= DTE): for y > x (i.e., q < 1): w = 1 - x/(2y) = (2y - x)/(2y), or w = 1 - (1/2)q
Kross2: for y < x (i.e., q > 1): w = y/(2x) = 1/(2q)

DTE (the Kross1 formula used over the whole range) has rmse = .0307.

Of
course, both parts give a value of 1/2 at q = 1, as they should. The nice thing, though, is that for large values of q = x/y, which is to say teams that give up a lot more runs than the small (but non-zero) number they score, the winning percentage does not become exactly zero (or negative!)--which it would with DTE, but would NOT in real life. I don't have my spreadsheet set up to evaluate Kross by using the different pieces as appropriate for y < x or y > x, so I can't tell how accurate it is, but it is certainly likely to be a little more accurate than either piece separately. But, I
thought, why not get a single function from the 2 pieces? That's what I did by simply averaging them, in:

KrossAvg: w = (1/2)[ (1 - x/(2y)) + y/(2x) ] = 1/2 + (y^2 - x^2)/(4xy)

This turns out to be a fairly accurate estimator, substantially better than either piece separately. It has some other versions, such as:

KrossAvg: w = (1/2) + (y - x)M, where M is a "winning percentage per
run" estimator (see section 9), with M = (y + x)/(4xy).

6C) But in fact, it is better to "average them" incorrectly! If we use this on Kross1 (DTE) and Kross2, writing Kross1 as (2y - x)/(2y), with Kross2 as y/(2x), we get that the "false average" (adding the numerators and adding the denominators) is (2y - x + y)/(2y + 2x) = (3y - x)/(2y + 2x), which is BVL2! Compare the three in parallel form:

Kross2: w = 1/2 + (y - x)/(2x)
BJMWP: w = 1/2 + (y - x)/(2A)
BVL2: w = 1/2 + (y - x)/(y + x)

6D) Can We Get Pythagorean Results?

Except...what
is that "p"
doing in there...weren't we using "q" as the variable!? Again, this is perfectly
"Pythagorean" in form, just using the "shifted" run
averages. Oh, well...definitely worse than Pyth 2. But, like all good failures, it suggests an improvement--one which is better than Pyth 2! The reason
it does worse than Pyth 2 is that, as we saw earlier, Pyth 2 predicts results
(based on a given run ratio q) without sufficiently taking into account the
effect of chance, which moves w a tad
closer to .500 than would be predicted by Pyth 2. That effect means that a run ratio value a
little closer to 1 predicts more accurately than the actual run ratio
does. But subtracting ½ from X and Y
moves the run ratio farther away from 1, not closer—that is, it further reduces
the role of “chance”, and is hence even less predictive. But why not
move in the opposite direction? Why not increase
Y and X, which moves the q ratio closer to 1, and should thus account more for
chance? Hence, let's use X + 1/2 and Y + 1/2, and see if we get better predictive results: and we do! [Of course, this is now an “ad hoc” adjustment, not derived from my model above—but it’s an interesting one, and is very accurate!]

Also, let's no longer treat Y and X as fixed runs per game, but as variables, and as total runs per team--PRF's have no concern for which variable is used, total runs vs. runs/G, as the G’s cancel out. So we get the much more accurate new estimator, with y = total runs in a season for team Y, and x = total runs in a season allowed by team Y:

6F) Pyth+1/2: w = (y/G + .5)^2 / [ (y/G + .5)^2 + (x/G + .5)^2 ] [Rmse = .0264]

What is interesting about SPR and Pyth+1/2 is that they come from or are suggested by a VERY simple run model, quite unrealistically so in the "uniformity" feature, that is totally unrelated to the "quality" derivation of Pyth2, and yet yields the same structural Pythagorean result with exponent n = 2, and, in one case, extremely good accuracy, even better than Pyth1.82.

6G) A minor footnote: What happens under the contrary assumption to that in part 6d) above? That is, what happens if we assume that y - x > 1/2? The geometry changes a bit, and one gets a slightly different result, with an extra "adjustment" term added to the numerator and denominator of SPR. That extra term is (y - x)^2. Since this involves the surplus s = y - x, but not the ratio y/x (or the ratio of their shifted values), this is no longer easily expressible as a function of some "p" variable. And it is definitely a bit more complicated--although for most real-life values this extra term is quite small, and thus the final results are quite similar to the original f(p). But at any rate, this seems to have too high a ratio of complication to accuracy to pursue.
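As a numerical illustration of the 6F claim, the +1/2 shift pulls the Pyth2 prediction slightly back toward .500, mimicking the extra role of chance (the season totals and G = 162 are hypothetical):

```python
def pyth2(y, x):
    # Plain Pyth2: w = y^2 / (y^2 + x^2).
    return y**2 / (y**2 + x**2)

def pyth_half(y, x, G):
    # Pyth+1/2: shift each side's runs per game up by 0.5, then square.
    yg, xg = y/G + 0.5, x/G + 0.5
    return yg**2 / (yg**2 + xg**2)

# Hypothetical totals: the shifted prediction sits between .500 and Pyth2.
y, x, G = 875, 737, 162
assert 0.5 < pyth_half(y, x, G) < pyth2(y, x)
```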
_________________________________

If n = 1.82, we get w = .955 - .455 q, or w = .955 - .455 x/y. This should be the "best fit" linear wpe based on q. [Rmse = ???]

7A) DTE: A Quality model

w = 1 - (1/2) x/y, or w = (2y - x) / (2y)

Recall that BJMWP, w = 1/2 + (y - x)/(2A), simplifies to w = (y - x + A) / (2A). Replacing A by y in BJMWP recovers DTE; if we instead replace A by x, we get w = y / (2x). But this is just Kross2! See section 6B). Hence, Kross2 (and DTE) also come from the simple Quality model of BJMWP.

One can also obtain both DTE and Kross2 from a "General Marginal Runs" formula I've found by playing around with the above idea:

GMR: w = (y - A/2) / (y + x - A) [Rmse: ???; one could also choose a best-fit A--see the end of section 3]

in which, reversing the substitutions we used above for BJMWP, replacing A by y gives Kross2, and replacing A by x instead gives DTE. Note that GMR is saying that the winning percentage is simply the ratio of the excess of Y's runs scored over half the league average, to the excess of total runs in Y's games (scored and allowed) over the league average A (which is half of 2A, the league-average total for both teams together).
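The two GMR substitutions are easy to verify with exact rational arithmetic. The season totals below (750 scored, 680 allowed) are hypothetical, chosen only for illustration; the identities in fact hold for any values:

```python
from fractions import Fraction as F

def gmr(y, x, A):
    """General Marginal Runs: w = (y - A/2) / (y + x - A)."""
    return (y - A / 2) / (y + x - A)

y, x = F(750), F(680)   # hypothetical season totals

# Replacing A by y recovers Kross2, w = y/(2x):
assert gmr(y, x, y) == y / (2 * x)
# Replacing A by x recovers DTE, w = (2y - x)/(2y):
assert gmr(y, x, x) == (2 * y - x) / (2 * y)

print(gmr(y, x, y), gmr(y, x, x))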
7B) Tangent lines, q, u, v, t, inflection points, etc.

Each wpe can be written in terms of x and y directly, or in terms of one of the single parameters q, u, t, or v. These parameters are related by

q = x/y = 1/u = (1 - t)/t = (1 - v)/(1 + v),

with w = 1/2 exactly when q = 1, and with q ranging over [0, oo).

Note that all but Pyth2 are linear in at least one version: BVL2 is linear in t and v, DTE is linear in q, and Kross2 is linear in u. And, amazingly, in each linear case, the linear formula is the (calculus) tangent line of the other three wpe's when they are all written in that same parameter! The same is true for the "q" versions: DTE in its q-form (w = 1 - q/2) is the tangent line to Kross2, BVL2, and Pyth2 in their respective q-versions.

More generality: each of these formulas comes in a variation where, instead of using n = 2 as in Pyth2, we use a "better fit" exponent n, generally taken to be around n = 1.82 for the best fit to real results. (I'm going to use capital N in the titles, but small "n"s in the formulas.) This creates BVLN, PythN, DTEN, and Kross2N. Since these are more complicated, I will give only those parameter versions which are fairly simple:

PythN: w = y^n / (y^n + x^n) = 1 / (1 + q^n)

BVLN: w = [(n + 1)y - (n - 1)x] / (2x + 2y) = (1 - n)/2 + nt = 1/2 + (n/2) v

9. Wins Per Run Estimators (WPR's) [Tango Tiger, BVL, Palmer]

Wins per Run estimators (wpr's) usually rely on the
concept of the surplus runs that your team scores over what its opponents
score, i.e. on the quantity (y - x). Though this is called the "run differential" in much sabermetric literature, I prefer to label it surplus runs and, if needed for simplicity, to use as its parameter s = (y - x). By itself, this parameter means little, since your team's surplus runs affect its ability to win very differently depending on the actual values of y and/or x, and hence of (y + x), none of which are determined merely from knowing (y - x).
Thus, we can write (or conceptualize) in general, w
= 1/2 + (y - x) Q/G , noting that (y - x) Q will be "surplus runs" times
"Wins per surplus run", or "total (surplus) Wins" , over
and above the wins that would result if there were no surplus runs, which are assumed to be 1/2 of the games G. Hence, dividing (y - x)Q by G will turn those
"surplus wins" into a "surplus winning percentage", to be
added to the initial 1/2 that obtains when y - x = 0. A simple and extremely fruitful model, since the multiplier M = Q/G can take many plausible forms.

Moreover, just in general, 1
extra win (above .500) from roughly 10 extra runs (above y = x) makes sense intuitively, since for half the games in which those
extra 10 runs are scattered about, the team will already (on average) be a
winner, so those extra runs (half, or 5, on average) won’t produce any extra
wins. Of the 5 scattered among previously losing games, any game receiving only one of the extra runs wouldn't be turned into a win (at most the run creates a temporary tie, which can't become a win without yet another run landing in the same game). So there are actually fairly few losses in which randomly scattering 5 extra runs could make enough of a difference, given the original margin of loss, to win an extra game. Getting this to happen once seems just about right on average, though of course teams might get lucky or unlucky in where the extra runs came. If you ran a simulation scattering the 10 runs among a team's games by any reasonable probability distribution, it's extremely likely that the resulting average would indeed be around 1 extra win--and, again, the nice accuracy of wpr's using the rough value proves this. So roughly 1/10 Wins/Run is certainly "in the ballpark", intuitively. And a pretty accurate one:

W1500: w = 1/2 + (y - x) / 1500

[Here, M = 1/1500--that is, Q of roughly 1/10 wins per run over roughly 150 games.] This is slightly more accurate than Pyth2, and not far from Pyth1.82, which sort of makes you wonder why you need them: it don't get no simpler than this formula, which actually is linear in s = (y - x). A denominator of 1520 or so is actually the best fit for my sample, among all constant denominators, but the differences here are in the ten-thousandths place....

Here are some previously familiar wpe's recast into Wins
per Run models, though most don't use Q or G, just the final result M that is a
proxy for them.

DTE: w = 1/2 + (y - x) / (2y) [Here, M = 1/(2y)]

Note that 1/(2y) for an average team in baseball over the years will indeed be in the vicinity of 1/1500. Also note that this is not how DTE was written earlier, but simple algebra shows the equivalence.

BVL2: w = 1/2 + (y - x)/(y
+ x) This
time we have M = 1/(y + x), where the denominator
is simply the average of 2x and 2y, the denominators of DTE and Kross2. BVL2 is thus naturally more accurate than
either of the first two individually, though maybe not of their piecewise
totality in Kross. And again, 1/(x + y)
will be in the historical average neighborhood of 1/1500. I note that Patriot attributes BVL2 online to
David Smyth--I don't know who had priority, but it's a great little wpe (and
wpr.) However,
it is a bit odd that BVL2, which fine-tunes the individual team
run-contexts, is a bit less accurate than just using M = 1/1500...but maybe my sample isn't big enough...or maybe the game run-context isn't so important!

Are all wpr's "Linear"? Well, sort of, but
not really. They equal a constant plus (y
- x)M, which is apparently "linear" in the variable s = (y - x). But this is not true if M is not a constant,
but rather a function of x and y--then the result is certainly not
"linear" in x and/or y, and not in (y - x) either, since that is not a parameter independent of x or y.

Pyth2: w = 1/2 + (y - x) (y + x) / [ 2(y^2 + x^2) ] [Here, M = (y + x) / [2(y^2 + x^2)]]

Again, a new form, equivalent by algebra to the old one. This also brings up a minor quibble I have
with Patriot's discussion online, because he says that "linear"
wpr's, of which he lists DTE, Kross2, and BVL2, don't meet constraints on w, namely
that w should be between 0 and 1. But
Pyth2 does meet this constraint, and
is a "linear" wpr in his sense just as much as the others (though
again I would not call any of them linear.)
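Both points--that the wins-per-run form of Pyth2 is the same function as Pyth2 itself, and that it therefore always stays between 0 and 1--can be checked exhaustively over a grid of run totals. A sketch using exact rational arithmetic (not part of the original analysis):

```python
from fractions import Fraction as F

def pyth2(y, x):
    """Pyth2 in its original form, w = y^2/(y^2 + x^2)."""
    return y * y / (y * y + x * x)

def pyth2_wpr(y, x):
    """Pyth2 recast as a wins-per-run estimator, M = (y + x)/[2(y^2 + x^2)]."""
    return F(1, 2) + (y - x) * (y + x) / (2 * (y * y + x * x))

for y in range(1, 30):
    for x in range(1, 30):
        w = pyth2_wpr(F(y), F(x))
        assert w == pyth2(F(y), F(x))  # algebraically identical
        assert 0 <= w <= 1             # hence always a legal percentage
print("checked", 29 * 29, "cases")
```

The equivalence is just the algebra [(y^2 + x^2) + (y^2 - x^2)] / [2(y^2 + x^2)] = y^2/(y^2 + x^2), confirmed here numerically.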
The point is that M(x,y) can sometimes take a form that does allow the
overall w to always be between 0 and 1--it may not, but it can and sometimes
does. Is M here roughly in the
neighborhood it should be? Yes, set x =
y, and M = 2y/(4y^2) = 1/(2y), so on average, it's around 1/1500 or so.

BJMWP: w = 1/2 + (y - x) / (2A) [Here, M = 1/(2A)]

Yet another familiar wpe is a wpr. Obviously, 2A for the league average team
runs A each year is a better choice than either 2y or 2x in the denominator. M
has the same rough value using 2A as 2x or 2y, and again is roughly 1/1500.
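To see how these recastings compare in practice, here is a sketch evaluating several of the wpr's above on one hypothetical team (900 runs scored, 750 allowed, with A = 760 as a stand-in league average; all values are illustrative, not from the article's sample):

```python
def w1500(y, x):        return 0.5 + (y - x) / 1500       # constant M = 1/1500
def dte(y, x):          return 0.5 + (y - x) / (2 * y)    # M = 1/(2y)
def kross2(y, x):       return y / (2 * x)                # equals 1/2 + (y-x)/(2x)
def bvl2(y, x):         return 0.5 + (y - x) / (y + x)    # M = 1/(y + x)
def bjmwp(y, x, A=760): return 0.5 + (y - x) / (2 * A)    # M = 1/(2A)

y, x = 900, 750
for name, w in [("W1500", w1500(y, x)), ("DTE", dte(y, x)),
                ("Kross2", kross2(y, x)), ("BVL2", bvl2(y, x)),
                ("BJMWP", bjmwp(y, x))]:
    print(f"{name:6s} {w:.4f}")
```

As the text suggests, the estimates cluster closely: the varying denominators (1500, 2y, 2x, y + x, 2A) all sit in the same general neighborhood for realistic run totals.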
I can't tell how accurate this is since I don't have A data, but using A = 760 as a constant, I get rmse = .0263, pretty accurate, and fine-tuning the yearly A's should make it even better. This is thus probably at least as accurate as W1500.

TT (Tango Tiger): w = 1/2 + (y - x)(2) / [x + y + 10G] [Rmse = .0256; here, M = 2 / (x + y + 10G)]

BVLN: w = 1/2 + (y - x)(N/2) / (y + x)

OR, Best Fit (N = 1.82): w = 1/2 + 0.91 (y - x) / (y + x) [Rmse = .0264]

Here, M = 0.91/(y + x), a "best fit" to the data, using the same N as the best-fit PythN exponent, which is also 1.82. Note:
Recall that for any N, both BVLN and PythN have the same tangent
line: namely, DTEN! However, this is partially true merely by
definition: we define DTEN as "the tangent line of PythN"! This is in their q = x/y forms, evaluated at
q = 1. But then we note that by that
definition, DTEN has exactly the same structure as DTE, but with a different
constant N/4 instead of 1/2. And that it
then turns out to be the tangent line of BVLN justifies the
"definition" even more, since DTE (using N = 2) is the tangent line
of BVL2. But right now
we're dealing with (y - x) forms, not q-forms.
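Before leaving the q-forms entirely, the common-tangent claim is easy to check numerically. A sketch using the q-form expressions given earlier, with the slope taken by a central difference:

```python
def pythn(q, n):
    """PythN in its q-form: w = 1/(1 + q^n)."""
    return 1 / (1 + q**n)

def bvln(q, n):
    """BVLN in its q-form: w = [(n+1) - (n-1)q] / (2 + 2q)."""
    return ((n + 1) - (n - 1) * q) / (2 + 2 * q)

def dten(q, n):
    """DTEN, the claimed common tangent line at q = 1: w = 1/2 + (n/4)(1 - q)."""
    return 0.5 + (n / 4) * (1 - q)

n, h = 1.82, 1e-6
for f in (pythn, bvln):
    # Same value as the tangent line at q = 1 ...
    assert abs(f(1.0, n) - dten(1.0, n)) < 1e-9
    # ... and same slope, -n/4, estimated by central difference:
    slope = (f(1.0 + h, n) - f(1.0 - h, n)) / (2 * h)
    assert abs(slope - (-n / 4)) < 1e-5
print("DTEN is tangent to both PythN and BVLN at q = 1")
```

Both curves pass through (1, 1/2) with slope -n/4 there, which is exactly the line DTEN.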
So let's go to:

DTEN: w = 1/2 + (y - x) N / (4y)

OR, Best Fit (N = 1.82): w = 1/2 + (y - x) / (2.2 y) [Rmse = ???]

Here, M = 1/(2.2y), for the best fit.

PythN: w = 1/2 + (y^N - x^N) / [2 (y^N + x^N)]

[A practical note: in wpr form this has factors that equal 0 when y = x; a crude fix is to use 1.0001y in place of y in those factors, and perhaps in the other denominator factor as well.] However, it is an IMPLICIT wpr, as y^N - x^N
DOES "factor" as (y -
x) (…etc…), but the problem is that
the remaining factor (…etc…) is an infinite
series when N is not a whole number.
The series obviously converges (except when y = x), since Pyth N gives (in
its normal form) perfectly good results, but the series will have no simple
form for M, and won't converge for quite a while (i.e., will need lots of terms
for accuracy.) So this does not yield
any "formula" that is worth working with in a wpr form. Of course, PythN is still a great wpe in its
other, simpler form, w = y^N/(y^N + x^N). And the form above shows that it can be
formulated as 1/2 + something; but it
can't be represented simply as a wpr--as a function of surplus runs, (y -
x).

[An idea still to explore: why not use M = 2/(x + y + 2A), which is the "false average" (the mediant) of 1/(x + y) and 1/(2A)--a kind of Kross-style average? A separate idea: the false average of 1/(x + y) and 1/1500, i.e., M = 2/(x + y + 1500), is also pretty good, though the best-fit constant there turns out to be about 1625 rather than 1500.]
_________________________________

10. Pythagoras to Natural Logarithms via Integration: a New Run Estimator

10A) Let Pyth(n) be w = y^n / (y^n + x^n). Let G = total games played, and let n be fixed at some power we find "best". To formulate a Wins/Run function, we fix x, assume y = x has created w = .500, i.e., wG = (1/2)G, and then solve the equation wG = (1/2)G + 1 for y, and then for y - x. Doing so, we get

y^n / (y^n + x^n) = (G/2 + 1) / G = (G + 2) / (2G),

or, cross-multiplying and then collecting and factoring y^n,

y^n (1 - [G + 2]/[2G]) = x^n (G + 2)/(2G),

or, dividing, simplifying, and taking the nth root,

y = x ( [G + 2]/[G - 2] )^(1/n).

Therefore,

y - x = [ ( [G + 2]/[G - 2] )^(1/n) - 1 ] x = (extra) runs per additional win.

Taking the reciprocal gives (extra) Wins/Run = 1/(Kx), where K = ( [G + 2]/[G - 2] )^(1/n) - 1.

10B) Now assume a team has scored y runs and given up x runs. We ask how it accumulated extra wins during the process of accumulating its (y - x) surplus runs, since the Wins/Run value changed at each step of the way. That is, our assumption above was that y = x when we found Wins/Run; but as each successive extra run was scored, there is a higher y, and hence an "assumed" higher x, which influences the extent to which the next incremental win will come from the (new) level of x and y. So what we really need to do is INTEGRATE the Wins/Run function at every level of runs, from the x-value that really was what the opponents scored, up to the y that our team actually scored, accumulating "dw"s (win increments) along the way for the varying Wins/Run occurring at each different level. To do this, since y and x are now fixed total runs and can't play the role of a variable, integrate

extra Wins = Integral of [ 1/(Kp) ] dp, from p = x to p = y.

Since the integral of 1/p is ln(p), substituting the limits of integration x and y gives

total "extra" Wins = (1/K) [ ln y - ln x ], with K as above.

Divide these wins by G to convert to winning percentage, and add this "extra w" to the w = .500 level at which y started (at x), and we get a new run estimator:

LNW: w = (1/2) + (1/KG) [ ln y - ln x ],

where KG, for n = 1.82 (the best Pyth n) and G around 160, comes out to 2.2 or so.

Best LNW: w = (1/2) + [ ln y - ln x ] / 2.2 [Rmse = .0259]

[Note: 4/(KG) is approximately the Pythagorean exponent n we started with. This is not surprising when we look at tangent lines, as below.] This is quite an accurate estimator, according to my sample data. Equivalently, w = (1/2) + [ ln u ] / 2.2 = (1/2) - [ ln q ] / 2.2.
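Here is a sketch of LNW as a function, with K computed exactly as in part 10A; the sample team totals (800 scored, 700 allowed) are hypothetical, chosen only for illustration:

```python
import math

def lnw(y, x, n=1.82, G=162):
    """LNW: w = 1/2 + (ln y - ln x)/(K*G), with K = ((G+2)/(G-2))^(1/n) - 1."""
    K = ((G + 2) / (G - 2)) ** (1.0 / n) - 1.0
    return 0.5 + (math.log(y) - math.log(x)) / (K * G)

# K*G lands near 2.2 for n = 1.82 and G near 160, as stated in the text:
K = (164 / 160) ** (1.0 / 1.82) - 1.0
print(round(K * 162, 2))        # about 2.2
print(round(lnw(800, 700), 4))  # hypothetical 800-for, 700-against team
```

Note that lnw(y, x) returns exactly .500 when y = x, since ln y - ln x vanishes, matching the estimator's construction.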
_________________________________

[For the record, the implicit wins-per-run factor from PythN discussed above is WPR = z = G (y^N - x^N) / [2 (y - x) (y^N + x^N)].]

Or, even simpler, we can once again use the "false average" approach on the fractional parts of the two fractions that mis-estimate (one overestimating, the other underestimating). This is what we did in part …

_________________________________
First
developed in the late 70's, vaguely mentioned in his 1978 Baseball Abstract,
and explicitly developed in his 1981 Abstract, BJ's (apparently misnamed, as it
seems to have nothing to do with logarithms) Log5 formula is an extremely
useful formula, stemming from basic probability theory, describing how often Team Y could be predicted to beat Team X in a series of head-to-head games, if the only info we have about them is their Winning Percentages (or,
alternately W/L odds ratios) against their respective entire leagues of
opponents. That is to
say, if Team Y meets Team X in the World Series, and we know that Team Y went
60-40 in the regular season in its league [winning percentage = .600 = 60/(60 + 40) = Wins/(Wins +
Losses)], while Team X went 55-45 (winning percentage = .550) , what is the winning percentage w that
we should "expect" in the World Series for Team Y as it plays Team
X?

Define a team Y's Quality Qy as its Odds Ratio Wy/Ly (and similarly Qx = Wx/Lx for team X), where both odds ratios come from each team's record against its own league. Substituting in the Basic Axiom, we get

w = Qy / (Qy + Qx) = Odds Ratio for Y / (Odds Ratio for Y + Odds Ratio for X)

Or:

Log5: w = (Wy/Ly) / (Wy/Ly + Wx/Lx) [Odds Ratio Version]

BJ went
through a method similar to mine to arrive at his conclusions back in
1981. A discussion of this follows, but
can be skipped by those who aren’t interested in his method. There are
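The World Series example above can be computed directly from the odds-ratio version; a sketch, noting that a team's odds ratio W/L equals w/(1 - w) when expressed from its winning percentage:

```python
def log5(wp_y, wp_x):
    """Log5 estimate for Y vs. X from their league winning percentages."""
    odds_y = wp_y / (1 - wp_y)   # Wy/Ly
    odds_x = wp_x / (1 - wp_x)   # Wx/Lx
    return odds_y / (odds_y + odds_x)

# The .600 team against the .550 team from the example above:
print(round(log5(0.600, 0.550), 4))  # 0.551
```

So the .600 team "should" play about .551 ball against the .550 team: the odds ratios 1.5 and 1.222... give 1.5 / (1.5 + 1.222...) = 27/49.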
many different plausible models that lead to the same log5 result, which is the
result BJ gave in his 1981 abstract. But
his stated method there is almost identical to my "Quality" model
method in spirit. So in a very real
sense, Log5 is the original/quintessential "quality" model for all
winning percentage estimators. However,
BJ did not seem to explicitly realize that the basic quality model can
underpin much more than log5. BJ said the
following in the 1981 Abstract: Assume
that Y as above has winning percentage j = .600 against its league opponents Z,
where Z is "all the opposing teams that Y played against, as instantiated
in whomever they chose to play in their games against Y only." Again, Z is conceptualized as a vast
"entire league" team Z that only plays certain of its players against
Y in any given game. Further
assume that an average team (roughly like Z here) has an arbitrary
"quality" level Qz of 1/2.
Then ask, "What quality level Qy would Y have to have, in relation
to Z's quality level of 1/2, so that Y would win with a winning percentage j =
.600 against Z, the league average team, under
the assumption that in such a season series of games, Y and Z's wins are in
proportion to the ratio of their respective Qualities? That is, he
essentially asked, what quality level Qy would satisfy the Basic Axiom: Qy / (Qy + Qz) = .600, where Qz = 1/2 = .500 ? [He
asked this in English, and actually garbled (grammatically) the question, so it
is technically inaccurate--but it is clear from the immediately subsequent math
what he meant.] If you
substitute and solve Qy / (Qy + .500)
= .600, you get by basic algebra
Qy = .5(.6/.4) = .75. James called this value the
"log5" of team Y. Note, however, that if he had
arbitrarily assigned a "quality level" of 1 (instead of 1/2) to team
X (the "league" opposition), he would have gotten log5 equals
1(.6/.4). Thus it is clear that it is
the .6/.4 that is the crucial result--he
has it multiplied by .5 only because he assigned an "average" league
team an arbitrary quality level of .5. In
my calculations, I always assign an "average" team an arbitrary quality measure of "1", if
possible, because it simplifies the math.
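That the arbitrary baseline cancels can be checked directly. A sketch: James's scaling (league-average quality 1/2) and the scaling used here (league-average quality 1) give identical head-to-head predictions under the Basic Axiom:

```python
def basic_axiom(q_y, q_x):
    """Basic Axiom: w = Qy / (Qy + Qx)."""
    return q_y / (q_y + q_x)

# James's scaling: league average = 1/2, so Qy = .5 * (.6/.4) = .75
w_james = basic_axiom(0.5 * (0.6 / 0.4), 0.5)
# The simpler scaling used here: league average = 1, so Qy = .6/.4 = 1.5
w_mine = basic_axiom(0.6 / 0.4, 1.0)

assert abs(w_james - w_mine) < 1e-12  # the arbitrary baseline cancels
print(round(w_james, 3))
```

Both scalings return .600 against the league-average team, as they must, since multiplying every quality by the same constant leaves Qy/(Qy + Qx) unchanged.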
I do that in what follows. So, for me,
the "quality" of Y would in this case be simply Qy = .6/.4 = 1.5, compared
to an average team's stipulated quality
level of 1. The fact that for James his
"log5" was half of that was unimportant, and the extra
"1/2"s quickly canceled out of log5 as used by others in the
future. Still, it is clear from his
language that he explicitly conceptualized "log5" as a measure of
"talent" (that is, quality), relative to an average team talent
level.

_________________________________

13. List of all wpe's considered

1) Pyth2 (BJ): w = y^2 / (y^2 + x^2)
1A) Pyth(n)

http://gosu02.tripod.com/id69.html