Mathematical Relationships Between
Winning Percentage Estimators (WPE’s) in Baseball
(This webpage is under construction, and is not intended yet to
be "live". )
In Sabermetrics, the mathematical study of baseball, there are many ways to estimate what a team’s winning percentage (w) should be against its opponents, given the total runs (y) it has scored and the total runs (x) it has allowed in a certain number of games (G).
Bill James’s Pythagorean Formula w = y^2/(y^2 + x^2), which I call "Pyth2", was the most famous early example--simple and pretty accurate; but many other wpe's have been found, though they only improve average accuracy by a small amount. My purpose here is to analyze mathematical relationships between them, and to show that they quite often have a common model and structure, which is related to that of Bill James's Log5 Formula, though Log5 uses different input (not runs) to estimate the same w. My efforts were first motivated by trying to understand, over many years, why Pyth2 is actually true, and from what basic principles it can be derived. I finally figured that out, and it led to the more general model.
What Log5, Pyth2, and most other wpe's have in common is what I call the "Quality Model", which expresses the mathematical formulas in terms of a very simple model based on plausible measures of a baseball team's "quality", i.e., on quantitative statistical measures of the degree to which its team members' skills and abilities are better (for baseball purposes) than another team's. These are most plausibly expressed as functions of the runs it scores and gives up. This analysis also leads to some new wpe's I have come up with, but my work merely reshapes and extends that of others, who deserve credit for very insightful original analysis--see the "Credit" section at the end. I am a math professor and baseball fan, but get interested in technical sabermetrics only sporadically, don’t follow the field regularly, and hence may well be unaware of others’ results. If I haven’t properly credited someone, please let me know and I will remedy that. People who will find my work interesting may be few and far between, but those who aren't into the math details may still appreciate some of the overall relationships between seemingly unrelated baseball prediction formulas, and the "quality" concepts on which they are based.
It is also
helpful to write v as s/r
, where s = (y - x) is the surplus runs (of the team as compared
to its opponents, with s being negative if it is outscored by its opponents),
and r = (y + x) is the total runs scored by both teams together.
This is not true of r or s--neither uniquely determines the other, nor any of u, q, t, or v. And vice versa--knowing u, q, t, or v does
not determine r or s, though it does determine their ratio.
Once one has a formula for a potential Winning Percentage Estimator (wpe), one wonders
how accurately it applies to real teams in Major League Baseball. I measure wpe’s against a roughly 1260
team-season sample of MLB from 1903 to 2010, with an overall w of .50000 in the whole sample.
My sample uses only team-seasons with 149 games or
more, which improves the accuracy of all wpe's over what they
would be if used on shorter seasons. I use Excel to measure the root
mean square error (rmse) in the predicted winning percentages as compared to actual
results—but I don’t then convert to “Wins”, as many analysts do. Since
my sample isn’t all MLB team-seasons, and leaves out all shorter
seasons, I can’t tell which wpe's are “really” most
accurate—but the general relationships I find agree pretty much
with accuracy info about wpe's I've found online from other
sources who do use all data. I
just want (and use here) a rough estimate of which measures (including my newfound
ones) are substantially different from others.
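For readers who would rather see the computation than an Excel sheet, here is a minimal Python sketch of the rmse measurement. The three team-seasons shown are hypothetical stand-ins, not my actual 1903-2010 sample.

```python
import math

def pyth2(y, x):
    """Bill James's Pythagorean formula with exponent 2."""
    return y**2 / (y**2 + x**2)

def rmse(wpe, seasons):
    """Root mean square error of predicted vs. actual winning percentage.

    `seasons` is a list of (runs_scored, runs_allowed, actual_w) tuples.
    """
    errs = [(wpe(y, x) - w)**2 for (y, x, w) in seasons]
    return math.sqrt(sum(errs) / len(errs))

# Hypothetical team-seasons (runs for, runs against, actual w):
sample = [(892, 731, 0.630), (708, 699, 0.500), (611, 845, 0.401)]
print(round(rmse(pyth2, sample), 4))
```

Any other wpe can be dropped in for `pyth2` to compare errors on the same sample.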
As a benchmark, BJ’s Pyth2 has for my sample a rmse of .0264 (the difference between a w
of .500 and a w of .5264, or around 4 wins in a 162 or 154-game season), and the lowest rmse's for other formulas are a couple at .0256, and one at .0255. The “worst” (still
halfway-decent, and often quite accurate for most "normal" teams) wpe's I consider can get up to rmse of about .031.
The .026 or .025 rmse level is almost certainly the best possible, since
there is an ineradicable element of random chance in baseball.
And, of course, these differences in accuracy have little
practical importance, since they represent a tiny fraction of an extra
predicted "Win" on average, for normal teams. How well various
wpe's predict extremely good or bad teams' records is a different
issue, which I don't address much here.
More Notation: y
= runs scored by your team x = runs scored against your team, by its opponents, i.e., runs
allowed by your pitchers
Contents: Note that the penultimate section is to be a
concise list of all the various wpe’s I consider.
Boldface Contents are almost complete. Others in various stages of construction.
1. Quality, Wins, Runs, and
Some baseball teams are
better than others. This difference in quality arises from the batting and pitching and
defensive skills of their players. These
skills lead to runs scored and allowed, and those lead to Wins and Losses.
We will define a “measure
of quality” Qy for a team Y as an appropriately chosen, analytically plausible, statistical result of its play, involving bases, outs, runs, Wins, or other measures. [But we will assume it involves runs, unless otherwise stated.]
Is it true that when two baseball teams play each other over a long series of games, they will tend to win in proportion to their respective quality measures Qy and Qx?
That is, if one team has k times as much quality as another, will it tend to win k times as many games? The answer is (roughly) yes!
This holds up for many different plausible
measures of quality, including u = y/x.
Note that this is not true for all
sports! It is only true in baseball
because there is just enough random chance involved in the game to make
it true. This random chance comes from many factors,
but one major one is that a baseball team's "quality" is a (weighted)
average of that of all its players--but not all players play in any given
game. This is especially true of
pitchers--the team's 5th best pitcher may be pretty bad, even though
the overall average quality of the team may be better than any opponent's. When the 5th best pitcher pitches, the team
has a good chance of losing. Or, when
their best pitcher pitches, but happens to pitch by chance against the only
good (but very good!) pitcher the
opposing team has, the team may still lose.
In arm-wrestling, there is little role for chance:
if I am twice as strong as my opponent, i.e., if my “quality” of
arm-strength enables me to lift 200 pounds while she can only lift 100, I will NOT
win twice as many of my matches with her as she will—instead, I will always
win, practically speaking, for an infinite ratio of my wins to her wins. This fact could be altered if a greater
element of chance were artificially introduced into arm-wrestling--say, if a
muscle relaxant were randomly administered to one contestant before each
match. Then whoever got the relaxant
might often lose, no matter how weak their opponent. But barring that sort of thing, there is not
enough chance in arm-wrestling to result in contestants winning in proportion
to their quality.
And in basketball, the Bulls team that went 72-10 did not win in proportion to its relative excess of quality over its opponents. It was nowhere near “7 times as good” as its average opponents—not by any measure (shooting percentage, speed, strength, points, rebounds, steals, free-throws, etc.), nor even by the total sum of its (small) excesses in each category. But, as in arm-wrestling, there is less “chance” in basketball than in baseball, so even a modest surplus of basketball talent over one’s opponents much more often allows the team to demonstrate that surplus by winning the game. This is because (basically) the "good" players on the team always play in every game, and especially near the end, when close games are decided--Michael Jordan was rarely "by chance" not around to help determine a game's outcome when needed.
However, the fact that this proportionality of wins to quality IS roughly true in baseball, under many different plausible measures of quality, leads to a basic model for many different sabermetric formulas, which may at first seem ad hoc and unrelated.
Note that “quality” is inherently a relative notion. If one team has quality 3 while another has quality 6, it should make no difference if instead we measure the first team’s quality as 10 and the second one’s as 20. But there is an obvious way to “normalize” the measures: make sure that the “quality” of an average team (which is predicted to win half its games, with w = .500, and with x = y) is equal to a constant: and 1 is certainly the best constant.
I will do this normalization for some quality measures, particularly ones based on the run ratio y/x: if y = x, then Qy = 1 = Qx. However, since this "normalization" is not at all necessary to make the model work, and is merely an "aesthetic" feature, I will not always do it. One can always force any quality measure into a different, normalized form that yields the same wpe via the Axiom, but when the logical features and rationales of relative quality measures are already apparent, even though it isn't "normalized" to 1, I will often not bother to do so, since doing so can introduce extra mathematical cumbersomeness, and never changes the final Quality Model results of predicted "w".
BASIC AXIOM of the Quality Model:
A baseball Team Y's winning percentage w against Team X is well-predicted by the following model:
For a given (plausible) measure of quality, with team Y having quality called "Qy", and its opponent team X having quality called "Qx",
w = Quality of Y / [ Quality of Y + Quality of X ] = Qy / (Qy + Qx)
The Basic Axiom simply says that teams are predicted to (and will, roughly, actually) win in proportion to their measured "Qualities".
In baseball, there are many quality measures for which, when team Y has “k” times
the quality of its opponent according to that quantitative measure, it will
indeed generally win “k” times as many games in their matches. That is, baseball
(roughly) obeys the Basic Axiom, and a plethora of roughly equally accurate wpe’s demonstrate this in their common structure.
For various mathematical reasons, it may sometimes appear that implausible Quality measures give an accurate wpe--in which case the goal is to find a different, plausible quality indicator that yields the same formula.
For Example: Bill James's (BJ's) Pythagorean Formula with N = 2 says that the winning percentage
for a team Y in a league is: Pyth2: w
= y^2 / (y^2 + x^2)
It is certainly a pretty good predictor, as wpe's go.
Here y is runs scored by Y, and x is runs allowed by Y, i.e., runs scored by its opponents (X). This result would trivially follow from the
Basic Axiom IF we chose runs^2 to be the quality measure of each team,
respectively. That results in y^2 being
the quality Qy of Team Y, and x^2 being Qx, the quality of the agglomerated
league opponent Team X when playing against Y.
But that choice is NOT
intuitively plausible--I wondered for 28 years WHY the runs-squared were used! Why
aren't just runs themselves (without the squares) indicators of team quality, and hence predictors of winning percentage? What do the squares have to do with it?
A natural first try would be that if Qy = y and Qx = x, the resulting formula via my Basic Axiom
would be w = y / (y
+ x), which has been shown to NOT predict real-life team winning percentages
very well. Pyth2 predicts w pretty accurately--refinements
and alternate formulas never do very much better than Pyth2. But, in fact, there is a much more plausible
indicator of team Quality than y^2 and x^2, one which leads (via my Axiom) to
the Pyth2 result, with the
squares. The squares are not a
natural way in which to measure the "quality" of a team—they simply
result from the way in which the Axiom applies to the “real” (more plausible)
quality measure. We will see this below.
2. Bill James's Log5 Formula
Developed in the late 70's, vaguely mentioned in his 1978 Baseball Abstract,
and explicitly developed in his 1981 Baseball Abstract, BJ's Log5 formula (apparently misnamed, as it seems to have nothing
to do with logarithms) is the
quintessential and original Quality model—he derived it with this concept explicitly in mind. He set an average team’s quality equal to ½,
instead of 1, as I do, but otherwise his approach is the same. In this sense it is the forerunner and
exemplar of most runs-based wpe’s—especially of Pythagorean results. (This is true even though it predicts
head-to-head w from league W/L ratios, NOT from runs—but the structure is the same.)
BJ tried in
his 1981 Abstract to “prove” this relationship between log5 and Pythagorean
results, but unsuccessfully—his claims in the 1981 Abstract about how the two
are related have a fundamental error.
But the formulas are nonetheless very related, as I will show soon. [A discussion of BJ’s original development
(with the error) in the 1981 Abstract is presented further down the webpage.]
Log5 is an extremely useful formula
describing how often Team Y could be predicted to beat Team X in a series of
head-to-head games, if
the only info we have about them is their Wins/Losses odds ratios against their respective entire leagues.
That is to
say, if Team Y meets Team X in post-season play, and we know that Team Y went
60-40 in the regular season in its league, while Team X went 55-45 in its
league (perhaps the same league, perhaps not), what is the winning percentage w that
we should "expect" in the postseason series for Team Y as it plays Team X?
This is a
very plausible definition of quality—the better a team, the better
should be its odds of winning against the entire league overall. In fact, when two teams do play each other,
their actual W/L odds ratio in that series, after they finish playing each
other, is pretty much our best definition of their relative quality: if,
over a lengthy series, Y won twice as many games against X as it lost, we would
intuitively say: “Y is twice as good as X”. Again, this is true in baseball, but not
in arm-wrestling or basketball. However, such "intuitive"
reasoning runs the risk of being circular: defining quality by
win ratios is begging the question. But the potential circularity of
the reasoning is avoided once we see that other intuitively plausible
measures (particularly involving runs, not wins) also lead to similarly
good results using the Quality Model.
Substituting in the Basic Axiom, we get:
w = Qy / (Qy + Qx) = Odds Ratio for Y / (Odds Ratio for Y + Odds Ratio for X)
Or: Log5 (Odds Ratio Version): w = (Wy/Ly) / (Wy/Ly + Wx/Lx)
So, when a 60-40 team plays a 55-45 team, it should have predicted winning percentage w of: w = (60/40) / [60/40 + 55/45 ] = .551 .
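The formula is one line in code. This Python sketch (mine, not BJ's) reproduces the 60-40 vs. 55-45 example from the text:

```python
def log5(wy, ly, wx, lx):
    """Predicted w for team Y vs. team X, via the Basic Axiom
    with quality = the W/L odds ratio against the league."""
    qy = wy / ly   # odds ratio for Y
    qx = wx / lx   # odds ratio for X
    return qy / (qy + qx)

# A 60-40 team playing a 55-45 team:
print(round(log5(60, 40, 55, 45), 3))  # 0.551
```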
This discovery by BJ has long been shown to be a very useful and accurate result. (It was also immediately transformed
mathematically into an equivalent version using inputs of team Y’s and team X’s winning
percentages against their leagues, rather than their odds ratios. But the odds ratio version above is the
important one for other wpe’s.)
We will again use the Basic Axiom and quality model, as we did for Log5, but this time
we will put in a different Quality measure, one involving runs, not wins.
BJ’s Pythagorean Formula for power 2 (Pyth2) says that winning percentage w is given by w = y^2 / (y^2 + x^2) (with y = runs scored by Y, and x = runs allowed).
Define Qy, the Quality of Team Y, as Qy = y/x.
That is, the quality of Y is measured by the "run ratio" of
its runs to its runs allowed. This is
certainly an intuitively plausible measure of the quality of a team. Note that
we can consider Y’s opponents to be a union of daily subsets of a vast
“average” “league” team X, who only play some of their players against Y
in any given game. That is, Team X is
considered as a union of “sub-teams for a day” for each day that some of their
players played Y.
So the Quality of Team X is also defined by its runs ratio against Y: its runs scored against Y (x) to the runs it gives up (i.e., y, the runs Y scores against X.)
That is, Qx = x/y
Plugging the above Qy and Qx
into the Basic Axiom, we get w =
Qy/ (Qy + Qx) = (y/x) / [
(y/x) + (x/y) ], or clearing fractions, multiplying by xy in top and
bottom, we immediately get Pyth2. QED.
So this is why the "squares" were in the Pyth2 formula--I was happy to see it after 28 years of wondering! It is this sense in which Log5 and Pyth2 are structurally the “same” formula, as BJ tried unsuccessfully to show in 1981: they both stem from plugging in a quality measure to the Basic Axiom, with log5 using W/L, the odds ratio, as the quality measure, while Pyth 2 uses y/x, the run ratio. Since runs scored tend to increase wins, while runs given up tend to increase losses, the similarity of the results is wholly natural and plausible—but note that the “squares” aren’t in the measure of the quality; rather, they come from the mathematical structure of the Axiom, and aren’t “ad hoc” (as they seemed to be for BJ in 1981, though it's hard to be sure.)
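The derivation can be checked numerically in a few lines. This is my own Python sketch, with made-up run totals: the run-ratio qualities y/x and x/y, fed through the Basic Axiom, reproduce Pyth2 exactly.

```python
def w_from_quality(qy, qx):
    """Basic Axiom: teams win in proportion to their qualities."""
    return qy / (qy + qx)

def pyth2(y, x):
    return y**2 / (y**2 + x**2)

# Run-ratio qualities Qy = y/x and Qx = x/y give Pyth2
# (hypothetical run totals):
for (y, x) in [(760, 740), (900, 650), (611, 845)]:
    assert abs(w_from_quality(y / x, x / y) - pyth2(y, x)) < 1e-12
```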
I get Rmse
for Pyth2 as .0271, which will be the “pretty good” benchmark against which to
measure other wpe’s.
It was rapidly realized, even by BJ in 1981, that Pyth2 could be made more accurate (a little bit) by using a different power n, instead of n = 2—he claimed that n = 1.83 was best, and that has held up as roughly correct since then, though it differs slightly depending on the data one tests against. I personally get the best fit at around 1.85, but my data is only a sample, so I'm going to stick with 1.82.
I’ll use the “best” exponent as 1.82 . And I use "N" in the title of the formula, but "n" in the calculations--but they are the same.
PythN: w = y^n / (y^n + x^n) [ Best fit to real data: n = 1.82
] (Rmse = .0258)
We can derive this from the Quality model by using:
Qy = (y/x)^(n/2), and Qx = (x/y)^(n/2),
and substituting into the Basic Axiom.
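Again a quick numerical check (my Python sketch, hypothetical run totals) that these fractional-power qualities reproduce PythN via the Axiom:

```python
def w_from_quality(qy, qx):
    """Basic Axiom: w = Qy / (Qy + Qx)."""
    return qy / (qy + qx)

def pyth_n(y, x, n):
    """PythN: w = y^n / (y^n + x^n)."""
    return y**n / (y**n + x**n)

n = 1.82
for (y, x) in [(760, 740), (900, 650)]:
    qy = (y / x) ** (n / 2)
    qx = (x / y) ** (n / 2)
    assert abs(w_from_quality(qy, qx) - pyth_n(y, x, n)) < 1e-12
```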
Conceptual Problem: A Digression into "Effective Quality"
But now we ask, why is it plausible that Quality would be proportional to (y/x) raised to a weird decimal power (n/2), rather than proportional to the intuitively plausible y/x ?
Well, it's not really plausible--baseball does NOT
in fact quite follow the quality model.
But it does follow it approximately,
and closely enough that the model still shows why many wpe's are structurally related.
Note that for the “best” results,
n/2 is about 0.91,
which is to say that it makes (y/x) a tiny bit smaller when y>x, and thus
makes (x/y) a bit larger in that situation.
That is, it moves each run ratio
back towards 1 by a tad, and hence moves the predicted w back towards .500 by a tad. BJ in
1981 grasped the import of this fact--historically, MLB baseball has had just a
little more chance in it than the amount required to produce a
"perfect" proportion between quality and winning. As we saw for arm-wrestling, introducing an extra
element of chance into the game alters the degree to which "real" quality
differences will manifest themselves in winning percentages. Thus, the more chance involved, the closer
the resulting winning percentages are to .500, no matter what the real quality
differences between the contestants.
One might also phrase this by saying that a small "extra" element of chance combines with a team's "real" quality to create its “effective” quality, and that it then wins in proportion to its effective quality. But, though this presents a handy way of discussing things, it is essentially circular reasoning--then "effective quality" simply becomes "whatever correlates best with winning", which is NOT how we want to envision a model that uses "real" quality and the Basic Axiom.
Note, crucially, though, that how many total runs (or points) are scored in an average game (RPG) affects the degree to which “chance” influences winning.
When lots of runs are scored, there is more opportunity for the real
“quality” (which is what produces runs) to manifest itself, and less
opportunity for chance to make an actual difference in the outcome of the
game. And vice versa.
This is well known in sabermetrics, and leads to wpe’s that include RPG = (y + x)/G in them, by which the greater the
RPG, the more the resulting w is due to a magnified
effect of the "real" quality ratio upon the winning percentages. Again, this is true in basketball, which has
a couple hundred points scored per game, and where w = (y/x) raised to a large
power. We will look at wpe’s involving
RPG below—but note that while the MLB RPG has varied significantly in different
eras, it has historically been in the 8 to 10-ish range, which is sufficiently
low that apparently chance plays a significant enough role to make the ideal
exponent in Pyth(n) a bit less than n = 2.
This effect of chance at historically average MLB RPG levels means that the Basic Axiom
isn’t “really” true—rather, it’s true for
“effective” quality, as opposed to “real”
quality--but it’s close enough for horseshoes, and for MLB
baseball. Again, the difference in accuracy between
Pyth2 and Pyth1.82 is very small, on average—a small fraction of a win per team per
year. For almost all purposes of measuring MLB w’s,
the Basic Axiom is roughly true. And this is shown by the fact that almost all
wpe’s (of varying accuracy) can be shown to fit it quite plausibly.
What this does mean is that many Quality measures result in wpe’s that can be slightly improved in predictive value by
using some method of “moving them back to .500” just a tad—again, as
realized by BJ in 1981. That is, by
various means, one tries to establish the "effective" quality of a team,
including an effect of chance. This led
him to what I call:
PythToAHalf: w = [ y^2 + K ] / [ y^2 + x^2 + 2K ] Best fit: K = 38000 (Rmse = .0255)
Adding any constant in the numerator and twice the constant in the
denominator moves the overall result a bit closer to ½. BJ used K = 60,000, but I find with my data
that the best fit is around 38,000.
It doesn’t matter too much: adding K has roughly
the same effect on overall accuracy as changing the 2 to 1.82 and not using any K.
Qy = (y/x) + K/(xy) , with Qx = (x/y) + K/(xy)
Intuitively, this plausibly says that a team Y’s “effective” quality is its run ratio plus an “extra chance” factor, K/(xy), which is the same for both Y and its opponents X. The larger the total runs per game, the larger will be the denominator xy in the chance factor, and thus the smaller will the extra chance factor be.
Plugging those Qy and Qx into the Basic Axiom and clearing fractions, multiplying top and bottom by xy, we get PythToAHalf immediately. QED.
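The algebra is easy to verify numerically; this Python sketch (mine, with made-up run totals) checks that the "effective quality" measures above reproduce PythToAHalf:

```python
def pyth_to_a_half(y, x, k=38000):
    """PythToAHalf: w = (y^2 + K) / (y^2 + x^2 + 2K)."""
    return (y**2 + k) / (y**2 + x**2 + 2 * k)

def w_from_quality(qy, qx):
    """Basic Axiom: w = Qy / (Qy + Qx)."""
    return qy / (qy + qx)

k = 38000
for (y, x) in [(760, 740), (900, 650)]:
    qy = y / x + k / (x * y)   # run ratio plus the "extra chance" factor
    qx = x / y + k / (x * y)
    assert abs(w_from_quality(qy, qx) - pyth_to_a_half(y, x, k)) < 1e-12
```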
Note that this gives us a (crude) way to measure the "extra" chance in baseball, beyond that which would enable teams to win in proportion to their "real" quality: at, say, x = y = 750 per season, and K fitting best at about 38,000, we get an average chance factor K/(xy) of around .067, or let's say 7% extra chance beyond what would produce a winning percentage in proportion to quality. Of course this changes when x and y are lower or higher, and 38,000 is probably not the real best fit.
BUT, from now on, we will NOT measure or model "effective" quality any
more, as, again, the real "quality" measures are still good predictors, and it
is the mathematical relationships between the various fundamental wpe's
in which I'm interested, not ad hoc tweaks that produce minutely better accuracy.
3. Bill James's "Win Shares"
Marginal Runs WPE
Define A = Average Total Runs Scored per team for a league in a given year.
3A) Bill James
Marginal Winning Percentage:
w = [y - (1/2)A + (3/2)A - x ] / [2A]. (Rmse = .0272 or better)
Note: s = (y - x) is often called Run Differential in
the sabermetric literature, though I prefer to call it the surplus.
Derivation of the formula BJMWP from the Quality Model: Let the Quality of a team Y be: Qy = [ (y - x) + A ] / A . Similarly,
Qx = [ (x - y) + A ] / A .
This says that any team Y, no matter how many total runs it scores, that has a surplus
of runs scored over allowed of (y - x) has Quality, relative to that of its
opponents, in roughly the same ratio that
a near-average team with the same amount
of surplus runs s would have to its average opponent.
That is, the
"near-average" Team would normally have A runs, but its excess run
differential would now give it (y - x) + A runs. An average opponent would have A runs.
This is fairly plausible--in a league where average total runs per team are 700, a team
that scores 760 and gives up 740 should be of roughly similar overall quality
to one which scores 720 and gives up 700, as should a team that scores 669 and
gives up 649. In BJ's formula, all are
assigned the quality of a team with 720 and 700 runs, for simplicity.
[One gets a similar result by assuming that Qy = A and Qx = A + (x – y), i.e., that the
quality is similar to a situation in which Y scored an Average amount of runs
A, and X scores A + (x – y), i.e., below average, if y > x .]
Putting the Quality measures above
into the Basic Axiom,
we cancel the common denominator of A that is in all terms, and get:
w = (y - x + A) / [ (y - x + A) + (x - y + A) ] = (y - x + A) / (2A), which is BJMWP. QED.
I don’t actually have data on A handy, so I checked it against various constants in place of A, and for A = 750 got “best fit” rmse of .0272. Thus, it is at least as good as Pyth2, which is explicitly why BJ used it for Win Shares. But of course it is undoubtedly more accurate using A for each league/year separately.
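In its simplified form the formula is one line of code. A Python sketch (mine), using the constant A = 750 from the "best fit" above rather than real league averages:

```python
def bjmwp(y, x, a=750):
    """Bill James Marginal Winning Percentage, simplified:
    w = (y - x + A) / (2A), with A = league average runs per team."""
    return (y - x + a) / (2 * a)

# A team outscoring its opponents 760 to 740 (hypothetical totals):
print(round(bjmwp(760, 740), 4))  # 0.5133
```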
3B) I decided to modify BJMWP as follows, to get
a new wpe: BJMWP (Symmetric)
Assume that a difference of runs (y – x) is similar to a situation where each team is a
distance of half of that difference above or below an Average Team. That is, assume that it is as if Y scored A +
(1/2)(y – x) runs, and X scored A + (1/2) (x – y) runs.
Then use the same Quality measure as in the Pyth2 case, i.e., the run ratio between the two teams' (hypothetical) runs:
Qy = [ A + (1/2) (y – x) ] / [ A + (1/2) (x - y) ] ,
Qx = [ A + (1/2) (x - y) ] / [ A + (1/2) (y – x) ]
This, of course, leads to the Pyth2 result with each team's runs squared (first, double each runs amount for simplicity):
w = (2A + y – x)^2 / [ (2A + y – x)^2 + (2A + x - y)^2 ]
or, squaring and simplifying:
BJMWP (Symmetric): w = 1/2 + 2A(y – x) / [ 4A^2 + (y – x)^2 ]
This is another
wpe using the run differential s = (y –
x), and will be discussed further in that section.
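Since I haven't put this one in the spreadsheet, here is at least an algebra check in Python (mine, hypothetical run totals, A = 750): the simplified form agrees with the squared Pyth2-style form it came from.

```python
def bjmwp_sym(y, x, a=750):
    """BJMWP (Symmetric): w = 1/2 + 2A(y - x) / (4A^2 + (y - x)^2)."""
    s = y - x
    return 0.5 + 2 * a * s / (4 * a**2 + s**2)

def bjmwp_sym_squared(y, x, a=750):
    """The same wpe in its Pyth2-style form, before simplifying."""
    p = (2 * a + y - x) ** 2
    q = (2 * a + x - y) ** 2
    return p / (p + q)

for (y, x) in [(760, 740), (900, 650), (611, 845)]:
    assert abs(bjmwp_sym(y, x) - bjmwp_sym_squared(y, x)) < 1e-12
```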
Not in spreadsheet. Nor is the next one. No RMSE yet.
3C. Bill James's 1960’s Winning Percentage Model
(3 runs per game per Team) (BJ1960)
BJ1960: w = (y/G – H) / (y/G + x/G – 2H) for some constant H. BJ used H = 1.5 ; G = Games. I find the "best fit" over my large sample of games and years is H = 1.75. But that is not very accurate, with Rmse of .0287. This is to be expected, as the "constant" used can be plausibly expected to vary in different offensive contexts or years. But the formula is not "primitive": it is one tweak away from being much better:
If for H one used not a constant, but half of the average single-team runs/game in Y’s games, H = (1/2)[(y + x)/2]/G = (y + x)/(4G), one actually gets Ben Vollmayr-Lee’s Linear Prediction model "BVL2" (see next section.) A very nice relationship!
That is, if Qy = y - A/2 and Qx = x - A/2, the Basic Axiom gives w = (y - A/2) / (y + x - A), which is BJ1960 with H = A/(2G). This quality measure anticipates (by decades!) the very plausible marginal-runs idea of Bill James's later Win Shares notion.
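The "one tweak" claim is easy to confirm numerically. This Python sketch (mine, hypothetical run totals, G = 162) substitutes H = (y + x)/(4G) into BJ1960 and checks that it collapses to BVL2, w = (3y - x)/(2(y + x)):

```python
def bj1960(y, x, g, h):
    """BJ's 1960s model: w = (y/G - H) / (y/G + x/G - 2H)."""
    return (y / g - h) / (y / g + x / g - 2 * h)

def bvl2(y, x):
    """Ben Vollmayr-Lee's linear prediction model in runs form."""
    return (3 * y - x) / (2 * (y + x))

g = 162
for (y, x) in [(760, 740), (900, 650)]:
    h = (y + x) / (4 * g)   # half of the average single-team runs/game
    assert abs(bj1960(y, x, g, h) - bvl2(y, x)) < 1e-12
```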
4. Ben Vollmayr-Lee's Linear Prediction Model
In many ways, the "BVL2" formula coming up below is the most elegant single wpe--it is derivable in many beautiful ways from many other, seemingly unrelated wpe's, and is extremely simple and pretty accurate, especially for a "linear" model.
Ben Vollmayr-Lee (BVL) wants to measure predicted winning percentage w from Y's fraction "t" of total team runs (for and against), t = y/(y + x). For an "average" team, where y = x,
and thus t = 1/2, the winning percentage should be .500.
Thus, a linear predictor: w(t) = mt +
b should go through (w, t) = (1/2,
1/2). Here, m = slope, as usual.
It could thus also be expressed (as BVL prefers) in point-slope form as w - 1/2 = m(t - 1/2) , which intuitively relates the change in w from .500 proportionally to the change in fraction t from 1/2, i.e., to the change from the fraction for an average team. The proportionality constant would be the slope m. However, I prefer to do the math with the form w = mt + b, as follows:
BVL notes that a good fit to real life results occurs for w = mt + b , when m = some number N (conceptualized as an arbitrary Pythagorean exponent N, about which more later), and b = (1 - N)/2 . These conditions do indeed lead to (1/2, 1/2) being on the prediction line. So here is the BVLN Linear Prediction Model: w = Nt + (1 - N)/2 . This will be accurate for various N's near 2.
Then w = Nt + (1 - N)/2 , with N = 2, yields w = 2t + (1 - 2)/2, or, simplifying:
w = 2t - 1/2 (for m = N = 2). Note that this is linear in his preferred variable, "t".
Expressing in terms of runs y and x, with t = y/(y + x):
BVL2: w = (3y - x) / (2x + 2y)
This is not how BVL puts it, but it's an equivalent and simple formula. Of course, it's no longer linear!
4A) Derivation of BVL2 from Quality measures: Let YQuality be y +
(y - x)/2 , with XQuality
therefore x + (x - y)/2 .
Intuitively, this says a team's quality is
proportional to the following sum: the runs y they score, plus half the
"s" marginal runs by which they exceed (or fall below) their
opponents' runs. This is certainly a
plausible indicator for the relative quality of a team--it includes not just
the runs they score, but also an additional term proportional to how much their
runs exceeded their opponent's runs, where the proportionality constant is 1/2.
Plugging in to the Basic Axiom, we again get BVL2. QED.
4B) Back to BVLN, as above: Now let N be any arbitrary
exponent. And replace t with y/(y + x):
w = N [y/(y + x)] + (1 - N)/2
or, simplifying,
w = Ny / (y + x) + (1 - N)/2
which, using the LCD, gives us:
w = (2Ny + y + x - Ny - Nx) / 2(y + x)
or, simplifying for a couple steps:
w = [ y + x + N(y - x) ] / 2(y + x) , or:
BVLN: w = 1/2 + (N/2)(y - x)/(y + x) = 1/2 + (N/2) v
Here is our
first natural occurrence of the parameter v = (y - x)/(y + x).
Note that BVL2 in this form becomes: w = 1/2 + (y - x) / (y + x) , or w = 1/2 + v . A very simple (linear) form, and pretty accurate!
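All three forms of BVL2 — the t-form 2t - 1/2, the runs form (3y - x)/(2(y + x)), and the v-form 1/2 + v — are the same function, which this Python sketch (mine, hypothetical run totals) confirms:

```python
def bvl_n(y, x, n=2.0):
    """BVLN in its v-form: w = 1/2 + (N/2) * v, v = (y - x)/(y + x)."""
    v = (y - x) / (y + x)
    return 0.5 + (n / 2) * v

# For N = 2 this agrees with both w = 2t - 1/2 (t = y/(y + x))
# and the runs form w = (3y - x) / (2(y + x)):
for (y, x) in [(760, 740), (900, 650)]:
    t = y / (y + x)
    assert abs(bvl_n(y, x, 2) - (2 * t - 0.5)) < 1e-12
    assert abs(bvl_n(y, x, 2) - (3 * y - x) / (2 * (y + x))) < 1e-12
```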
Derivation of BVLN from Quality measures:
Qy = y + [ (N - 1)/2 ] (y - x) , with Qx = x + [ (N - 1)/2 ] (x - y)
Intuitively, this again says that a team Y's Quality can be measured by the following
sum: the runs y they score, plus
a proportionality constant times the
"s" marginal runs (y - x) by which they exceed their opponents
Here, the "general" proportionality
constant is (N - 1)/2, instead of 1/2, as in part A), and thus N can be
adjusted here to "best fit" the real-life data.
It turns out that whatever N fits the data will also be the exponent in the Pyth. Run
Formula that also best fits the data for that model--via a calculus
tangent-line relationship, which will be discussed below. In fact, the "best
fit" to real life data says that N should equal around 1.8, not 2.
Plugging into the Basic Axiom, w = Qy / (Qy + Qx):

w = ( y + [ (N - 1)/2 ] (y - x) ) / [ ( y + [ (N - 1)/2 ] (y - x) ) + ( x + [ (N - 1)/2 ] (x - y) ) ]

The two bracketed terms in the denominator cancel, leaving simply y + x there, while the numerator simplifies to [y + x + N(y - x)]/2, recovering BVLN.
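A quick numeric sanity check of this derivation (the sample run totals below are made up for illustration): the Basic Axiom applied to these Quality measures agrees with the simplified BVLN form for any N:

```python
def quality(y, x, n):
    # Quality of a team scoring y and allowing x, per the text:
    # runs scored plus (n - 1)/2 times the run surplus
    return y + (n - 1) / 2 * (y - x)

def bvln(y, x, n):
    # the simplified form: w = 1/2 + (n/2) v
    return 0.5 + (n / 2) * (y - x) / (y + x)

# Basic Axiom: w = Qy / (Qy + Qx); it should equal BVLN exactly
for y, x, n in [(800, 700, 2), (750, 680, 1.82), (600, 720, 1.5)]:
    qy, qx = quality(y, x, n), quality(x, y, n)
    assert abs(qy / (qy + qx) - bvln(y, x, n)) < 1e-12
print("Basic Axiom matches BVLN")
```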
So we look for wpe's of the form w = f(q). Two constraints are natural: first, the smaller the value of q, the more games the team should win, so f should be decreasing; and secondly, since by definition the team's winning percentage equals (1 - its opponents' winning percentage), and since its opponents have the roles of y and x reversed for them, we must have f(1/q) = 1 - f(q).
I will call
any such function w =
f(q) = 1 / (1 + q^n ) a "Power Run Function", or PRF. Many baseball
analysts are very interested in the "best" exponent n, the most
accurately predictive one.
w = 1 - x/(2y) . This is a mediocre winning percentage estimator and, I found, had been discovered as a formula by Bill James long ago (not sure how his conceptualization went), by whom it was named.
Or, using x and y, w = y/(2x) . I call this Kross2, using Kross1 for DTE. This creates a "piecewise" function, with each piece having a different domain, which I call simply "Kross", made up of Kross1 and Kross2:
Kross:
Kross1 (= DTE): y > x (i.e., q < 1): w = 1 - x/(2y) = (2y - x)/(2y) , or w = 1 - (1/2)q [DTE rmse = .0307]
Kross2: y < x (i.e., q > 1): w = y/(2x) , or w = 1/(2q)
Both parts give a value of 1/2 at q = 1, as they should. The nice feature, though, is that for large values of q = x/y, which is to say teams that allow a lot more runs than the small (but non-zero) number they score, the winning percentage does not become exactly zero (or negative!), as it would under DTE alone--and as it would NOT in real life. I don't have my spreadsheet set up to evaluate Kross by using the different pieces as appropriate for y < x or y > x, so I can't tell how accurate it is, but it is certainly likely to be a little more accurate than either piece separately.
But, I thought, why not get a single function from the 2 pieces? That's what I did by simply averaging them, in: w = (1/2) [ (2y - x)/(2y) + y/(2x) ]. This turns out to be a fairly accurate estimator, substantially better than either piece separately. It has some other versions, such as:
w = (1/2) + (y - x) M , where M is a "winning percentage per run" estimator (see section 9), with M = (y + x)/(4xy).
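A small check (with invented sample totals) that the plain average of Kross1 and Kross2 really does equal the wins-per-run form with M = (y + x)/(4xy):

```python
def kross1(y, x):
    return 1 - x / (2 * y)          # Kross1 = DTE

def kross2(y, x):
    return y / (2 * x)

def kross_avg(y, x):
    # the averaged estimator in wpr form: w = 1/2 + (y - x) * M,
    # with M = (y + x)/(4xy)
    return 0.5 + (y - x) * (y + x) / (4 * x * y)

for y, x in [(800, 700), (650, 710), (900, 600)]:
    assert abs((kross1(y, x) + kross2(y, x)) / 2 - kross_avg(y, x)) < 1e-12
print("average of Kross1 and Kross2 = 1/2 + (y - x)(y + x)/(4xy)")
```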
6C) But in fact, it is better to "average them" incorrectly! By a "false average" of two fractions I mean adding their numerators and adding their denominators (the classical "mediant"). If we use this on Kross1 (DTE) and Kross2, writing Kross1 as (2y - x)/(2y) and Kross2 as y/(2x), we get that the "false average" is [(2y - x) + y] / (2y + 2x) = (3y - x)/[2(y + x)] , which simplifies to exactly BVL2. For comparison, in wins-per-run form:
Kross2: w = 1/2 + (y - x)/(2x)
BJMWP: w = 1/2 + (y - x)/(2A)
BVL2: w = 1/2 + (y - x)/(y + x)
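An exact-arithmetic check (with made-up totals) that the "false average" of the two pieces lands exactly on BVL2:

```python
from fractions import Fraction

def false_average(n1, d1, n2, d2):
    # "false average": add the numerators and add the denominators
    return Fraction(n1 + n2, d1 + d2)

for y, x in [(800, 700), (650, 712), (550, 900)]:
    # Kross1 = (2y - x)/(2y), Kross2 = y/(2x)
    w = false_average(2 * y - x, 2 * y, y, 2 * x)
    bvl2 = Fraction(1, 2) + Fraction(y - x, y + x)
    assert w == bvl2
print("false average of Kross1 and Kross2 is exactly BVL2")
```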
6D) Can We Get Pythagorean Results?
What is that "p" doing in there...weren't we using "q" as the variable!?
Again, this is perfectly "Pythagorean" in form, just using the "shifted" run values (y - 1/2 and x - 1/2) in place of y and x.
Oh, well...definitely worse than Pyth 2. But, like all good failures, it suggests an improvement--one which is better than Pyth 2!
The reason it does worse than Pyth 2 is that, as we saw earlier, Pyth 2 predicts results (based on a given run ratio q) without sufficiently taking into account the effect of chance, which moves w a tad closer to .500 than would be predicted by Pyth 2. That effect means that a run ratio value a little closer to 1 predicts more accurately than the actual run ratio does. But subtracting ½ from X and Y moves the run ratio farther away from 1, not closer—that is, it further reduces the role of “chance”, and is hence even less predictive.
But why not
move in the opposite direction? Why not increase
Y and X, which moves the q ratio closer to 1, and should thus account more for
chance? Hence, let's use X+ 1/2 and Y +
1/2, and see if we get better predictive results: and we do!
[Of course, this is now an “ad hoc” adjustment, not derived from my
model above—but it’s an interesting one, and is very accurate!]
Also, let's no longer treat Y and X as fixed runs per game, but as variables, and as total runs per team--PRF's have no concern for which variable is used, total runs vs. runs/G, as the G’s cancel out. So we get the much more accurate new estimator:
with y = total runs in a season for team Y, and x = total runs in a season allowed by team Y,
6F) Pyth+1/2: w = (y/G + .5)^2 / [ (y/G +
.5)^2 + (x/G + .5)^2 ] [Rmse = .0264]
Actually, we can get it just slightly more accurate using a best fit of adding 0.43, not 0.5 , giving me a rmse of .0256 , which is extremely accurate!
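Here is a minimal sketch of Pyth+1/2 (the shift is left as a parameter so the 0.43 best fit can be tried too; the 162 games and run totals are illustrative, not the text's sample):

```python
def pyth_shift(y, x, g, shift=0.5):
    # Pyth+1/2: the Pythagorean formula applied to per-game runs,
    # each increased by `shift` (0.5 here; ~0.43 is the stated best fit)
    yp = y / g + shift
    xp = x / g + shift
    return yp**2 / (yp**2 + xp**2)

def pyth2(y, x):
    return y**2 / (y**2 + x**2)

# The shift pulls the estimate a bit closer to .500, as intended
print(round(pyth2(800, 700), 4))
print(round(pyth_shift(800, 700, 162), 4))
```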
What is interesting about SPR and Pyth+1/2 is that they come from or are suggested by a VERY simple run model, quite unrealistically so in the "uniformity" feature, that is totally unrelated to the "quality" derivation of Pyth2, and yet yields the same structural Pythagorean result with exponent n = 2, and, in one case, extremely good accuracy, even better than Pyth1.82.
6G A minor footnote: What happens under the contrary assumption to that in part 6d) above? That is, what happens if we assume that y - x > 1/2 ? The geometry changes a bit, and one gets a slightly different result, with an extra "adjustment" term added to the numerator and denominator of SPR. That extra term is (y - x)^2. Since this involves the surplus s = y - x, but not the ratio of y/x, (or of their shifted values) this is no longer easily expressible as a function of some "p" variable. And it is definitely a bit more complicated--although for most real-life values this extra term is quite small, and thus the final results are quite similar to the original f(p). But at any rate, this seems to have too high a ratio of complication to accuracy to pursue.
If n = 1.82, we get w = .955 - .455 q, or w = .955 - .455x/y. This should be the "best fit" linear wpe based on q.
7A) DTE: A Quality model w = 1 - (1/2)x/y,
or w = (2y - x) / 2y
This simplifies to: w = (y - x + A) / (2A).
w = y / (2x) . But this is just Kross2 ! See section 6B). Hence, Kross2 also comes from the simple Quality model of BJMWP.
One can also obtain both DTE and Kross2 from a "General Marginal Runs" formula I've found by playing around with the above idea,
GMR: w = (y - A/2) / (y + x - A) (one can choose a best-fit A; see the end of section 3)
in which if, reversing the substitutions we used above for BJMWP, we replace A by y we get Kross2, and if we instead replace A by x we get DTE.
Note that GMR is saying that the winning percentage is simply the ratio of the excess of Y's runs scored over half the league average, to the excess of total runs in Y's games (scored and allowed) over half the league average of total runs for both teams (which would be half of 2A, or A).
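The two substitutions are easy to verify exactly (the run totals below are invented):

```python
from fractions import Fraction as F

def gmr(y, x, a):
    # General Marginal Runs: w = (y - A/2) / (y + x - A)
    return (F(y) - F(a, 2)) / (y + x - a)

for y, x in [(800, 700), (640, 730)]:
    assert gmr(y, x, y) == F(y, 2 * x)           # A -> y recovers Kross2 = y/(2x)
    assert gmr(y, x, x) == F(2 * y - x, 2 * y)   # A -> x recovers DTE = (2y - x)/(2y)
print("GMR reduces to Kross2 (A = y) and DTE (A = x)")
```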
7B) Tangent lines, q, u, v, t, Inflection points, etc.
Parameter conversions (q = x/y, u = y/x, t = y/(y + x), v = (y - x)/(y + x)):
q = (1 - t)/t = (1 - v)/(1 + v) , and u = 1/q = t/(1 - t) = (1 + v)/(1 - v).
Each wpe gives w = 1/2 when q = 1 (equivalently u = 1, t = 1/2, v = 0); the range of q is [0, oo).
Note that all but Pyth2 are linear in at least one version: BVL2 is linear in t and v, DTE is linear in q, and Kross2 is linear in u.
And, amazingly, in each linear case, the linear formula is the (calculus) tangent line of the other three wpe's in its column!
The same is true for the "q" column: DTE using the q formula is the tangent line to Kross2, BVL2, and Pyth 2 in their respective q-versions.
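The tangent-line claim for the q column can be checked numerically, with no symbolic algebra, using a central-difference derivative at q = 1:

```python
def tangent_at_1(f, q, h=1e-6):
    # tangent line of f at q = 1: f(1) + f'(1) * (q - 1),
    # with f'(1) from a central difference
    slope = (f(1 + h) - f(1 - h)) / (2 * h)
    return f(1) + slope * (q - 1)

pyth2  = lambda q: 1 / (1 + q**2)           # Pyth2, q-form
kross2 = lambda q: 1 / (2 * q)              # Kross2, q-form
bvl2   = lambda q: 0.5 + (1 - q) / (1 + q)  # BVL2, q-form
dte    = lambda q: 1 - q / 2                # DTE: linear in q

for f in (pyth2, kross2, bvl2):
    for q in (0.5, 0.9, 1.3):
        assert abs(tangent_at_1(f, q) - dte(q)) < 1e-6
print("DTE is the common tangent line at q = 1")
```

All three curves pass through (1, 1/2) with slope -1/2, so their shared tangent line there is exactly w = 1 - q/2, i.e. DTE.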
More generality: Each of the above formulas comes in a variation where, instead of using n = 2 as in Pyth2, we use a "better fit" exponent n. This is generally taken to be around n = 1.82 for best fit to real results. I'm going to use capital N in the titles, but small "n"s in the formulas. This creates BVLN, PythN, DTEN, and Kross2N. For these, which are more complicated, I will give only those parameter versions which are fairly simple:
PythN: y^n/(y^n + x^n) = 1 / (1 + q^n)
BVLN: w = [(n + 1)y - (n - 1)x] / (2y + 2x) = (1 - n)/2 + nt = 1/2 + (n/2) v
9. Wins Per Run Estimators (WPR's)
This section is particularly indebted
to Patriot, including his excellent work at ______ online, and I thank him for
several gracious and helpful email exchanges.
My work merely reclassifies and extends his work a bit, as well as that of others he has cited (and I'm sure many he has not--it is very likely that things I've discovered independently were discovered by others first).
Wins per Run estimators (wpr's) usually rely on the concept of the surplus runs that your team scores over what its opponents score, i.e. on the quantity (y - x). Called the "run differential" in much sabermetric literature, I prefer to label it surplus runs and, if needed for simplicity, to use as its parameter s = (y - x). By itself, this parameter means little, since your team's surplus runs affect its ability to win very differently depending on the actual values of y and/or x, and hence of (y + x), none of which are determined merely from knowing (y - x).
General form of a wpr
is: w = 1/2 + (y - x) (M) ,
Thus, we can write (or conceptualize) in general, w
= 1/2 + (y - x) Q/G , noting that (y - x) Q will be "surplus runs" times
"Wins per surplus run", or "total (surplus) Wins" , over
and above the wins that would result if there were no surplus runs, which are assumed to be 1/2 of the games G. Hence, dividing (y - x)Q by G will turn those
"surplus wins" into a "surplus winning percentage", to be
added to the initial 1/2 that obtains when y – x = 0. A
simple and extremely fruitful model, since M = Q/G can take many plausible forms.
It has been found that plausible and accurate models covering the history of baseball result from using M values or functions that average out to roughly 1/1500 or so, since the average Wins/Run "Q" factor has been around 1 / 9.5 or so, and Games have long been around 160 per season or so. Using an M(x,y) based on individual team runs x and y (or League Average runs A) will of course create non-constant M-values with a fair amount of variation around the rough mean of 1/1500. Non-constant M’s should of course fine-tune and increase the predictive accuracy.
Why does it seem plausible that Wins/Run should be around 1 / 9.5 (or 1/10, for simplicity) and is the fact that major league total runs per game have often averaged around 9.5 or 10 related to that? Here are two considerations that suggest it is plausible, and yes, they are related. Dealing with the second issue first, since every game results in one Win (for some team) , then of course, overall, and on average, in any league Runs per Win would equal Runs per Game, since Wins equals Games for the league as a whole. However we are actually interested in Surplus runs per (Surplus) Wins, above Wins = ½ G, and it’s not clear that this necessarily is identical with Runs per Wins overall. However, using Runs/Win = Runs/Game gives a pretty accurate wpr, so it’s obviously a reasonable rationale.
Moreover, just in general, 1 extra win (above .500) from roughly 10 extra runs (above y = x) makes sense intuitively. For half the games in which those extra 10 runs are scattered about, the team will already (on average) be a winner, so those extra runs (half of the 10, or 5, on average) won't produce any extra wins. And of the 5 scattered among previously losing games, in the ones where only one of the extra runs is scattered, it usually wouldn't produce a win (at most creating temporarily a tie, but then unable to be won without another extra run scattered into the same game). So there are actually fairly few games which are losses, and in which randomly scattering 5 extra runs among them could actually make a sufficient difference, given the original margin of loss, to win an extra game. Getting this to happen once seems just about right on average, though of course teams might get lucky or unlucky in where the extra runs came. If you ran a simulation scattering the 10 runs among a team's games by any reasonable probability distribution, it's extremely likely that the resulting average would indeed be around 1 extra win--and, again, the nice accuracy of wpr's using the rough value proves this. So roughly 1/10 Wins/Run is certainly "in the ballpark", intuitively.
The simplest wpr of all: I call it W1500: w = 1/2 + (y - x)/1500 rmse = .0263
And a pretty accurate one! Here, M = 1/1500. This is slightly more accurate than Pyth2, and not far from Pyth1.82, which sort of makes you wonder why you need them: it don't get no simpler than this formula, which actually is linear in s = (y - x). 1520 or so is actually the best fit for my sample, among all constant denominators, but the differences here are in the ten-thousandths place....
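The arithmetic behind the ~1/1500 scale, plus W1500 itself, fits in a few lines (the 800/700 team is made up for illustration):

```python
# Back-of-envelope scale: Q ~ 1/9.5 wins per surplus run, G ~ 160 games
Q = 1 / 9.5
G = 160
M = Q / G
print(round(1 / M))  # -> 1520, the neighborhood of the 1500 denominator

def w1500(y, x):
    # the simplest wpr of all: constant M = 1/1500
    return 0.5 + (y - x) / 1500

print(round(w1500(800, 700), 4))  # -> 0.5667
```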
Here are some previously familiar wpe's recast into Wins per Run models, though most don't use Q or G, just the final result M that is a proxy for them.
DTE: w = 1/2 + (y - x) / (2y) Here, M = 1/(2y). Note that 1/(2y) for an average team in baseball over the years will indeed be in the vicinity of 1/1500. Also note that this is not how DTE was written earlier, but simple algebra shows the equivalence.
Kross2: w = 1/2 + (y - x) / (2x) Note that M = 1/(2x) simply switches from its DTE value of 1/(2y) to 1/(2x), the same rough size for an average team as for DTE. Again, this is a new form of Kross2, but simple algebra reveals the equivalence. It is no wonder Kross came up with these two parts of his piecewise function, DTE and Kross2, since they are mirror images--one using x, the other y, in the denominator. Whenever y and x differ, one will always be an overestimate compared to the other, and naturally using an average of the two will give a middle value, less likely to be either an underestimate or overestimate. Which is precisely what the next function uses:
BVL2: w = 1/2 + (y - x)/(y + x) This time we have M = 1/(y + x), where the denominator is simply the average of 2x and 2y, the denominators of DTE and Kross2. BVL2 is thus naturally more accurate than either of the first two individually, though maybe not of their piecewise totality in Kross. And again, 1/(x + y) will be in the historical average neighborhood of 1/1500. I note that Patriot attributes BVL2 online to David Smyth--I don't know who had priority, but it's a great little wpe (and wpr.) However, it is a bit odd that BVL2, which fine-tunes the individual team run-contexts, is a bit less accurate than just using M = 1/1500...but maybe my sample isn't big enough...or maybe the game run-context isn't so important!
Are all wpr's "Linear"? Well, sort of, but not really. They equal a constant plus (y - x)M, which is apparently "linear" in the variable s = (y - x). But this is not true if M is not a constant, but rather a function of x and y--then the result is certainly not "linear" in x and/or y, and not in (y - x) either, since that is not a parameter independent of x or y.
But this is a good thing, because it turns out that Pyth2, which is NOT linear in x, y, q, u, t, or v, is nonetheless also a wpr:
Pyth2: w = 1/2
+ (y - x) (y + x) / [ 2(y^2 + x^2) ] Again,
a new form, equivalent by algebra to the old one.
Here, M = (y + x) / [ 2(y^2 + x^2) ]. This also brings up a minor quibble I have
with Patriot's discussion online, because he says that "linear"
wpr's, of which he lists DTE, Kross2, and BVL2, don't meet constraints on w, namely
that w should be between 0 and 1. But
Pyth2 does meet this constraint, and
is a "linear" wpr in his sense just as much as the others (though
again I would not call any of them linear.)
The point is that M(x,y) can sometimes take a form that does allow the overall w to always be between 0 and 1--it may not, but it can, and sometimes does.
Is M here roughly in the neighborhood it should be? Yes: set x = y, and M = 2y/(4y^2) = 1/(2y), so on average it's around 1/1500 or so.
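Exact-arithmetic confirmation (with invented sample totals) that this wpr form of Pyth2 is the same function as the original:

```python
from fractions import Fraction as F

def pyth2(y, x):
    return F(y * y, y * y + x * x)

def pyth2_wpr(y, x):
    # Pyth2 as a wpr: w = 1/2 + (y - x) * M, with M = (y + x) / [2(y^2 + x^2)]
    return F(1, 2) + F((y - x) * (y + x), 2 * (y * y + x * x))

for y, x in [(800, 700), (650, 712), (900, 500)]:
    assert pyth2(y, x) == pyth2_wpr(y, x)
print("Pyth2 and its wins-per-run form agree exactly")
```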
BJMWP: w = 1/2 + (y - x) / (2A) Yet
another familiar wpe is a wpr. M = 1/(2A). Obviously, 2A for the league average team
runs A each year is a better choice than either 2y or 2x in the denominator. M
has same rough value using 2A as 2x or 2y, and again is roughly 1/1500.
I can't tell how accurate this is since I don't have A data, but using A = 760 as a constant, I get rmse = .0263, pretty accurate. And fine-tuning A's should make it even better. This is thus probably at least as accurate as W1500.
TT (Tango Tiger):
w = 1/2 + (y - x) (2) / [x + y + 10G] Rmse = .0256
Here, M = (2) / [x + y + 10G] While this is a bit
different, it can be obtained either from a fairly ad hoc Runs/Win formula…
"10G": w = 1/2 + (y - x) / (10G) rmse = .0265 [I get slightly better accuracy with 1/(9.7G): rmse = .0264]
Note that TT's M = 2/[x + y + 10G] is exactly the "false average" of BVL2's M = 1/(y + x) and 10G's M = 1/(10G)--add the numerators and add the denominators.
BVLN: w = 1/2 + (N/2)(y - x)/(y + x) OR, with best fit N = 1.82: w = 1/2 + 0.91 (y - x)/(y + x) rmse = .0264
Here, M = 0.91/(y + x) , a "best fit" to the data, and
the same as the best fit for PythN exponent, which also is 1.82. Nice accuracy!
Recall that for any N, both BVLN and PythN have the same tangent
line: namely, DTEN! However, this is partially true merely by
definition: we define DTEN as "the tangent line of PythN"! This is in their q = y/x forms, evaluated at
q = 1. But then we note that by that
definition, DTEN has exactly the same structure as DTE, but with a different
constant N/4 instead of 1/2. And that it
then turns out to be the tangent line of BVLN justifies the "definition" even more, since DTE (using N = 2) is the tangent line of BVL2.
But right now
we're dealing with (y - x) forms, not q-forms.
So let's go to:
DTEN: w = 1/2 + (y - x) N/(4y) OR, with best fit N = 1.82: w = 1/2 + (y - x)/(2.2 y)
Here, M = 1/(2.2y) , for the best fit.
PythN: w = 1/2 + (y^N - x^N) / [2 (y^N + x^N)] Oh, darn…
Well, this is NOT a wpr--it is difficult to get the
numerator, y^N - x^N to contain a factor of (y - x). The factoring out of (y - x) works nicely for N = 2, as we saw above in Pyth2, or for any other whole
number N (none of which are accurate), but NOT nicely for N = any NON-integer
fraction or decimal, which is the general case for PythN.
However, it is an IMPLICIT wpr, as y^N - x^N DOES "factor" as (y - x) (…etc…), but the problem is that the remaining factor (…etc…) is an infinite series when N is not a whole number. The series obviously converges (except when y = x), since Pyth N gives (in its normal form) perfectly good results, but the series will have no simple form for M, and won't converge for quite a while (i.e., will need lots of terms for accuracy.) So this does not yield any "formula" that is worth working with in a wpr form. Of course, PythN is still a great wpe in its other, simpler form, w = y^N/(y^N + x^N). And the form above shows that it can be formulated as 1/2 + something; but it can't be represented simply as a wpr--as a function of surplus runs, (y - x).
Two further ideas to explore: why not use M = 2/(x + y + 2A), which is the "false average" of 1/(x + y) and 1/(2A)? And, as a separate issue, the false average of 1/(x + y) and a constant 1/1500 is pretty good--though with best fit around 1625 rather than 1500.
10. Pythagoras to Natural Logarithms via Integration: a New Run Estimator
10A) Let Pyth(n) be w = y^n / (y^n + x^n). Let G = total Games played, and n be fixed at some power we find "best".
To formulate a Wins/Run function, we fix x, assume y = x has created w = .500, i.e., wG = (1/2)G, and then
solve the equation wG = (1/2) G + 1 for y, and then for y - x.
Doing so, we get y^n / (y^n + x^n) = (G/2 + 1) / G, or , cross multiplying and then collecting and factoring y^n,
y^n (1 - [G + 2]/[2G] ) = x^n (G + 2)/(2G), or, dividing, simplifying, and taking the nth root,
y = x ( [G+ 2]/[G - 2] ) ^ (1/n).
Therefore, y - x = [ ( [G+ 2]/[G - 2] ) ^ (1/n) - 1 ] x = (extra) runs / additional win .
Taking the reciprocal gives (extra) Wins/Run = 1 / (Kx), where K = ( [G+ 2]/[G - 2] ) ^ (1/n) - 1.
10B) Now assume a team has scored y runs and given up x runs. We ask how it accumulated extra wins during the process of accumulating the excess (y - x) surplus runs, since the Wins/Run changed at each step of the way.
That is, our assumption above was that y = x, when we found Wins/Run, but as each successive extra run was scored, there is a higher y, and hence an "assumed" higher x, which will influence the extent to which the next incremental win will come from the (new) level of x and y.
So what we really need to do is INTEGRATE the Wins/Run function at every level of runs, from the x-value that really was what the opponents scored, up to the y that our team actually scored, accumulating "dw"'s (win increments) along the way for the varying Wins/Runs occurring at each different y.
To do this, since y and x are now fixed total runs scored and can't play the role of a variable, we introduce a new variable p for the running level of runs, and integrate: extra Wins = Integral of [ 1/(Kp) ] dp from p = "x" to p = "y".
Since integral of 1/p = ln(p), and substituting the limits of integration x and y,
We get: total "extra" Wins = (1/K) [ ln y - ln x ], with K as above. Divide these wins by G to convert to winning percentage, and
add this "extra w" to the w = .500 level when y started (at x), and we get a new run estimator:
LNW: w = (1/2) + (1/KG) [ ln y - ln x ] ,
where KG, for n = 1.82 (best Pyth n) and G around 160, yields 2.2 or so.
Best LNW: w = (1/2) + [ ln y - ln x ] / 2.2 Rmse = .0259
[Note: 4 / (KG) = the Pythagorean exponent n we started with. This is not surprising when we look at tangent lines, as below.]
This is quite an accurate estimator, according to my sample data.
w = (1/2) + [ ln u ] / 2.2 = (1/2) - [ ln q ] / 2.2
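The whole LNW construction fits in a few lines; the constants below just re-derive the 2.2 quoted above, using n = 1.82 and G = 160 as in the text (the 800/700 team is illustrative):

```python
import math

def lnw(y, x, g=160, n=1.82):
    # K from solving PythN for one extra win above .500, then integrating 1/(Kp)
    k = ((g + 2) / (g - 2)) ** (1 / n) - 1
    return 0.5 + (math.log(y) - math.log(x)) / (k * g)

k = ((160 + 2) / (160 - 2)) ** (1 / 1.82) - 1
print(round(k * 160, 2))        # KG, about 2.21
print(round(4 / (k * 160), 2))  # about 1.81, approximately recovering n
print(round(lnw(800, 700), 4))
```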
11.L'Hopital Enters the Fray
Wins per surplus run implied by PythN: z = G(y^N - x^N) / [2 (y - x) (y^N + x^N)]
Or, even simpler, we can once again use the "false average approach" on the fractional parts of the two fractions that mis-estimate (one overestimating, the other underestimating). This is what we did in part 6C) above.
Developed in the late 70's, vaguely mentioned in his 1978 Baseball Abstract, and explicitly developed in his 1981 Abstract, BJ's (apparently misnamed, as it seems to have nothing to do with logarithms) Log5 formula is an extremely useful formula, stemming from basic probability theory, describing how often Team Y could be predicted to beat Team X in a series of head-to-head games, if the only info we have about them is their Winning Percentages (or, alternately, W/L odds ratios) against their respective entire leagues of opponents.
That is to
say, if Team Y meets Team X in the World Series, and we know that Team Y went
60-40 in the regular season in its league [winning percentage = .600 = 60/(60 + 40) = Wins/(Wins +
Losses)], while Team X went 55-45 (winning percentage = .550) , what is the winning percentage w that
we should "expect" in the World Series for Team Y as it plays Team
Define a team Y's Quality Qy as its Odds Ratio Wy/Ly, where Wy and Ly are Y's wins and losses (against its league).
Substituting in the Basic Axiom, we get
w = Qy / (Qy + Qx) = Odds Ratio for Y / (Odds Ratio for Y + Odds Ratio for X)
Log5: w = (Wy/Ly) / (Wy/Ly + Wx/Lx) [Odds Ratio Version]
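In code, with the World Series example above:

```python
def log5(pct_y, pct_x):
    # head-to-head estimate from the two teams' season winning percentages,
    # via the odds-ratio form w = (Wy/Ly) / (Wy/Ly + Wx/Lx)
    odds_y = pct_y / (1 - pct_y)
    odds_x = pct_x / (1 - pct_x)
    return odds_y / (odds_y + odds_x)

# the .600 team vs. the .550 team from the text
print(round(log5(0.600, 0.550), 4))  # -> 0.551
```

So the stronger team is expected to win about 55% of the head-to-head games, a smaller edge than its 5-point season advantage might suggest.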
BJ went through a method similar to mine to arrive at his conclusions back in 1981. A discussion of this follows, but can be skipped by those who aren’t interested in his method.
There are many different plausible models that lead to the same log5 result, which is the result BJ gave in his 1981 abstract. But
his stated method there is almost identical to my "Quality" model
method in spirit. So in a very real
sense, Log5 is the original/quintessential "quality" model for all
winning percentage estimators. However,
BJ did not seem to realize explicitly that the basic quality model can underpin much more than log5.
BJ said the
following in the 1981 Abstract: Assume
that Y as above has winning percentage j = .600 against its league opponents Z,
where Z is "all the opposing teams that Y played against, as instantiated
in whomever they chose to play in their games against Y only." Again, Z is conceptualized as a vast
"entire league" team Z that only plays certain of its players against
Y in any given game.
He then assumed that an average team (roughly like Z here) has an arbitrary "quality" level Qz of 1/2.
He then asked: what quality level Qy would Y have to have, in relation to Z's quality level of 1/2, so that Y would win with a winning percentage j = .600 against Z, the league average team, under the assumption that in such a season series of games, Y and Z's wins are in proportion to the ratio of their respective Qualities?
That is, he essentially asked, what quality level Qy would satisfy the Basic Axiom: Qy / (Qy + Qz) = .600,
where Qz = 1/2 = .500 ? [He asked this in English, and actually garbled (grammatically) the question, so it is technically inaccurate--but it is clear from the immediately subsequent math what he meant.]
If you substitute and solve Qy / (Qy + .500) = .600, you get by basic algebra Qy = .5(.6/.4) .
James called this value the "log5" of team Y. Note, however, that if he had arbitrarily assigned a "quality level" of 1 (instead of 1/2) to team Z (the "league" opposition), he would have gotten log5 equal to 1(.6/.4). Thus it is clear that it is the .6/.4 that is the crucial result--he has it multiplied by .5 only because he assigned an "average" league team an arbitrary quality level of .5. In my calculations, I always assign an "average" team an arbitrary quality measure of "1", if possible, because it simplifies the math. I do that in what follows.
So, for me,
the "quality" of Y would in this case be simply Qy = .6/.4 = 1.5, compared
to an average team's stipulated quality
level of 1. The fact that for James his
"log5" was half of that was unimportant, and the extra
"1/2"s quickly canceled out of log5 as used by others in the
future. Still, it is clear from his language that he explicitly conceptualized "log5" as a measure of "talent" (that is, quality), relative to an average team's talent.
13. List of all wpe’s considered
1) Pyth2 (BJ): w = y^2 / (y^2 + x^2)