Mathematical Relationships Between Winning Percentage Estimators (WPE’s) in Baseball 

                        (This webpage is under construction, and is not intended yet to be "live". ) 

In Sabermetrics, the mathematical study of baseball, there are many ways to estimate what a team’s winning percentage (w) should be against its opponents, given the total runs (y) it has scored and the total runs (x) it has allowed in a certain number of games (G). 

Bill James’s Pythagorean Formula w = y^2/(y^2 + x^2), which I call "Pyth2", was the most famous early example--simple and pretty accurate; but many other wpe's have been found, though they only improve average accuracy by a small amount.  My purpose here is to analyze mathematical relationships between them, and to show that they quite often have a common model and structure, which is related to that of Bill James's Log5 Formula, though Log5 uses different input (not runs) to estimate the same w.   My efforts were first motivated by trying to understand, over many years, why Pyth2 is actually true, and from what basic principles it can be derived.  I finally figured that out, and it led to the more general model.  

What Log5, Pyth2,  and most other wpe's have in common is what I call the "Quality Model", which expresses the mathematical formulas in terms of a very simple model based on plausible measures of a baseball team's "quality", i.e., on quantitative statistical measures of the degree to which its team members' skills and abilities are better (for baseball purposes) than another team's.  These are most plausibly expressed as functions of the runs it scores and gives up.  This analysis also leads to some new wpe's I have come up with, but my work merely reshapes and extends that of others, who deserve credit for very insightful original analysis--see the "Credit" section at the end.  I am a math professor and baseball fan, but get interested in technical sabermetrics only sporadically, don’t follow the field regularly, and hence may well be unaware of others’ results.  If I haven’t properly credited someone, please let me know and I will remedy that. People who will find my work interesting may be few and far between, but those who aren't into the math details may still appreciate some of the overall relationships between seemingly unrelated baseball prediction formulas, and the "quality" concepts on which they are based.

Notation 

 A winning percentage estimator (wpe) for a team is a function w = f(x,y) that predicts its winning percentage w against its opponents, given the total runs y scored by the team in G total games, and the total runs x allowed by the team's pitchers in those G games--with x also of course being the opponents' runs scored in the games.   

 It is often helpful to write w as a function of a single parameter based on x and y, especially when this gives the resulting wpe's a simple or natural mathematical form that lends itself easily to analysis.  The four most useful parameters are:
    u = y/x, the "run ratio"    and   q = x/y  =  1/u  , with the latter leading to slightly simpler formulas in many cases.
    t = y/(y + x), the ratio of a team's runs scored to the total runs scored and allowed 
    v = (y-x)/(y + x), the ratio of a team's surplus runs (y - x) to the total runs scored and allowed.      

It is also helpful to write v as s/r , where s = (y - x) is the surplus runs (of the team as compared to its opponents, with s being negative if it is outscored by its opponents), and r = (y + x) total runs scored by both teams together.  

Thus, often I will write a wpe as w = f(u) or g(t) or h(v), etc.  These four parameters, u, q, t, and v,  are all "equivalent", in that they map onto each other one-to-one.  That is, if one knows any of the four parameters, it uniquely determines each of the other three.  So any wpe using one parameter yields an equivalent wpe using any of the others, though with a different formula, of course.  

This is not true of r or s--neither uniquely determines the other, nor any of u, q, t, or v.  And vice versa--knowing u, q, t, or v does not determine r or s, though it does determine their ratio s/r = v.
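
As a minimal sketch of the equivalence just described, the following Python converts among u, q, t, and v. The function names are hypothetical, chosen only for illustration; the conversions t = u/(1 + u) and v = (u - 1)/(u + 1) follow directly from the definitions above.

```python
from fractions import Fraction

def params_from_runs(y, x):
    """Return (u, q, t, v) for runs scored y and runs allowed x (both > 0)."""
    y, x = Fraction(y), Fraction(x)
    u = y / x                 # run ratio
    q = x / y                 # reciprocal run ratio
    t = y / (y + x)           # share of total runs
    v = (y - x) / (y + x)     # surplus runs over total runs
    return u, q, t, v

def params_from_u(u):
    """Recover (q, t, v) from u alone, showing that u determines the others."""
    u = Fraction(u)
    q = 1 / u
    t = u / (1 + u)
    v = (u - 1) / (u + 1)
    return q, t, v

# Two teams with the same run ratio but different run totals:
# identical u, q, t, v, yet different r = y + x and s = y - x.
a = params_from_runs(800, 700)
b = params_from_runs(400, 350)
```

Here a and b come out identical even though r and s differ (1500 and 100 versus 750 and 50), which is the sense in which u, q, t, and v do not determine r or s.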

Once one has a formula for a potential Winning Percentage Estimator (wpe), one wonders how accurately it applies to real teams in Major League Baseball.  I measure wpe’s against a roughly 1260 team-season sample of MLB from 1903 to 2010, with an overall w of .50000 in the sample.  My sample uses only team-seasons with 149 games or more, which improves the accuracy of all wpe's over what they would be if used on shorter seasons.  I use Excel to measure the root mean square error (rmse) in the predicted winning percentages as compared to actual results--but I don’t then convert to “Wins”, as many analysts do.  Since my sample isn’t all MLB team-seasons, and leaves out all shorter ones, I can’t tell which wpe's are “really” most accurate--but the general relationships I find agree pretty much with accuracy info about wpe's I've found online from other sources who do use all data.  I just want (and use here) a rough estimate of which measures (including my newfound ones) are substantially different from others.  As a benchmark, BJ’s Pyth2 has for my sample a rmse of .0264 (the difference between a w of .500 and a w of .5264, or around 4 wins in a 162 or 154-game season), and the lowest rmse's for other formulas are a couple at .0256, and one at .0255.  The “worst” (still halfway-decent, and often quite accurate for most "normal" teams) wpe's I consider can get up to rmse of about .031.  The .026 or .025 rmse level is almost certainly the best possible, since there is an ineradicable element of random chance in baseball.  And, of course, these differences in accuracy have little practical importance, since they represent a tiny fraction of an extra predicted "Win" on average, for normal teams.  How well various wpe's predict extremely good or bad teams' records is a different issue, which I don't address much here.
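
The rmse measurement described above is done in Excel, but the same calculation can be sketched in Python. This is only an illustration: the function names are hypothetical, and the three rows of data are made-up placeholders, NOT real MLB team-seasons.

```python
import math

def pyth2(y, x):
    """Pyth2 estimate from runs scored y and runs allowed x."""
    return y**2 / (y**2 + x**2)

def rmse(estimator, rows):
    """Root mean square error of estimator over rows.

    rows: iterable of (runs_scored, runs_allowed, wins, losses).
    """
    sq_errors = []
    for y, x, wins, losses in rows:
        actual = wins / (wins + losses)   # ties excluded, as in the Notation
        sq_errors.append((estimator(y, x) - actual) ** 2)
    return math.sqrt(sum(sq_errors) / len(sq_errors))

# Placeholder rows for illustration only; these are NOT real team-seasons.
toy_rows = [
    (800, 700, 90, 72),
    (650, 720, 74, 88),
    (700, 700, 81, 81),
]
toy_rmse = rmse(pyth2, toy_rows)
```

Any other wpe written as a function of (y, x) can be passed in place of pyth2 to compare it on the same rows.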

More Notation:  y = runs scored by your team        x = runs scored against your team, by its opponents, i.e., runs allowed by your pitchers
                        G = games        w = winning percentage  =  Wins / (Wins + Losses)  (Not necessarily W/G, because of occasional ties)                      
(Note: occasionally I will use capital W for Wins and L for Losses.  But winning percentage uses small w; however, I rarely have “Wins” or "Losses" in my formulas, so this shouldn’t be confusing.)
_____________________________________

Contents:   Note that the penultimate section is to be a concise list of all the various wpe’s I consider. 

Boldface Contents are almost complete.  Others in various stages of construction.

1.  Quality, Wins, Runs, and Chance:  
      A.  The Quality Model
      B.  Bill James's Log5 formula  
2.  A derivation of Bill James's Pythagorean Formula (for power 2) from the log5 formula.
3.  Bill James's "Win Shares" Marginal Runs WPE
4.  Ben Vollmayr-Lee's  Linear Prediction Model
5.  General characteristics of Power Run Formulas, including the Pythagorean.

6.  The Uniform Run Distribution:  DTE to Quasi-Pythagorean Results
7.  DTE, Kross, and Tangent Lines to Pythagorean Formulas:  Simpler derivations
8.  Tangent Line Considerations for log5 
9.   Wins Per Run  Models
10.  Pythagoras to Natural Logarithms via Integration:  a New Run Estimator
11.  L'Hopital Enters the Fray
12.  BJ’s original derivation of log5 in 1981
13.  List of all wpe’s considered
14.  Credits to other sabermetricians and their research     
_____________________________________

1.  Quality, Wins, Runs, and Chance  

1A.  The Quality Model

Some baseball teams are better than others.  This difference in quality arises from the batting and pitching and defensive skills of their players.  These skills lead to runs scored and allowed, and those lead to Wins and Losses.  

We will define a “measure of quality” Qy for a team Y as an appropriately chosen, analytically plausible, statistical result of its play, involving bases, outs, runs, Wins, or other measures.   [But we will assume it involves runs, unless otherwise stated.]

For example, perhaps the most obvious candidate for such a measure of quality is u = y/x, team Y’s “runs ratio” of runs scored to allowed.   

Is it true that when two baseball teams play each other over a long series of games, they will tend to win in proportion to their respective quality measures Qy and Qx? 

That is, if one team has k times as much quality as another, will it tend to win k times as many games?  The answer is (roughly) yes! 

This holds up for many different plausible measures of quality, including u = y/x.  

However, note that this is not true for all sports!  It is only true in baseball because there is just enough random chance involved in the game to make it true.  This random chance comes from many factors, but one major one is that a baseball team's "quality" is a (weighted) average of that of all its players--but not all players play in any given game.  This is especially true of pitchers--the team's 5th best pitcher may be pretty bad, even though the overall average quality of the team may be better than any opponent's.  When the 5th best pitcher pitches, the team has a good chance of losing.  Or, when their best pitcher pitches, but happens to pitch by chance against the only good (but very good!) pitcher the opposing team has, the team may still lose. 

But in arm-wrestling, there is little role of chance:  if I am twice as strong as my opponent, i.e., if my “quality” of arm-strength enables me to lift 200 pounds while she can only lift 100, I will NOT win twice as many of my matches with her as she will—instead, I will always win, practically speaking, for an infinite ratio of my wins to her wins.  This fact could be altered if a greater element of chance were artificially introduced into arm-wrestling--say, if a muscle relaxant were randomly administered to one contestant before each match.  Then whoever got the relaxant might often lose, no matter how weak their opponent.  But barring that sort of thing, there is not enough chance in arm-wrestling to result in contestants winning in proportion to their quality. 

And in basketball, the Bulls team that went 72-10 did not win in proportion to its relative excess of quality over its opponents.  It was nowhere near “7 times as good” as its average opponents—not by any measure (shooting percentage, speed, strength, points, rebounds, steals, free-throws, etc.), nor even by the total sum of its (small) excesses in each category.  But, as in arm-wrestling, there is less “chance” in basketball than in baseball, so even a modest surplus of basketball talent over one’s opponents much more often allows the team to demonstrate that surplus by winning the game.  This is because (basically) the "good" players on the team always play in every game, and especially near the end, when close games are decided--Michael Jordan was rarely "by chance" not around to help determine a game's outcome when needed. 

However, the fact that this proportionality of wins to quality IS roughly true in baseball, under many different plausible measures of quality, leads to a basic model for many different sabermetric formulas, which may at first seem ad hoc and unrelated.  

Note that “quality” is inherently a relative notion.  If one team has quality 3 while another has quality 6, it should make no difference if instead we measure the first team’s quality as 10 and the second one’s as 20.   But there is an obvious way to “normalize” the measures:  make sure that the “quality” of an average team (which is predicted to win half its games, with w = .500, and with x = y) is equal to a constant:  and 1 is certainly the best constant.  

I will do this normalization for some quality measures, particularly ones based on the run ratio y/x:  if y = x, then Qy = 1 = Qx.   However, since this "normalization" is not at all necessary to make the model work, and is merely an "aesthetic" feature, I will not always do it.  One can always force any quality measure into a different, normalized form that yields the same wpe via the Axiom, but when the logical features and rationales of relative quality measures are already apparent, even though it isn't "normalized" to 1, I will often not bother to do so, since doing so can introduce extra mathematical cumbersomeness, and never changes the final Quality Model results of predicted "w".

***************************************************************************

BASIC AXIOM of the Quality Model: 

A baseball Team Y's winning percentage w against Team X is well-predicted by the following model: 

For a given (plausible) measure of quality, with team Y having quality called "Qy", and its opponent team X having quality called "Qx",

w  =   Quality of Y / [Quality of Y + Quality of X ]  =  Qy / (Qy + Qx)   

The Basic Axiom simply says that teams are predicted to (and will, roughly, actually) win in proportion to their measured "Qualities") 

***************************************************************************

In baseball, there are many quality measures for which, when team Y has “k” times the quality of its opponent according to that quantitative measure, it will indeed generally win “k” times as many games in their matches.  That is, baseball (roughly) obeys the Basic Axiom, and a plethora of roughly equally accurate wpe’s demonstrate this in their common mathematical structure. 
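
The Basic Axiom is simple enough to state in a few lines of Python. The sketch below (the function name axiom_w is hypothetical) shows the two properties used repeatedly in what follows: a team with k times the quality is predicted to win k times as often as it loses, and rescaling both qualities by a common factor changes nothing.

```python
def axiom_w(q_y, q_x):
    """Basic Axiom: predicted winning percentage of Y against X."""
    return q_y / (q_y + q_x)

# Y has twice the quality of X (qualities 6 and 3).
w = axiom_w(6.0, 3.0)
wins_to_losses = w / (1 - w)      # predicted ratio of wins to losses

# Measuring the same two teams on a different scale (20 and 10)
# gives the same w, which is why normalizing to 1 is optional.
w_rescaled = axiom_w(20.0, 10.0)
```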

For various mathematical reasons, it may sometimes appear that implausible Quality measures give an accurate wpe--in which case the goal is to find a different, plausible quality indicator that yields the same formula.

     For Example:  Bill James's (BJ's) Pythagorean Formula with N = 2  says that the winning percentage 

     w   for a team Y in a league is:     Pyth2:             w  =  y^2 / (y^2 + x^2) 

It is certainly a pretty good predictor, as wpe's go.  Here y is runs scored by Y, and x is runs allowed by Y, i.e.,  runs scored by its opponents (X).  This result would trivially follow from the Basic Axiom IF we chose runs^2 to be the quality measure of each team, respectively.  That results in y^2 being the quality Qy of Team Y, and x^2 being Qx, the quality of the agglomerated league opponent Team X when playing against Y.  

But that choice is NOT intuitively plausible--I wondered for 28 years WHY the runs-squared were used!  Why aren't just runs themselves (without the squares) indicators of team quality, and hence predictors of winning percentage?  What do the squares have to do with it?

One answer would be that if Qy = y and Qx = x, the resulting formula via my Basic Axiom would be  w = y / (y + x), which has been shown to NOT predict real-life team winning percentages very well.  Pyth2 predicts w pretty accurately--refinements and alternate formulas never do very much better than Pyth2.  But, in fact, there is a much more plausible indicator of team Quality than y^2 and x^2, one which leads (via my Axiom) to the Pyth2 result, with the squares.  The squares are not a natural way in which to measure the "quality" of a team—they simply result from the way in which the Axiom applies to the “real” (more plausible) quality measure.  We will see this below.  

1B.   Bill James Log5 Formula  

First developed in the late 70's, vaguely mentioned in his 1978 Baseball Abstract, and explicitly developed in his 1981 Baseball Abstract, BJ's Log5 formula (apparently misnamed, as it seems to have nothing to do with logarithms) is the quintessential and original Quality model—he derived it with this concept explicitly in mind.  He set an average team’s quality equal to ½, instead of 1, as I do, but otherwise his approach is the same.  In this sense it is the forerunner and exemplar of most runs-based wpe’s—especially of Pythagorean results.  (This is true even though it predicts head-to-head w from league W/L ratios, NOT from runs—but the structure is the same.)  

BJ tried in his 1981 Abstract to “prove” this relationship between log5 and Pythagorean results, but unsuccessfully—his claims in the 1981 Abstract about how the two are related have a fundamental error.  But the formulas are nonetheless very related, as I will show soon.  [A discussion of BJ’s original development (with the error) in the 1981 Abstract is presented further down the webpage.] 

Log5 is an extremely useful formula describing how often Team Y could be predicted to beat Team X in a series of head-to-head games, if the only info we have about them is their Wins/Losses odds ratios against their respective entire leagues.   

That is to say, if Team Y meets Team X in post-season play, and we know that Team Y went 60-40 in the regular season in its league, while Team X went 55-45 in its league (perhaps the same league, perhaps not), what is the winning percentage  w  that we should "expect" in the postseason series for Team Y as it plays Team X? 

Define a team Y's Quality Qy as its Odds Ratio  Wy/Ly, where both Wins Wy and Losses Ly were compiled against its entire league.  When it plays a different Team X in the postseason, X’s quality Qx  =  Wx/Lx, similarly defined in play against its league.  (Though Y and X may have played against different leagues, we assume for simplicity that each league was of the same overall average quality.)  

This is a very plausible definition of quality—the better a team, the better should be its odds of winning against the entire league overall.  In fact, when two teams do play each other, their actual W/L odds ratio in that series, after they finish playing each other, is pretty much our best definition of their relative quality: if, over a lengthy series, Y won twice as many games against X as it lost, we would intuitively say: “Y is twice as good as X”.   Again, this is true in baseball, but not in arm-wrestling or basketball.  However, such "intuitive" reasoning runs the risk of being circular:  defining quality by win ratios is begging the question.  But the potential circularity of the reasoning is avoided once we see that other intuitively plausible quality measures (particularly involving runs, not wins) also lead to similarly good results using the Quality Model. 

       Substituting in the Basic Axiom, we get: 

w  =  Qy / (Qy + Qx)    =   Odds Ratio for Y /  (Odds Ratio for Y  +  Odds Ratio for X) 

Or:    Log 5  (Odds Ratio Version) :    w  =    Wy/Ly   /   (Wy/Ly  +  Wx/Lx ) 

So, when a 60-40 team plays a 55-45 team, it should have predicted winning percentage w of:   w  =  (60/40)  /  [60/40 + 55/45 ]  =  .551 . 

This discovery by BJ has long been shown to be a very useful and accurate result.  (It was also immediately transformed mathematically into an equivalent version using inputs of team Y’s and team X’s winning percentages against their leagues, rather than their odds ratios.  But the odds ratio version above is the important one for other wpe’s.)
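
The 60-40 versus 55-45 example can be reproduced with a short Python sketch. The function names are hypothetical. The winning-percentage form is not written out in the text above; the version shown here is the usual algebraic rewrite obtained by multiplying top and bottom of the odds ratio version by Ly times Lx and dividing by the games played, so treat it as an assumption about which "equivalent version" is meant.

```python
def log5_odds(w_y, l_y, w_x, l_x):
    """Log5, odds ratio version: qualities are Wy/Ly and Wx/Lx."""
    q_y = w_y / l_y
    q_x = w_x / l_x
    return q_y / (q_y + q_x)

def log5_pct(p_y, p_x):
    """Same prediction from league winning percentages p_y and p_x."""
    return p_y * (1 - p_x) / (p_y * (1 - p_x) + p_x * (1 - p_y))

w_odds = log5_odds(60, 40, 55, 45)   # about .551, as in the example above
w_pct = log5_pct(0.60, 0.55)         # same value from winning percentages
```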
___________________________________________
 
2.  A derivation of BJ's Pythagorean Run Formula (for power 2) from the Log5 formula                           (Rmse = .0271)

We will again use the Basic Axiom and quality model, as we did for Log5, but this time we will put in a different Quality measure, one involving runs, not wins against league.   

BJ’s Pythagorean Formula for power 2 (Pyth2) says that winning percentage w is given by (with y = runs scored by Y, and x = runs allowed)

    Pyth2:             w  =  y^2 / (y^2 + x^2) , 

Define Qy,  the Quality of Team Y ,  as  Qy =  y/x.  

That is, the quality of Y is measured by the "run ratio" of its runs to its runs allowed.  This is certainly an intuitively plausible measure of the quality of a team. Note that we can consider Y’s opponents to be a union of daily subsets of a vast “average” “league” team X, who only play some of their players against Y in any given game.  That is, Team X is considered as a union of “sub-teams for a day” for each day that some of their players played Y.  

So the Quality of Team X is also defined by its runs ratio against Y:  its runs scored against Y (x) to the runs it gives up (i.e., y, the runs Y scores against X.) 

That is, Qx = x/y 

Plugging the above Qy and Qx into the Basic Axiom, we get   w  =  Qy/ (Qy + Qx)  =  (y/x)  /  [ (y/x) + (x/y) ], or clearing fractions, multiplying by xy in top and bottom, we immediately get Pyth2.   QED.
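
The algebra above can be checked exactly with Python's Fraction type. This is just a sanity check of the derivation, with hypothetical function names and arbitrary sample run totals.

```python
from fractions import Fraction

def axiom_w(q_y, q_x):
    """Basic Axiom: w = Qy / (Qy + Qx)."""
    return q_y / (q_y + q_x)

def pyth2(y, x):
    """Pyth2 written directly with squares."""
    return Fraction(y**2, y**2 + x**2)

def pyth2_via_quality(y, x):
    """Pyth2 obtained from run ratios with no squares in the quality measure."""
    q_y = Fraction(y, x)   # run ratio of Y
    q_x = Fraction(x, y)   # run ratio of X against Y
    return axiom_w(q_y, q_x)

# Arbitrary sample run totals (not real teams).
samples = [(800, 700), (650, 720), (700, 700), (1000, 600)]
all_match = all(pyth2(y, x) == pyth2_via_quality(y, x) for y, x in samples)
```

Because Fraction arithmetic is exact, agreement here reflects the identity itself rather than floating-point coincidence.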

So this is why the "squares" were in the Pyth2 formula--I was happy to see it after 28 years of wondering!  It is in this sense that Log5 and Pyth2 are structurally the “same” formula, as BJ tried unsuccessfully to show in 1981:  they both stem from plugging in a quality measure to the Basic Axiom, with Log5 using W/L, the odds ratio, as the quality measure, while Pyth2 uses y/x, the run ratio.  Since runs scored tend to increase wins, while runs given up tend to increase losses, the similarity of the results is wholly natural and plausible--but note that the “squares” aren’t in the measure of the quality; rather, they come from the mathematical structure of the Axiom, and aren’t “ad hoc” (as they seemed to be for BJ in 1981, though it's hard to be sure).

I get Rmse for Pyth2 as .0271, which will be the “pretty good” benchmark against which to measure other wpe’s. 


2A)  BJ's Pythagorean Formula (for power n):  the role of Chance in baseball.
 

It was rapidly realized, even by BJ in 1981, that Pyth2 could be made more accurate (a little bit) by using a different power n, instead of n = 2—he claimed that n = 1.83 was best, and that has held up as roughly correct since then, though it differs slightly depending on the data one tests against.  I personally get the best fit at around 1.85, but my data is only a sample, so I'm going to stick with 1.82.

I’ll use the “best” exponent as 1.82 .    And I use "N" in the title of the formula, but "n" in the calculations--but they are the same.

    PythN:   w  =  y^n / (y^n + x^n)          [ Best fit to real data:  n  =  1.82 ]                       (Rmse = .0258) 

We derive this from the Quality model by using:  

Qy = (y/x)^(n/2),  and  Qx = (x/y)^(n/2),  and substituting into the Basic Axiom.  
Clearing fractions by multiplying by x^(n/2) y ^ (n/2) yields the Pyth(n) result immediately. 
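
A quick numerical check of this derivation in Python, using n = 1.82 as above. The function names and the sample run totals are illustrative assumptions only.

```python
import math

def pyth_n(y, x, n=1.82):
    """PythN written directly."""
    return y**n / (y**n + x**n)

def pyth_n_via_quality(y, x, n=1.82):
    """PythN from the Basic Axiom with Q = (run ratio)^(n/2)."""
    q_y = (y / x) ** (n / 2)
    q_x = (x / y) ** (n / 2)
    return q_y / (q_y + q_x)

y, x = 800.0, 700.0            # arbitrary sample run totals
direct = pyth_n(y, x)
via_quality = pyth_n_via_quality(y, x)
pyth2_value = pyth_n(y, x, n=2)   # n = 2 reproduces Pyth2
```

For a team with y > x, the n = 1.82 value lies between .500 and the Pyth2 value, which is the small pull toward .500 discussed next.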

Conceptual Problem:  A Digression into "Effective Quality". 

But now we ask, why is it plausible that Quality would be proportional to (y/x) raised to a weird decimal power (n/2), rather than proportional to the intuitively plausible y/x ? 

Well, it's not really plausible--baseball does NOT in fact quite follow the quality model.  But it does follow it approximately, and closely enough that the model still shows why many wpe's are structurally related.  

Note that for the “best” results, n/2 is about 0.91, which is to say that it makes (y/x) a tiny bit smaller when y>x, and thus makes (x/y) a bit larger in that situation.  That is, it moves each run ratio back towards 1 by a tad, and hence moves the predicted w back towards .500 by a tad.  BJ in 1981 grasped the import of this fact--historically, MLB baseball has had just a little more chance in it than the amount required to produce a "perfect" proportion between quality and winning.  As we saw for arm-wrestling, introducing an extra element of chance into the game alters the degree to which "real" quality differences will manifest themselves in winning percentages.  Thus, the more chance involved, the closer the resulting winning percentages are to .500, no matter what the real quality differences between the contestants.  

One might also phrase this by saying that a small "extra" element of chance combines with a team's "real" quality to create its “effective” quality, and that it then wins in proportion to its effective quality.  But, though this presents a handy way of discussing things, it is essentially circular reasoning--then "effective quality" simply becomes "whatever correlates best with winning", which is NOT how we want to envision a model that uses "real" quality and the Basic Axiom.

Note, crucially, though, that how many total runs (or points) are scored in an average game (RPG) affects the degree to which “chance” influences winning.  When lots of runs are scored, there is more opportunity for the real “quality” (which is what produces runs) to manifest itself, and less opportunity for chance to make an actual difference in the outcome of the game.  And vice versa.  

This fact is well known in sabermetrics, and leads to wpe’s that include RPG =  (y + x)/G in them, by which the greater the RPG, the more the resulting w is due to a magnified effect of the "real" quality ratio upon the winning percentages.  Again, this is true in basketball, which has a couple hundred points scored per game, and where w is best predicted by raising the ratio (y/x) to a large power in the Pythagorean formula.  We will look at wpe’s involving RPG below--but note that while the MLB RPG has varied significantly in different eras, it has historically been in the 8 to 10-ish range, which is sufficiently low that apparently chance plays a significant enough role to make the ideal exponent in Pyth(n) a bit less than n = 2.  

Again, this effect of chance at historically average MLB RPG levels means that the Basic Axiom isn’t “really” true--rather, it’s true for “effective” quality, as opposed to “real” quality--but it’s close enough for horseshoes, and for MLB baseball.  Again, the difference in accuracy between Pyth2 and Pyth1.82 is very small, on average--a small fraction of a win per team per year.  For almost all purposes of measuring MLB w’s, the Basic Axiom is roughly true.  And this is shown by the fact that almost all wpe’s (of varying accuracy) can be shown to fit it quite plausibly.  

What it does mean is that many Quality measures result in wpe’s that can be slightly improved in predictive value by using some method of “moving them back to .500” just a tad—again, as realized by BJ in 1981. That is, by various means, one tries to establish the "effective" quality of a team, including an effect of chance.  This led him to what I call:

2B)  PythToAHalf  (BJ),  for some constant K:  

w  =  [  y^2  + K ] /  [ y^2 + x^2  + 2K ]    Best fit:  K = 38000                (Rmse = .0255)   checked

Adding any constant in the numerator and twice the constant in the denominator moves the overall result a bit closer to ½.  BJ used K = 60,000, but I find with my data that the best fit is around 38,000.  Doesn’t matter too much… 

This has the same effect on overall accuracy as changing the 2 to 1.82 and not using any K. 
Both Pyth1.82 and PythToAHalf do a little better than Pyth2, lowering Rmse from .0264 to .0258 or .0255, respectively.

 This result comes from the Basic Axiom and the following ("Effective") Quality Measures:

               Qy  =  (y/x) + K/(xy)     ,  with   Qx  = (x/y) + K/(xy)    

Intuitively, this plausibly says that a team Y’s “effective” quality is its run ratio plus an  “extra chance” factor, K/(xy), which is the same for both Y and its opponents X.  The larger the total runs per game, the larger will be the denominator xy in the chance factor, and thus the smaller will the extra chance factor be. 

Plugging those Qy and Qx into the Basic Axiom and clearing fractions, multiplying top and bottom by xy, we get PythToAHalf immediately.  QED. 

Note that this gives us a (crude) way to measure the "extra" chance in baseball, beyond that which would enable teams to win in proportion to their "real" quality:  at, say x = y = 750 per season, and K fitting best at about 38000, we get an average chance factor K/(xy) of around .067, or let's say 7% extra chance beyond what would produce a winning percentage in proportion to quality.  Of course this changes when x and y are lower or higher, and 38000 is probably not the real best fit.  
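
The PythToAHalf derivation and the size of the chance term can both be checked with a short Python sketch. The function names are hypothetical, K = 38000 is the best-fit value quoted for PythToAHalf above, and the run totals are arbitrary.

```python
from fractions import Fraction

def pyth_to_a_half(y, x, k=38000):
    """PythToAHalf written directly."""
    return Fraction(y**2 + k, y**2 + x**2 + 2 * k)

def pyth_to_a_half_via_quality(y, x, k=38000):
    """PythToAHalf from the Basic Axiom with 'effective' qualities."""
    chance = Fraction(k, x * y)          # the "extra chance" term K/(xy)
    q_y = Fraction(y, x) + chance
    q_x = Fraction(x, y) + chance
    return q_y / (q_y + q_x)

y, x = 800, 700                          # arbitrary sample run totals
same = pyth_to_a_half(y, x) == pyth_to_a_half_via_quality(y, x)

# Size of the chance term for an average team with x = y = 750.
chance_at_750 = 38000 / (750 * 750)      # about 0.0676
```

With these inputs the PythToAHalf prediction sits slightly closer to 1/2 than the plain Pyth2 prediction, as intended.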

BUT, from now on, we will NOT measure or model "effective" quality any more, as, again, the real "quality" measures are still good predictors, and it is the mathematical relationships between the various fundamental wpe's in which I'm interested, not ad hoc tweaks that produce minutely better accuracy.
_________________________________

3.  Bill James's "Win Shares" Marginal Runs WPE
     See page 333, "The New Bill James Historical Baseball Abstract".  He has since modified the method slightly, which is unimportant for the structural relationships.

Define A = Average Total Runs Scored per team for a league in a given year.    

3A)    Bill James  Marginal Winning Percentage:        (BJMWP)   

               w  =  [y - (1/2)A  +  (3/2)A - x ]  /  [2A].                    (Rmse = .0272 or better)

 This simplifies to:     (BJMWP)          w =  (y - x + A) / (2A).      It also simplifies to:  w = 1/2 +  (y - x)/(2A).   

Note:  s = (y - x) is often called Run Differential in the sabermetric literature, though I prefer to call it the surplus. 
There is an important class of wpe’s based on functions of (y – x), called Wins Per Runs models, which I address below in section 9.
Thus, this form is our first encounter with formulas based on the Run Differential.
But it will also fit the Quality Model--as do all such Run Differential formulas (see section 9.).    

Derivation of the formula BJMWP from Quality Measures: 

Let the Quality of a team Y be:   Qy =  [ (y - x) + A]  / A        Similarly,  Qx =  [ (x - y) + A ] / A 

Intuitively, this says that any team Y, no matter how many total runs it scores, that has a surplus of runs scored over allowed of (y - x) has Quality, relative to that of its opponents, in roughly the same ratio that a near-average team with the same amount of surplus runs s  would have to its average opponent.  

That is, the "near-average" Team would normally have A runs, but its excess run differential would now give it (y - x) + A runs.  An average opponent would have A runs. 

This is fairly plausible--in a league where average total runs per team are 700, a team that scores 760 and gives up 740 should be of roughly similar overall quality to one which scores 720 and gives up 700, as should a team that scores 669 and gives up 649. In BJ's formula, all are assigned the quality of a team with 720 and 700 runs, for simplicity. 

[One also gets the same result by assuming that Qy = A / (A + (x – y)), i.e., that the quality is similar to a situation in which Y scores an Average amount of runs A, and X scores A + (x – y), i.e., below average, if y > x .]  

Putting the Quality measures above into the Basic Axiom, we cancel the common denominator of A that is in all terms, and get:    

w  =  (y - x + A)   /  [ (y - x + A) + (x - y + A) ].  
The denominator has the x's and y's add up to zero,  leaving:       w =  (y - x + A) / (2A)  .       QED.                         
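
As a check on this derivation, the sketch below (hypothetical function names) computes BJMWP both directly and via the Quality measures, using the three example teams from above in a league with A = 700.

```python
from fractions import Fraction

def bjmwp(y, x, a):
    """BJMWP in its simplified form (y - x + A) / (2A)."""
    return Fraction(y - x + a, 2 * a)

def bjmwp_via_quality(y, x, a):
    """BJMWP from the Basic Axiom with Qy = [(y - x) + A] / A, and likewise Qx."""
    q_y = Fraction((y - x) + a, a)
    q_x = Fraction((x - y) + a, a)
    return q_y / (q_y + q_x)

A = 700  # league-average runs per team, as in the example above
teams = [(760, 740), (720, 700), (669, 649)]
predictions = [bjmwp(y, x, A) for y, x in teams]
```

All three teams have the same surplus of 20 runs, so all three get the same prediction, 720/1400 (about .514).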

I don’t actually have data on A handy, so I checked it against various constants in place of A, and for A = 750 got “best fit” rmse of .0272.  Thus, it is at least as good as Pyth2, which is explicitly why BJ used it for Win Shares.  But of course it is undoubtedly more accurate using A for each league/year separately.

3B)  I decided to modify BJMWP as follows, to get a new wpe:  BJMWP (Symmetric) 

Assume that a difference of runs (y – x) is similar to a situation where each team is a distance of half of that difference above or below an Average Team.  That is, assume that it is as if Y scored A + (1/2)(y – x) runs, and X scored A + (1/2) (x – y) runs.  

Then apply the same Quality measure as in the Pyth2 case, i.e., the run ratio between the two teams:  

Qy  =  [ A + (1/2) (y – x) ] /  [ A + (1/2) (x - y) ]  ,         Qx = [ A + (1/2) (x - y) ]  /   [ A + (1/2) (y – x) ] 

This, of course, leads to the Pyth2 result with the (adjusted) runs for each team squared.  First doubling each runs amount for simplicity, which does not change the run ratio, we get:     

         w  =  (2A + y – x)^2  / [ (2A + y – x)^2  +  (2A + x - y )^2  ]  or, squaring and simplifying, 
         w  =   [ 4A^2 + 4A(y – x) + (y – x)^2  ] / [ 8A^2 + 2(y – x)^2 ] , which equals, eventually,

 BJMWP (Symmetric)      w =  (1/2)  +  (y – x)  2A / [4A^2 + (y – x)^2 ] 
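The algebra above can be checked numerically. The sketch below (function names are my own, hypothetical ones) computes the estimate two ways: by applying Pyth2 to the "as if" run totals, and by the closed form, and the two should agree for any inputs.

```python
def bjmwp_sym_pyth(y, x, A):
    """Pyth2 applied to the 'as if' run totals A + (y - x)/2 and A + (x - y)/2."""
    ys = A + 0.5 * (y - x)
    xs = A + 0.5 * (x - y)
    return ys**2 / (ys**2 + xs**2)

def bjmwp_sym_closed(y, x, A):
    """Closed form: w = 1/2 + 2A(y - x) / (4A^2 + (y - x)^2)."""
    s = y - x
    return 0.5 + 2 * A * s / (4 * A**2 + s**2)

# A = 750 is just the constant used earlier as a stand-in for the league average.
print(bjmwp_sym_pyth(800, 700, 750), bjmwp_sym_closed(800, 700, 750))
```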

This is another wpe using the run differential  s = (y – x), and will be discussed further in that section.       

Not in spreadsheet.  Nor is the next one.  No RMSE yet. 

3C)  Bill James's 1960’s Winning Percentage Model (3 runs per game per Team)   (BJ1960)       

 BJ1960:    w = (y/G – H) / ( y/G+ x/G – 2H)      for some constant H.   BJ used H = 1.5 .   G = Games   
 
Note:  H = half of  3 runs per game per team.  
I find the "best fit" over my large sample of games and years is H = 1.75.  But that is not very accurate, with Rmse of .0287.  This is to be expected, as the "constant" used can be plausibly expected to vary in different offensive contexts or years.  But the formula is not "primitive":  it is one tweak away from being much better: 

If for H one used not a constant, but   (1/2) ( y + x)/2  / G  ,   i.e.,  one used half of the average single team runs/game in Y’s games, one actually gets Ben Vollmayr-Lee’s Linear Prediction model "BVL2" (see next section.)  A very nice relationship! 

Or, for each different year, one could use for H "half the average League single team’s runs per game".  In the late 1960's, H = 1.5 as used by BJ was fairly close to this, and to the one that leads to Ben Vollmayr-Lee's formula in the next section.
Multiplying BJ1960 by G (Games), and writing A = the average league single team's total runs per season, we get a new wpe: 
 
     BJA:     w =  (y - A/2) / (y + x - A)   Note that it is very reminiscent of BJMWP--a simple fraction involving y, x, A, and 2.  

In fact, BJA has the equivalent form:   BJA:    w = 1/2 +  (y - x)/(2 [y + x - A] )  ,  similar to BJMWP in its version:       w = 1/2 +  (y - x)/(2A). 
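Here is a small Python sketch of the relationships in this subsection. The function names and the sample numbers (y = 800, x = 700, G = 162, A = 720) are assumptions for illustration only. It checks that BJ1960 with H set to half the average runs per team-game in Y's own games reproduces BVL2, and that using half the league average per game reproduces BJA.

```python
def bj1960(y, x, G, H=1.5):
    """BJ1960: w = (y/G - H) / (y/G + x/G - 2H)."""
    return (y / G - H) / (y / G + x / G - 2 * H)

def bvl2(y, x):
    """BVL2 in runs form: w = (3y - x) / (2x + 2y)."""
    return (3 * y - x) / (2 * x + 2 * y)

def bja(y, x, A):
    """BJA: w = (y - A/2) / (y + x - A)."""
    return (y - A / 2) / (y + x - A)

y, x, G = 800, 700, 162      # hypothetical team
A = 720                      # hypothetical league-average runs per team

H_own = 0.5 * ((y + x) / 2) / G   # half the average single-team runs/game in Y's games
H_league = 0.5 * A / G            # half the league-average single-team runs/game

print(bj1960(y, x, G, H_own), bvl2(y, x))
print(bj1960(y, x, G, H_league), bja(y, x, A))
```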

However, I can't measure the rmse of BJA via my database, since it doesn't include "A" values.  So I don't know (and doubt quite a bit) whether this is actually more accurate than BJMWP, though it should be more accurate than BJ1960, one assumes.  When I tested it using constants for A values, as I did for BJMWP, it was a little less accurate than BJ1960 at the best fit value for A (rmse = .0291).  But that best fit was for A around 540 or so, which is very different from the 750 best fit for BJMWP--and doesn't seem too plausible.  

However, BJA does have the interesting feature that the winning percentage w for a team Y actually equals the Quality measure itself for Y:  

That is, if Qy  =  (y - A/2) / (y + x - A) and, symmetrically, Qx  =  (x - A/2) / (y + x - A), then Qy + Qx = 1, so the Basic Axiom gives w = Qy.

Derivation of BJ1960 from Quality Measures:  The "old" result of BJ1960 follows immediately from the Quality Model's Basic Axiom with   Qy  =  [ y/G – H ]   and   Qx  =  [ x/G – H ] 
 

This anticipates (by decades!) the very plausible idea of Bill James's later Win Shares notion: 
A team's Quality is its "measure of excess", i.e., the excess of its runs/game over a "minimal" level H of run-scoring competence, with the minimal level set at roughly 1/2 of the average runs/team/game.
 
_________________________________ 

4.  Ben Vollmayr-Lee's  Linear Prediction Model

 See this online source:  

In many ways, the "BVL2" formula coming up below is the most elegant single wpe--it is derivable in many beautiful ways from many other, seemingly unrelated wpe's, and is extremely simple and pretty accurate, especially for a "linear" model.  

Ben Vollmayr-Lee (BVL)  wants to measure predicted winning percentage w from Y's fraction "t" of total team runs (for and against). 

We thus use    t   =   y / (y + x)         ["t" is my notation, not his].  He also (nicely) derives this linear prediction model as a tangent line of the Pythagorean formula with exponent N, about which I will comment in a (later) section. 

For an "average" team, where y = x,  and thus  t  = 1/2, the winning percentage should be .500 = 1/2. 

Thus, a  linear predictor:     w(t)   =  mt + b   should go through (w, t) = (1/2, 1/2).    Here, m = slope, as usual. 

It could thus also be expressed (as BVL prefers) in point-slope form as   w - 1/2  =  m(t - 1/2) , which intuitively relates the change in w from .500 proportionally to the change in fraction t from 1/2, i.e., to the change from the fraction for an average team.  The proportionality constant would be the slope m.  However, I prefer to do the math with the form  w = mt + b, as follows:

BVL notes that a good fit to real life results occurs for  w = mt + b , when  m = some number N (conceptualized as an arbitrary Pythagorean exponent N, about which more later), and  b =  (1 - N)/2  .   These conditions do indeed lead to (1/2, 1/2) being on the prediction line.

   So here is the BVLN  Linear Prediction Model:       w = N t + (1- N)/2          This will be accurate for various N's near 2  


4A)  We will first examine  m = N = 2, which leads to the BVL2 linear prediction model: 

        w = Nt + (1- N)/2 , with N = 2, yields:   w = 2t + (1 - 2)/2.  or, simplifying: 

BVL2:        w  =    2t - 1/2  .  (for m = N = 2)    Note that this is linear in his preferred variable, "t".

Or,  expressing in terms of runs y and x,  with  t  =  y / (y + x) ,  
w  =  (-1/2) +  2y/( x + y) .  Putting it all over an LCD, this becomes, for m = N = 2 : 

BVL2:         w =  (3y - x) / (2x + 2y)  This is not how BVL puts it, but it's an equivalent and simple formula.  Of course, it's no longer linear!

Derivation of this formula from Quality measures:
 

Let YQuality be    y +  (y - x)/2   ,  with XQuality therefore     x + (x - y)/2  . 

Intuitively, this says a team's quality is proportional to the following sum:  the runs y they score, plus half the "s" marginal runs by which they exceed (or fall below) their opponents' runs.  This is certainly a plausible indicator for the relative quality of a team--it includes not just the runs they score, but also an additional term proportional to how much their runs exceeded their opponents' runs, where the proportionality constant is 1/2.  

Plugging in to the Basic Axiom,  we get:          w  =  [ y + (y - x)/2 ]  /  [  y + (y - x)/2  +  x + (x - y)/2  ]   Multiplying by 2 top and bottom, we get:

 w =  (3y - x) / (2x + 2y)  ,  QED  . 

4B)  Back to BVLN, as above:  Now let N be any arbitrary exponent.  And replace t with  y/(y + x): 

BVLN:         w  =   N [y/(y + x)]  + (1 - N)/2      or, simplifying,   w = Ny / (y + x)  +  (1 - N) / 2  

         which, using the LCD,  gives us:                      

        w  =   (2Ny + y + x - Ny - Nx) / 2(y + x),   or,  simplifying for a couple steps...      w =  [y + x +  N(y - x) ] / 2(y + x)  ,  or: 

BVLN:     w = 1/2  +  (N/2) (y - x) / (y + x)   =     1/2  +  (N/2) v 

Here is our first natural occurrence of the parameter   v  =  (y - x)/(y + x). 

Note that BVL2 in this form becomes:   w = 1/2  +  (y - x) / (y + x)  ,    or    w = 1/2 + v .   A very simple (linear) form, and pretty accurate!

Derivation of BVLN from Quality measures: 

Let Qy  =  y +  [ (N - 1)/2 ] (y - x) ,   with   Qx  =  x +  [ (N - 1)/2 ] (x - y) 

Intuitively, this again says that a team Y's Quality can be measured by the following sum:  the runs y they score, plus a proportionality constant times the "s" marginal runs (y - x) by which they exceed their opponents runs.  

Here, the "general" proportionality constant is (N - 1)/2, instead of 1/2, as in part A), and thus N can be adjusted here to "best fit" the real-life data.  

BVL also shows that whatever N fits the data will also be the exponent in the Pyth. Run Formula that also best fits the data for that model--via a calculus tangent-line relationship, which will be discussed below.  In fact, the "best fit" to real life data says that N should equal around 1.8, not 2.   
Substituting the "general" Qy and Qx above into the Basic Axiom, we get: 

w =  ( y +  [(N - 1)/2] (y - x) )  /  [ ( y +  [ (N - 1)/2 ] (y - x) )  +   ( x +  [ (N - 1)/2 ] (x - y) ) ]  
Multiplying by 2  top and bottom, and expanding, we get: 
 w  =  [ 2y + (N - 1)y - (N - 1)x  ]  / [ 2y + (N - 1)y - (N - 1)x   +   2x + (N - 1)x - (N - 1)y ]  , or
 w  =  [2y + Ny - y + (1 - N)x ] / [2y + Ny - y - Nx  + x  +  2x + Nx - x - Ny + y ]
Adding like terms in the top and bottom, we get:  w  =  [ (1 + N) y + (1 - N) x ] / [ 2y + 2x ]  , which then simplifies to BVLN in a couple simple steps.  QED.
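The equivalence of the linear form and the Quality form of BVLN can be spot-checked with a short Python sketch. The function names are mine, and I take the Basic Axiom to be w = Qy / (Qy + Qx), as it is used in the derivation above.

```python
def bvln_linear(y, x, N=2.0):
    """BVLN in its linear form: w = N*t + (1 - N)/2, with t = y / (y + x)."""
    t = y / (y + x)
    return N * t + (1 - N) / 2

def bvln_quality(y, x, N=2.0):
    """BVLN from Quality measures Qy = y + k(y - x), Qx = x + k(x - y), k = (N - 1)/2."""
    k = (N - 1) / 2
    Qy = y + k * (y - x)
    Qx = x + k * (x - y)
    return Qy / (Qy + Qx)   # Basic Axiom, as used in the text

def bvln_v(y, x, N=2.0):
    """BVLN in terms of v = (y - x)/(y + x): w = 1/2 + (N/2) v."""
    v = (y - x) / (y + x)
    return 0.5 + (N / 2) * v

for N in (2.0, 1.8):
    print(N, bvln_linear(800, 700, N), bvln_quality(800, 700, N), bvln_v(800, 700, N))
```

With N = 2 all three reduce to BVL2, w = (3y - x) / (2x + 2y).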

 _________________________________


5.  General characteristics of Power Run Formulas, including the Pythagorean.  

A natural model for wpe's is one in which it is the ratio of the runs, either u = y/x, or, if more elegance results, q = x/y, that is used as a parameter.  I am indeed going to use q = x/y to get the additional elegance. 

So we look for wpe's of the form w = f(q).    The smaller the value of q, the more games the team should win.  

Once one decides to use  q  as the variable, there is only one basic family of estimator functions w = f(q)  that fully satisfies natural constraints on estimators of baseball winning percentages.  There are two basic constraints: first, that  f(1) must equal 1/2 (if a team scores and gives up the same number of runs, it should win half its games and lose half, on average.) 

And secondly that, since by definition the team's winning percentage equals ( 1 - its opponents' losing percentage), and since its opponents have the roles of y and x reversed for them, we must have f(1/q) = 1 - f(q).  

The basic family of functions that satisfy these constraints is:  
          Winning percentage w =  f(q)  =  1 / (1 + q^n ).  
Here, to be realistic for baseball scenarios, we'll limit ourselves to n > 0.    
[Note: if we use u = y/x, we get w = f(u) = u^n / (1 + u^n), slightly less elegant than the q-version.  No real difference in approach or results, of course.]

This (or the u-version, of course) also simplifies, using y and x, to  w = y^n / (y^n + x^n).  

When the exponent n = 2, we get the "original" version Pyth2 proposed by Bill James in the 1970's,  

       w  =  y^2 / (y^2 + x^2), which, because of a superficial similarity to the (real, mathematical) Pythagorean Theorem, he dubbed by the name "Pythagorean Run Estimator (or Formula)".  

It soon became apparent that other exponents besides n = 2 could be used, and empirical tests have suggested that the most accurate (statistically) predictive n-value could be (depending on the data sample) any of various exponents from the range of roughly n = 1.7 to n = 2, with most being close to 1.8 or so. 

I will call any such function w =  f(q)  =  1 / (1 + q^n ) a "Power Run Function", or PRF.  Many baseball analysts are very interested in the "best" exponent n, the most accurately predictive one.
  
BUT, I note that in fact, it is true that any convex linear combination of PRF's, even if they each individually have a different n-value, is still a "generalized" PRF, in the sense that it still obeys the 2 main constraints on winning percentages.  That is, if you multiply a bunch of PRF's by various constants, making sure that you use constants that sum to a total of 1 (the "convexity" requirement), and add the results together, the resulting "sum" function still obeys the constraints, and hence could be a candidate for the "best" estimator PRF.  Thus, the "really" most accurate estimator may be some convex combination of, say 19 different PRF's!  
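A minimal Python sketch of this point follows. The particular weights and exponents in "mix" are arbitrary illustrative values, not fitted to any data; the check is only that a convex combination of PRF's still satisfies both constraints, f(1) = 1/2 and f(1/q) = 1 - f(q).

```python
def prf(q, n=2.0):
    """Power Run Function: w = 1 / (1 + q^n), with q = x/y."""
    return 1.0 / (1.0 + q**n)

def convex_prf(q, weights_and_exponents):
    """Weighted sum of PRF's; the weights should be non-negative and sum to 1."""
    total = sum(wt for wt, _ in weights_and_exponents)
    if abs(total - 1.0) > 1e-12:
        raise ValueError("weights must sum to 1")
    return sum(wt * prf(q, n) for wt, n in weights_and_exponents)

# Arbitrary illustrative mixture: (weight, exponent) pairs.
mix = [(0.5, 1.7), (0.3, 1.82), (0.2, 2.0)]

for q in (0.8, 1.0, 1.25):
    print(q, convex_prf(q, mix), 1 - convex_prf(1 / q, mix))
```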

One reason that there are so many different wpe's is simply that if one graphs w =  f(q)  =  1 / (1 + q^n ) on a (q, w) set of coordinate axes, you get a curve that has many other curves that approximate it closely, at least within the range of q-values that obtain for the vast majority of MLB teams.  These include tangent lines, but also many other functions that "happen" to agree with the curve (approximately).  Many of these are obviously NOT plausible candidates for measures of a baseball team's "real" quality--but some are.  We'll explore tangent lines in a later section.
_________________________________


6.  The Uniform Run Distribution:  DTE to Kross to BVL to  Quasi-Pythagorean Results

Before I discovered the above reason "why" the Pythagorean Run Estimator Pyth2 is true, based on the quality model, I tried many other approaches. Here is one that "almost" worked, and is again based on a very simple model.  

6A)  Uniform Run Distribution.  
Assume again that Team Yukon scores an average of Y runs per game, and gives up X to its opponents, Team Xavier.  Here, X and Y are constants for the purposes of the rest of the inquiry, hence they are capital letters, as we will need to use small y and small x for their routine mathematical role as "variable" coordinates in the usual Cartesian plane.  

Further assume that the range of runs scored per game is twice the average, that is, Yukon scores from 0 runs per game to 2Y runs per game in any individual game.  Assume the similar range for Xavier, from 0 to 2X. This is a fairly good rough statistical assumption.  

Further assume a "uniform" distribution of run results for each team in each range (not a normal or bell-shaped one).  That is, assume that the probability of any individual game's result equaling a number of runs for each team is the same for any number in the range 0 to 2X and 0 to 2Y.   This is NOT a very good assumption, as most scores will be in the middle of the range, but it is certainly a simple one, and its inadequacies tend to cancel out for our purposes, given that its errors are in the same direction for both teams. Further assume a continuous distribution of possible runs, not a discrete one--this is an assumption that is necessary to keep the simplicity of the model as well, and shouldn't distort the model too much (certainly not compared to the distortions of uniformity!  :) )

Under all these assumptions, one can model the game results by a rectangle in the Cartesian plane with corner coordinates (0,0), (0, 2Y), (2X, 2Y), and (2X, 0), in which a game result is any point (x, y) in the rectangle, with y being the number of runs scored in the game by Yukon, and x being those scored by Xavier (i.e., allowed by Yukon.)  

We first note that in this simple model, the probability that Yukon beats Xavier, i.e. its winning percentage w,  is simply the area of the portion of the rectangle that is above the line y = x, divided by the total rectangle area 4XY.  

Assume WLOG that Y > X, and elementary algebra shows that the area above y = x  is:  4XY - 2X^2.  Dividing by the total area 4XY and simplifying gives us:   

    w =  1 - X/(2Y),  or, since only the ratio of the runs matters, w = 1 - x/(2y) in terms of season totals.  This is a mediocre winning percentage estimator and, I found, had been discovered as a formula by Bill James long ago (not sure how his conceptualization went), by whom it was named:
                                                                                                                                                                                                                                                   
 "DTE", or Double the Edge:       w =  1 - x/(2y)   =   (2y - x)/(2y)                               rmse =  .0307

What is interesting is that it arises from such a simple (too simple) model as the Uniform run distribution.
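The area argument can be checked numerically. The sketch below is only an illustration under the uniform-model assumptions stated above; the function names and the sample per-game averages are my own. It integrates over the rectangle [0, 2X] x [0, 2Y] by slicing along x with the midpoint rule, and compares the winning fraction with DTE.

```python
def uniform_model_w(Y, X, n=20000):
    """Fraction of the rectangle [0, 2X] x [0, 2Y] lying above the line y = x.

    Y = Yukon's average runs per game, X = Xavier's.  For each x-slice the
    length of the winning y-interval (x, 2Y] is computed exactly; the slices
    are then summed with the midpoint rule.
    """
    dx = 2 * X / n
    win = 0.0
    for i in range(n):
        x = (i + 0.5) * dx
        win += max(0.0, 2 * Y - x) * dx
    return win / (4 * X * Y)

def dte(Y, X):
    """Double the Edge: w = 1 - X/(2Y), intended for Y > X."""
    return 1 - X / (2 * Y)

print(uniform_model_w(4.8, 4.2), dte(4.8, 4.2))
```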

I prefer to express it as a function of q, where q = x/y, the ratio of the runs given up to the runs scored.  

So we have:   DTE:   w  =  1 - (1/2)q  .  We thus note that it is linear in q, linearity being of course a nice feature (it's simple) at the expense, usually, and here, definitely, of some accuracy as an estimator.  But it's very accurate for values of q reasonably close to 1, which is true for most teams.  The reason for this accuracy I soon discovered (as many had before me):  DTE is actually the tangent line (in the calculus sense) of the Pythagorean Run Formula w(q) = 1/(1 + q^2) at the point q = 1 (a .500 team.)  We'll talk about that later.

6B)  A Problem Corrected  
Note, though, that DTE satisfies only one of the 2 natural constraints on functions f(q) that are estimators of w:  
f(1) = 1/2 is satisfied, but f(1/q) = 1 - f(q) is NOT satisfied.  This is inevitable, as no linear function of q can satisfy that second constraint.  

However, as Bill Kross discovered years ago, there is a simple and elegant way to deal with that.  I don't know if Kross conceptualized it the way I do, but he certainly came up with the formula somehow:  
  
Simply use w = f(q) = DTE for values of q where q < 1 (i.e., when y > x).

And then, for values where q > 1, that is, y < x, DEFINE a new function h(q) to equal 1 - f(1/q).  
Simplifying, we find that for q > 1,   w = h(q)  =  1/(2q),
   Or, using x and y,   w = y/(2x) .    I call this Kross2, using Kross1 for DTE.

This creates a "piecewise" function, with each piece having a different domain, which I call simply "Kross", made up of Kross1 and Kross2:

Kross:    DTE =  Kross1:      y > x:    w  =  1 - x/(2y)    =   (2y - x)/(2y)     OR:   q < 1:    w = 1 - (1/2)q     DTE------------rmse  =  .0307  
                         Kross2:      y < x:    w  =   y/(2x)                                  OR:   q > 1:    w = 1/(2q)         Kross2--------rmse  =  .0312 (mediocre again)


This forces the second constraint to now also be satisfied, at the expense of now having a piecewise function, with one half being the linear f(q), and the other half the hyperbolic h(q) = 1/(2q).  This is a very nice example of a piecewise function, because the transition between the two is "seamless", with both having the same slope at q = 1 (and each being a fairly good approximation of Pyth2, w = 1/(1 + q^2), on their respective domains). 

Of course, both parts give a value of 1/2 at q = 1, as they should.  The nice thing, though, is that for large values of q = x/y, which is to say teams that give up a lot more runs than the small (but non-zero) number they score, the winning percentage does not become exactly zero (or negative!), as it would with DTE, but as it would NOT in real life.  I don't have my spreadsheet set up to evaluate Kross by using the different pieces as appropriate for y < x or y > x, so I can't tell how accurate it is, but it is certainly likely to be a little more accurate than either piece separately.
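For anyone who wants to evaluate the piecewise Kross function directly, here is a minimal Python sketch (the function names are my own). It uses Kross1 (DTE) when y >= x and Kross2 when y < x, and can be used to confirm the symmetry constraint, i.e., that a team's estimate and its opponents' estimate sum to 1.

```python
def dte(y, x):
    """Kross1 = DTE: w = 1 - x/(2y)."""
    return 1 - x / (2 * y)

def kross2(y, x):
    """Kross2: w = y/(2x)."""
    return y / (2 * x)

def kross(y, x):
    """Piecewise Kross: DTE when y >= x (q <= 1), Kross2 when y < x (q > 1)."""
    if y >= x:
        return dte(y, x)
    return kross2(y, x)

# A team and its 'mirror image' opponent should have estimates summing to 1.
print(kross(800, 700), kross(700, 800), kross(800, 700) + kross(700, 800))

# For a very lopsided loser, DTE alone goes negative but Kross stays positive.
print(dte(100, 900), kross(100, 900))
```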

But, I thought, why not get a single function from the 2 pieces?  That's what I did  by simply averaging them, in:

      KrossAvg:   w = (1/2) [  1 -  x / (2y)  +  y / (2x) ]           rmse = .0283

This turns out to be a fairly accurate estimator, substantially better than either piece separately.  It has some other versions, such as:

    KrossAvg:      w =  (1/2) + (y - x) M,  where M is a "winning percentage per run" estimator (see section 9), with M =  (y + x) / (4xy)  

6C)  But in fact, it is better to "average them" incorrectly!  
Given two fractions a/b and c/d, the "false average" (also the "naive way to add fractions") is    (a + c) / (b + d).  This is always between the original two fractions, like the true average, and equals the true average if b = d.  

If we use this on Kross1 (DTE) and Kross2, and writing Kross1 as (2y - x)/(2y),  with Kross2 as  y/(2x), we get that the "false average" is 
       (2 y - x + y ) / (2y + 2x)  ,  or  (3y - x)/ (2y + 2x)  That is, it is (once again)  BVL2!  
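The two kinds of "average" can be compared in a few lines of Python. This is only a sketch with a hypothetical team (y = 800, x = 700) and my own function names; it uses exact fractions so that the identity with BVL2 can be tested exactly. Note that the false average depends on how each fraction is written, so the numerators and denominators are passed in unreduced.

```python
from fractions import Fraction

def mediant(a, b, c, d):
    """'False average' of a/b and c/d: (a + c) / (b + d)."""
    return Fraction(a + c, b + d)

def kross_avg(y, x):
    """True average of Kross1 and Kross2: w = (1/2)[1 - x/(2y) + y/(2x)]."""
    return 0.5 * (1 - x / (2 * y) + y / (2 * x))

def bvl2(y, x):
    """BVL2 as an exact fraction: (3y - x) / (2x + 2y)."""
    return Fraction(3 * y - x, 2 * x + 2 * y)

y, x = 800, 700  # hypothetical season totals

# Kross1 written as (2y - x)/(2y), Kross2 written as y/(2x)
false_avg = mediant(2 * y - x, 2 * y, y, 2 * x)

print(float(false_avg), float(bvl2(y, x)), kross_avg(y, x))
```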

Note that all four of these wpe's share the form  w = 1/2 + (y - x)/D,  differing only in the denominator D:

                      DTE:         w = 1/2 + (y - x)/(2y)
                      Kross2:      w = 1/2 + (y - x)/(2x)
                      BJMWP:       w = 1/2 + (y - x)/(2A)
                      BVL2:        w = 1/2 + (y - x)/(y + x)

        

6D)  Can We Get Pythagorean Results?  
However, my result of DTE from the uniform model wasn't Pythagorean, so I fiddled with the uniform model a bit by omitting the central diagonal region between the lines y = x - 1 and y = x + 1.  This omits all points where the run differential between winner and loser is less than one--and you can't win a ballgame by less than 1 run.  So we only consider the parts of the rectangle that are NOT omitted, i.e, the "realistic" games, and now find the ratio of the remaining "winning" area to the total remaining area.

I first made the additional assumption:  Y - X  <  1/2, (while still assuming Y > X).  (If this additional assumption is not true, one gets a different geometry than if it is, so I first checked this case.)  When you calculate the ratio of the areas as described above, i.e., that of the "winning" area above the line y = x + 1,  to the sum of that area and the "losing" area below y = x - 1, you get the following formula (by elementary analytic geometry):

    w =  f(p) =  1 / (1 + p^2 ) .   Which is the Pythagorean Power Run Formula with n = 2, just as hoped(!??)    

Except...what is that "p" doing in there...weren't we using "q" as the variable!?

Yes we were--p is not (quite) the same as q, and so things aren't perfect--but they are interesting!  It turns out that under this modification of omitting the central diagonal from the model, if one is willing to "translate" (in the mathematical sense--i.e., shift) the average runs per game values of Y and X each down by a distance of 1/2, one then gets from the uniform distribution the Pythagorean Run Formula as a function of the shifted ratio  "p" = (X - 1/2) / (Y - 1/2).  

That is, if one uses as one's variable not q = X/Y, but "p" = (X - 1/2) / (Y - 1/2), one gets the formula above as   w = f(p).  
It is a "perfect" Pythagorean Power Run Formula, except as a function of a "shifted" variable p, equal to the ratio of (X - 1/2) to (Y - 1/2), instead of q, the ratio of  X to Y.  

I am naming this the Shifted Pythagorean Run (SPR) formula.   One can of course change back to X and Y if you want, yielding: 
With   p  =  (X - 1/2) / (Y - 1/2)  ,
 
   SPR:  w  =  f(p) = 1 / (1 + p^2 )  =  ... simplifying... =  (Y - .5)^2 / [ (Y - .5)^2 + (X - .5)^2 ] 
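The SPR result can be checked against the geometric model itself. The Python sketch below is an illustration under the stated assumptions (uniform rectangle, central band between y = x - 1 and y = x + 1 removed, |Y - X| < 1/2); the function names and sample averages are mine. It integrates the winning and losing areas numerically by slicing along x, and compares the ratio with the SPR formula.

```python
def band_model_w(Y, X, gap=1.0, n=20000):
    """Winning area / (winning + losing area) in the rectangle [0, 2X] x [0, 2Y],
    after removing the band between y = x - gap and y = x + gap.

    Midpoint rule in x; the y-lengths in each slice are computed exactly.
    """
    dx = 2 * X / n
    win = lose = 0.0
    for i in range(n):
        x = (i + 0.5) * dx
        win += max(0.0, 2 * Y - (x + gap)) * dx       # y in (x + gap, 2Y]
        lose += max(0.0, min(x - gap, 2 * Y)) * dx    # y in [0, x - gap)
    return win / (win + lose)

def spr(Y, X):
    """Shifted Pythagorean Run formula: (Y - .5)^2 / [(Y - .5)^2 + (X - .5)^2]."""
    return (Y - 0.5) ** 2 / ((Y - 0.5) ** 2 + (X - 0.5) ** 2)

print(band_model_w(4.3, 4.0), spr(4.3, 4.0))
```

The agreement holds only while Y - X < 1/2; for larger gaps the geometry changes, as noted in the text.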

 Again, this is perfectly "Pythagorean" in form, just using the "shifted" run averages.

6E)  Inaccuracy leads to better accuracy:

SO...how accurate is the SPR formula?  Not as accurate as Pyth 2  [SPR has rmse = 0.031 ] !

Oh, well...definitely worse than Pyth 2.  But, like all good failures, it suggests an improvement--one which is better than Pyth 2!

The reason it does worse than Pyth 2 is that, as we saw earlier, Pyth 2 predicts results (based on a given run ratio q) without sufficiently taking into account the effect of chance, which moves w a tad  closer to .500 than would be predicted by Pyth 2.  That effect means that a run ratio value a little closer to 1 predicts more accurately than the actual run ratio does.  But subtracting ½ from X and Y moves the run ratio farther away from 1, not closer—that is, it further reduces the role of “chance”, and is hence even less predictive. 

But why not move in the opposite direction?  Why not increase Y and X, which moves the q ratio closer to 1, and should thus account more for chance?  Hence, let's use X+ 1/2 and Y + 1/2, and see if we get better predictive results:  and we do!  [Of course, this is now an “ad hoc” adjustment, not derived from my model above—but it’s an interesting one, and is very accurate!] 

Also, let's no longer treat Y and X as fixed runs per game, but as variables, and as total runs per team--PRF's have no concern for which variable is used, total runs vs. runs/G, as the G’s cancel out.  So we get the much more accurate new estimator:   

with  y = total runs in a season for team Y, and x = total runs in a season allowed by team Y,

6F)  Pyth+1/2:       w  =   (y/G + .5)^2 / [ (y/G + .5)^2 + (x/G + .5)^2 ]    [Rmse =  .0264] 

Actually, we can get it just slightly more accurate using a best fit of adding 0.43, not 0.5 , giving me a rmse of  .0256 , which is extremely accurate! 
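SPR, Pyth2, and Pyth+1/2 are all the same formula with a different shift c added to the runs per game, which the following Python sketch makes explicit. The function name pyth_shifted and the sample team are hypothetical; no accuracy claims are made by the code itself.

```python
def pyth_shifted(y, x, G, c=0.5):
    """Pythagorean (exponent 2) estimate on runs per game shifted by c.

    c = -0.5 gives SPR, c = 0 gives Pyth2, c = +0.5 gives Pyth+1/2.
    """
    ys = y / G + c
    xs = x / G + c
    return ys**2 / (ys**2 + xs**2)

y, x, G = 800, 700, 162  # hypothetical season totals and games

for label, c in [("SPR", -0.5), ("Pyth2", 0.0), ("Pyth+1/2", 0.5), ("Pyth+0.43", 0.43)]:
    print(f"{label:10s} w = {pyth_shifted(y, x, G, c):.4f}")
```

Increasing c pulls the shifted run ratio toward 1, and hence the estimate toward .500, which is the direction the discussion above says is needed.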

What is interesting about SPR and Pyth+1/2 is that they come from or are suggested by a VERY simple run model, quite unrealistically so in the "uniformity" feature, that is totally unrelated to the "quality" derivation of Pyth2, and yet yields the same structural Pythagorean result with exponent n = 2, and, in one case, extremely good accuracy, even better than Pyth1.82.   

6G)  A minor footnote:  What happens under the contrary assumption to that in part 6D) above?  That is, what happens if we assume that y - x > 1/2 ?  The geometry changes a bit, and one gets a slightly different result, with an extra "adjustment" term added to the numerator and denominator of  SPR.  That extra term is (y - x)^2.  Since this involves the surplus s = y - x, but not the ratio of y/x, (or of their shifted values) this is no longer easily expressible as a function of some "p" variable.  And it is definitely a bit more complicated--although for most real-life values this extra term is quite small, and thus the final results are quite similar to the original f(p).  But at any rate, this seems to have too high a ratio of complication to accuracy to pursue.

_________________________________

7.  DTE and Tangent Lines to Pythagorean Formulas:  Simpler derivations

Moving away from uniform distributions, let's go back to the original "q" ratio, with q = x/y (and back to small y and x as the total seasonal runs for the two teams.)  What is the relationship between DTE and Pythagoras?     

    I'll write DTE as    w =  1 - .5q                and Pythagoras as:    w = f(q) =  1 / (1 + q^2)  ,  

As mentioned above, DTE is the calculus tangent-line to the Pythagorean f(q).  I discovered this independently, but it had been found and put online (though with different, somewhat cumbersome notation) long before by "Patriot", and others.   Using q = x/y simplifies the process greatly:  

I'll actually show the general result for any Power Run Function (PRF)  with power n:       w = f(q) = 1 / (1 + q^n), as follows:

Take the derivative "f-prime of q"  = df/dq =  -n q^(n - 1) / (1 + q^n)^2 , and evaluate it at q = 1 (i.e., for an average team, where y = x).  

We get f-prime of (1) = -n/4 = slope of tangent line at q = 1.  

Since w = f(1) = .5, we get the point-slope tangent line as:  w - .5 = -n/4 (q - 1), or simplifying ,

Tangent line to PRF f(q) for some exponent n,  at q = 1 is:   w = (1/2 + n/4) - (n/4) q .  

If n = 2, as in the Pyth2 original Pythagorean formula of BJ,  this immediately reduces to DTE,  w =  1 - (1/2)q .  Q.E.D.

If n = 1.82,  we get   w  =  .955 - .455 q,   or    w = .955 - .455x/y.   This should be the "best fit" linear wpe based on q.  
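The tangent-line computation can be verified numerically. The sketch below (function names are my own) estimates the slope of the PRF at q = 1 with a central difference and compares it with -n/4, then evaluates the tangent line's intercept for n = 2 and n = 1.82.

```python
def prf(q, n):
    """Power Run Function: w = 1 / (1 + q^n)."""
    return 1.0 / (1.0 + q**n)

def tangent_at_one(q, n):
    """Tangent line to the PRF at q = 1: w = (1/2 + n/4) - (n/4) q."""
    return (0.5 + n / 4) - (n / 4) * q

def numeric_slope(f, q0, h=1e-6):
    """Central-difference estimate of f'(q0)."""
    return (f(q0 + h) - f(q0 - h)) / (2 * h)

for n in (2.0, 1.82):
    slope = numeric_slope(lambda q: prf(q, n), 1.0)
    print(f"n = {n}: numeric slope = {slope:.6f}, -n/4 = {-n / 4:.6f}, "
          f"intercept = {tangent_at_one(0.0, n):.3f}")
```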

RMSE = ??? xxxx

If one graphs w = f(q) on a (q,w) coordinate graph, along with DTE, one immediately sees that the tangent line DTE to f(q) is very close to the curve f(q) for values of q near 1, which are most realistic baseball values.  This is why DTE is actually a quite good estimator, very similar to the Pythagorean f(q) for most situations.  

7A)  DTE:  A Quality model      w =  1 - (1/2)x/y,    or     w = (2y - x) / 2y
 
Derivation of this formula using Quality:    this has actually already been done earlier in section 3A, which deals with

Bill James  Marginal Winning Percentage:   
  (BJMWP)      w  =  [y - (1/2)A  +  (3/2)A - x ]  /  [2A].      A = League Average total Runs per team         

This simplifies to :   w =  (y - x + A) / (2A).     

DTE simply replaces A by y  in the line above this one—i.e., it assumes that team Y actually scores an “Average” amount of runs. 
This says that the quality ratio between two teams that don’t score average numbers of runs is roughly the same as it would be if it happened to be true that the better one actually was an average run-scoring team.  That is, if Y scores 800 runs and gives up 700 in a league with an average of 720 runs scored, it is roughly as good as it would be if it performed the same in a league with an average of 800 runs scored.  This is quite plausible—it says that is really the surplus runs that constitute most of the quality, at least for a wide range of “normal” league average runs-scored values.  In a league with higher average runs, the y = 800 is less valuable to generate wins, but the x = 700 performance of the pitchers is now more valuable in compensation.  It all averages out…  Of course, there is a loss of accuracy from this simplifying assumption that A is equal to y—DTE is definitely less accurate than BJMWP—but the quality assumptions that derive the latter will also derive the former, when conjoined with the simplifying assumption. 

Why not assume that Y happened to actually give up the average A runs (rather than itself score A runs)?  That is, assume that x = A?  If so, plugging into BJMWP, replacing A by x, one immediately gets:

        w  =  y / (2x)  .  But this is just Kross2 !    See section 6B).  Hence, Kross2 also comes from the simple Quality model of BJMWP.

One can also obtain both DTE and Kross2 from a "General Marginal Runs" formula I've found by playing around with the above idea,  

     GMR:     w  =  (y - A/2) / (y + x - A)  ,    RMSE:  XXX ???  choose a best fit A!  see end of sec. 3  =  BJA

in which if, reversing the substitutions we used above for BJMWP, we replace A by y we get Kross2, and if we instead replace A by x we get DTE.  
Note that GMR is saying that the winning percentage is simply the ratio of the excess of runs scored by Y over half the league average, to the excess of total runs in Y's games (scored and allowed) over half the league average of total runs for both teams (which would be half of 2A, or A).      
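The four substitutions described in this subsection are easy to confirm with a short Python sketch (function names are mine; the sample team is hypothetical): replacing A by y or by x in BJMWP gives DTE or Kross2, and the same substitutions in GMR give them in the reverse order.

```python
def bjmwp(y, x, A):
    """BJMWP: w = (y - x + A) / (2A)."""
    return (y - x + A) / (2 * A)

def gmr(y, x, A):
    """General Marginal Runs (same form as BJA): w = (y - A/2) / (y + x - A)."""
    return (y - A / 2) / (y + x - A)

def dte(y, x):
    """DTE: w = 1 - x/(2y)."""
    return 1 - x / (2 * y)

def kross2(y, x):
    """Kross2: w = y/(2x)."""
    return y / (2 * x)

y, x = 800, 700  # hypothetical season totals
print("BJMWP, A=y:", bjmwp(y, x, y), " DTE:   ", dte(y, x))
print("BJMWP, A=x:", bjmwp(y, x, x), " Kross2:", kross2(y, x))
print("GMR,   A=y:", gmr(y, x, y), " Kross2:", kross2(y, x))
print("GMR,   A=x:", gmr(y, x, x), " DTE:   ", dte(y, x))
```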

7B)  Tangent lines, q, u, v, t, Inflection points, etc.

There is a beautiful set of relationships between the various parameters in which wpe's are naturally expressed.  I'll recap the parameters, then show that among BVL2, Pyth2, DTE, and Kross2, each formula that is linear in one of the parameters is the tangent line of the other three when all are expressed in that parameter.  Thus, they are as intimately related as possible.  I note that the relationship below between q and u is of course trivial, since they are reciprocals, and we use both (instead of just one) only because of a desire for more elegance in some formulas.

y = runs scored,     x = runs allowed,       w = winning percentage

                x and y               q                   u                   t                      v                    w = 1/2 when:    Range:

q  =            x/y                   --                  1/u                 (1 - t)/t              (1 - v)/(1 + v)      q = 1            [0, oo)
u  =            y/x                   1/q                 --                  t/(1 - t)              (1 + v)/(1 - v)      u = 1            [0, oo)
t  =            y/(y + x)             1/(1 + q)           u/(1 + u)           --                     (1/2)(1 + v)         t = 1/2          [0, 1]
v  =            (y - x)/(y + x)       (1 - q)/(1 + q)     (u - 1)/(u + 1)     2t - 1                 --                   v = 0            [-1, 1]
---------------------------------------------------------------------------------------------------------------------------------------------
Pyth2:   w =    y^2/(y^2 + x^2)       1/(1 + q^2)         u^2/(1 + u^2)       t^2/(2t^2 - 2t + 1)    1/2 + v/(v^2 + 1)

BVL2:    w =    (3y - x)/(2y + 2x)    (3 - q)/(2 + 2q)    (3u - 1)/(2u + 2)   2t - 1/2               1/2 + v

DTE:     w =    (2y - x)/(2y)         1 - q/2             1 - 1/(2u)          (3t - 1)/(2t)          (3v + 1)/(2 + 2v)

Kross2:  w =    y/(2x)                1/(2q)              u/2                 t/(2 - 2t)             (1 + v)/(2 - 2v)

Note that all but Pyth2 are linear in at least one version:  BVL2 is linear in t and v,  DTE is linear in q, and Kross2 is linear in u.

And, amazingly, in each linear case, the linear formula is the (calculus) tangent line of the other three wpe's in its column!  
The demonstration of this is left as an exercise for anyone who has successfully completed a Calculus 1 course.
 Naturally, the tangent lines go through, and have their slopes (derivatives) evaluated at, the point where w = 1/2 for that particular formula, i.e., at q = u = 1, t = 1/2, or v = 0.

For example, for the "t" column, and for BVL2, the linear wpe:   w = 2t - 1/2 is the equation of the tangent line to Pyth2:   w = t^2/(2t^2 - 2t + 1),
AND it is also the tangent line to DTE:  w = (3t - 1)/(2t),  AND it is also the tangent line to Kross2:  w = t / (2 - 2t) .  
In each case, the tangent line goes through (1/2, 1/2) where we use t = 1/2 since that is the value that creates w = 1/2 for all the formulas.  And the slope (derivative) is evaluated at t = 1/2.  

The same is true for the "q" column:  DTE, using the q formula, is the tangent line to Kross2, BVL2, and Pyth2 in their respective q-versions.
Here the derivatives are evaluated at q = 1, and the line goes through (q, w) = (1, 1/2).
Ditto for the "u" column and Kross2:  it is the tangent line to the other three formulas in their u-versions, with slopes evaluated at u = 1.
And ditto for the "v" column, where BVL2 is once again the tangent line to the other three in their "v" versions, with slopes evaluated at v = 0.
Note, of course, that Pyth2 can't be a tangent line to any version, since it isn't linear in any of these parameters.  But all the linear versions are tangent lines to Pyth2!
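For readers who would rather check the tangent-line claims numerically than by hand, here is a minimal sketch in Python.  The dictionary layout and the helper names (COLUMNS, slope, value_and_slope) are illustrative choices made for this example only; the formulas themselves are copied from the table above.

```python
# Illustrative check (all names are hypothetical, not from any library):
# in each parameter column the four wpe's share the value w = 1/2 and the
# same slope at the even-team point, so the linear one is the tangent line.

COLUMNS = {
    # column: (point where w = 1/2, {wpe name: w as a function of the parameter})
    "q": (1.0, {
        "Pyth2": lambda q: 1 / (1 + q**2),
        "BVL2": lambda q: (3 - q) / (2 + 2 * q),
        "DTE": lambda q: 1 - q / 2,                 # linear in q
        "Kross2": lambda q: 1 / (2 * q),
    }),
    "u": (1.0, {
        "Pyth2": lambda u: u**2 / (1 + u**2),
        "BVL2": lambda u: (3 * u - 1) / (2 * u + 2),
        "DTE": lambda u: 1 - 1 / (2 * u),
        "Kross2": lambda u: u / 2,                  # linear in u
    }),
    "t": (0.5, {
        "Pyth2": lambda t: t**2 / (2 * t**2 - 2 * t + 1),
        "BVL2": lambda t: 2 * t - 0.5,              # linear in t
        "DTE": lambda t: (3 * t - 1) / (2 * t),
        "Kross2": lambda t: t / (2 - 2 * t),
    }),
    "v": (0.0, {
        "Pyth2": lambda v: 0.5 + v / (v**2 + 1),
        "BVL2": lambda v: 0.5 + v,                  # linear in v
        "DTE": lambda v: (3 * v + 1) / (2 + 2 * v),
        "Kross2": lambda v: (1 + v) / (2 - 2 * v),
    }),
}


def slope(f, p, h=1e-6):
    """Central-difference estimate of the derivative f'(p)."""
    return (f(p + h) - f(p - h)) / (2 * h)


def value_and_slope(column):
    """Return {wpe name: (w at the even point, slope there)} for one column."""
    point, formulas = COLUMNS[column]
    return {name: (f(point), slope(f, point)) for name, f in formulas.items()}


if __name__ == "__main__":
    for column in COLUMNS:
        print(column, value_and_slope(column))
```

Within each column, all four entries should report w = 1/2 and (up to finite-difference error) one common slope, which is exactly what it means for the linear formula to be the shared tangent line.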

More generality:  Each of the above formulas comes in a variation where, instead of using n = 2 as in Pyth2, we could use a "better fit" exponent n.  This is generally taken to be around n = 1.82 for best fit to real results.  I'm going to use capital N in the titles, but small "n"s in the formulas.  This creates

BVLN, PythN, DTEN, and Kross2N.    For these, which are more complicated, I will give only those parameter versions which are fairly simple:

PythN:  y^n/(y^n + x^n)     =                         1 / (1 + q^n)

BVLN:    [(n + 1)y - (n - 1)x] / (2x + 2y)                            =        (1 - n)/2 + nt                                          =  1/2 + (n/2) v


DTEN:   1/2 + (n/4) (y - x)/y           =  (n + 2) / 4  -  nq/4












_________________________________


8.  Tangent Line Considerations for log5 

 _________________________________

9.  Wins Per Run Estimators (WPR's)       

xxxxTango Tiger , BVL, Palmerr 

This section is particularly indebted to Patriot, including his excellent work at ______ online, and I thank him for several gracious and helpful email exchanges.  My work merely reclassifies and extends his work a bit, as well as that of others he has cited (and, I'm sure, many he has not--it is very likely that the stuff I've discovered independently was discovered by others before me!).

Wins per Run estimators (wpr's) usually rely on the concept of the surplus runs that your team scores over what its opponents score, i.e. on the quantity (y - x).  This is called the "run differential" in much sabermetric literature, but I prefer to label it surplus runs and, if needed for simplicity, to use as its parameter s = (y - x).  By itself, this parameter means little, since your team's surplus runs affect its ability to win very differently depending on the actual value of y and/or x, and hence of  (y + x), none of which are determined merely from knowing the result of (y - x).

General form of a wpr  is:        w = 1/2 + (y - x) (M) ,    where M is a function M(x, y) which may or may not be constant. 

Since any winning percentage "w" is of the form:  wins/G,  with G = total Games, there must be an implicit "G" in the denominator of M.  So M can be reconceptualized if needed as  M = Q/G.    
Here Q is equal to a Quantity (not necessarily constant) conceptualized as "Wins per Run", or more precisely, wins per surplus run, i.e., wins per [unit increment to the surplus (y - x) ].  In particular, Q may well depend on the actual size of x or y individually, and hence on x + y.   I note that formulas for M don't have to explicitly mention G or Q, though some do--they can instead simply use other parameters that happen to give accurate results when rolled into one big "M".  

Thus, we can write (or conceptualize) in general,   w  =  1/2 + (y - x) Q/G , noting that (y - x) Q  will be "surplus runs" times "Wins per surplus run", or "total (surplus) Wins" , over and above the wins that would result if there were no surplus runs, which are assumed to be 1/2 of the games G.  Hence, dividing (y - x)Q by G will turn those "surplus wins" into a "surplus winning percentage", to be added to the initial 1/2 that obtains when y – x = 0.  A simple and extremely fruitful model, since M = Q/G can take many plausible forms.   All of these plausible forms fit the Quality Model, as I will show below as a general proof, not reliant on any individual M.  But first, let's investigate the general size of M = Q/G:

It has been found that plausible and accurate models covering the history of baseball result from using M values or functions that average out to roughly 1/1500 or so, since the average Wins/Run  "Q"  factor has been around 1 / 9.5 or so, and Games have long been around 160 per season or so.  Using an M(x,y) based on individual team runs x and y (or League Average runs A) will of course create non-constant M-values with a fair amount of variation around the rough mean of 1/1500.  Non-constant M’s should of course fine-tune and increase the predictive accuracy. 

Why does it seem plausible that Wins/Run should be around 1 / 9.5 (or 1/10, for simplicity) and is the fact that major league total runs per game have often averaged around 9.5 or 10 related to that?  Here are two considerations that suggest it is plausible, and yes, they are related.  Dealing with the second issue first, since every game results in one Win (for some team) , then of course, overall, and on average, in any league Runs per Win would equal Runs per Game, since Wins equals Games for the league as a whole. However we are actually interested in Surplus runs per (Surplus) Wins, above Wins = ½ G, and it’s not clear that this necessarily is identical with Runs per Wins overall.  However, using Runs/Win = Runs/Game gives a pretty accurate wpr, so it’s obviously a reasonable rationale. 

Moreover, just in general, 1 extra win (above .500) from roughly 10 extra runs (above y = x) makes sense intuitively, since for half the games in which those extra 10 runs are scattered about, the team will already (on average) be a winner, so those extra runs (half, or 5, on average) won’t produce any extra wins.  Of the 5 scattered among previously losing games, any that land alone in a game won’t produce a win by themselves (at most one creates a temporary tie, which can't be won without another run scattered into the same game).  So there are actually fairly few losses in which randomly scattering 5 extra runs could make a sufficient difference, given the original margin of loss, to win an extra game.  Getting this to happen once seems just about right on average, though of course teams might get lucky or unlucky in where the extra runs came.  If you ran a simulation scattering the 10 runs among a team's games by any reasonable probability distribution, it’s extremely likely that the resulting average would indeed be around 1 extra win--and, again, the nice accuracy of wpr’s using the rough value supports this.  So roughly 1/10  Wins/Run is certainly “in the ballpark”, intuitively.
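The simulation suggested above can be sketched in a few lines of Python.  Everything specific in it is an assumption made for illustration, not something taken from my sample data: runs per game are drawn from a Poisson distribution with mean 4.7 for both sides, a season is 162 games, and a tied game is counted as half a win (a stand-in for a coin-flip extra-inning result).  Real run distributions are more spread out than Poisson, so the output is only a rough plausibility check.

```python
import math
import random


def poisson(rng, mean):
    """Draw one Poisson variate by Knuth's multiplication method (fine for small means)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def wins(scored, allowed):
    """Count wins, treating a tie as half a win (a simplifying assumption)."""
    return sum(1.0 if s > a else 0.5 if s == a else 0.0
               for s, a in zip(scored, allowed))


def extra_wins_from_extra_runs(extra_runs=10, games=162, mean_runs=4.7,
                               seasons=500, seed=1):
    """Average number of wins gained by scattering extra_runs at random over a season."""
    rng = random.Random(seed)
    total_gain = 0.0
    for _ in range(seasons):
        scored = [poisson(rng, mean_runs) for _ in range(games)]
        allowed = [poisson(rng, mean_runs) for _ in range(games)]
        before = wins(scored, allowed)
        for _ in range(extra_runs):
            scored[rng.randrange(games)] += 1   # drop one extra run into a random game
        total_gain += wins(scored, allowed) - before
    return total_gain / seasons


if __name__ == "__main__":
    print(extra_wins_from_extra_runs())
```

Under these assumptions the average gain should land in the general neighborhood of one win per 10 extra runs; the exact figure moves with the assumed scoring level and distribution.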
-------------------------------------------------------------------------------------------------------------------

The simplest wpr of all:  I call it W1500:       w = 1/2 + (y - x)/1500     rmse = .0263   

And a pretty accurate one!  Here,  M =  1/1500.  This is slightly more accurate than Pyth2, and not far from Pyth1.82, which sort of makes you wonder why you need them:  it don't get no simpler than this formula, which actually is linear in s = (y - x).   1520 or so is actually the best fit for my sample, among all constant denominators, but the differences here are in the ten-thousandths place....  

Here are some previously familiar wpe’s recast into Wins per Run models, though most don't use Q or G, just the final result M that is a proxy for them.  

DTE:   w =  1/2 + (y - x) / (2y)      Here, M =  1/(2y).   Note that 1/(2y) for an average team in baseball over the years will indeed be in the vicinity of  1/1500.  Also note that this is not how DTE was written earlier, but simple algebra shows the equivalence.

Kross2:    w =  1/2 + (y - x) / (2x)    Note that M = 1/(2x) simply switches from its DTE value of 1/(2y) to 1/(2x), the same rough size for an average team as for DTE.  Again, this is a new form of Kross2, but simple algebra reveals the equivalence.  It is no wonder Kross came up with these two parts of his piecewise function, DTE and Kross2, since they are mirror images--one using x, the other y, in the denominator.  Whenever y and x differ, one will always be an overestimate compared to the other, and naturally using an average of the two will give a middle value, less likely to be either an underestimate or overestimate.  Which is precisely what the next function uses:

BVL2:    w = 1/2 + (y - x)/(y + x)     This time we have  M = 1/(y + x), where the denominator is simply the average of 2x and 2y, the denominators of DTE and Kross2.  BVL2 is thus naturally more accurate than either of the first two individually, though maybe not of their piecewise totality in Kross.  And again, 1/(x + y) will be in the historical average neighborhood of 1/1500.  I note that Patriot attributes BVL2 online to David Smyth--I don't know who had priority, but it's a great little wpe (and wpr.)  However, it is a bit odd that BVL2, which fine-tunes the individual team run-contexts, is a bit less accurate than just using M = 1/1500...but maybe my sample isn't big enough...or maybe the game run-context isn't so important!

Are all wpr's "Linear"?   Well, sort of, but not really.   They equal a constant plus (y - x)M, which is apparently "linear" in the variable s = (y - x).  But this is not true if M is not a constant, but rather a function of x and y--then the result is certainly not "linear" in x and/or y, and not in (y - x) either, since that is not a parameter independent of x or y. 

But this is a good thing, because it turns out that Pyth2, which is NOT linear in x, y, q, u, t, or v, is nonetheless also a wpr!   

Pyth2:    w =  1/2 + (y - x) (y + x) / [ 2(y^2 + x^2) ]     Again, a new form, equivalent by algebra to the old one.  

Here, M = (y + x) / [ 2(y^2 + x^2) ].  This also brings up a minor quibble I have with Patriot's discussion online, because he says that "linear" wpr's, of which he lists DTE, Kross2, and BVL2, don't meet constraints on w, namely that w should be between 0 and 1.  But Pyth2 does meet this constraint, and is a "linear" wpr in his sense just as much as the others (though again I would not call any of them linear.)  The point is that M(x,y) can sometimes take a form that does allow the overall w to always be between 0 and 1--it may not, but it can and sometimes does. 

Is M here roughly in the neighborhood it should be?  Yes: set x = y, and M = 2y/(4y^2) = 1/(2y), so on average, it's around 1/1500 or so. 
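The "simple algebra" equivalences claimed above are easy to spot-check by machine.  The sketch below, in Python, writes each wpe in the common wpr shape w = 1/2 + (y - x) M and compares it with its earlier form; the names wpr, MULTIPLIERS and ORIGINAL_FORMS are made up for this example, and the 800-run / 700-run team is a hypothetical illustration, not a team from my sample.

```python
def wpr(y, x, m):
    """Generic wins-per-run estimator: w = 1/2 + (y - x) * M."""
    return 0.5 + (y - x) * m


# M as a function of runs scored (y) and runs allowed (x) for each estimator.
MULTIPLIERS = {
    "W1500": lambda y, x: 1 / 1500,
    "DTE": lambda y, x: 1 / (2 * y),
    "Kross2": lambda y, x: 1 / (2 * x),
    "BVL2": lambda y, x: 1 / (y + x),
    "Pyth2": lambda y, x: (y + x) / (2 * (y**2 + x**2)),
}

# The same estimators as they were written in the earlier table.
ORIGINAL_FORMS = {
    "DTE": lambda y, x: (2 * y - x) / (2 * y),
    "Kross2": lambda y, x: y / (2 * x),
    "BVL2": lambda y, x: (3 * y - x) / (2 * y + 2 * x),
    "Pyth2": lambda y, x: y**2 / (y**2 + x**2),
}


def estimates(y, x):
    """Winning-percentage estimates from every multiplier, in wpr form."""
    return {name: wpr(y, x, m(y, x)) for name, m in MULTIPLIERS.items()}


if __name__ == "__main__":
    y, x = 800, 700   # hypothetical team
    for name, w in estimates(y, x).items():
        print(f"{name:7s} M = 1/{1 / MULTIPLIERS[name](y, x):7.1f}   w = {w:.4f}")
```

Printing M as "1 over something" makes it easy to see how close each multiplier sits to the rough 1/1500 benchmark for a given team.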

BJMWP:    w = 1/2 + (y - x) / (2A)   Yet another familiar wpe is a wpr.  M = 1/(2A).   Obviously, 2A for the league average team runs A each year is a better choice than either 2y or 2x in the denominator.  M has the same rough value using 2A as 2x or 2y, and again is roughly 1/1500.  I can't tell how accurate this is since I don't have A data, but using A = 760 as a constant, I get rmse = .0263, pretty accurate, and fine-tuning the A's should make it even better.  This is thus probably at least as accurate as W1500.   

TT (Tango Tiger):    w = 1/2 + (y - x) (2) / [x + y + 10G]                   Rmse = .0256 

Here, M =   (2) / [x + y + 10G]   While this is a bit different, it can be obtained either from a fairly ad hoc Runs/Win formula…
Or, better,  as a "false average" of two previous M's, as follows:
The first is M = 1/(x + y) from BVL2. 
The second is simply to use Q = 1/10, a rough historical average (empirically) for  Wins/Run, (though 1/9.7 or so gives more accuracy in my wpr's),
and thus M = Q/G = 1/(10G).
If we "false average" these two M values by simply adding numerators and adding denominators, we get a new M = (1 + 1)/ (x + y + 10G) = the M used in TT.  (I love false averages!)   Note, TT didn't conceptualize it this way…  And the formula is very accurate…more so than either BVL2 or the 1/(10G) are individually--and both of those are pretty good.  As we see:
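A "false average" is what mathematicians call the mediant of two fractions, and it always lands between them when the denominators are positive.  Here is a minimal Python sketch of the construction just described; the function names are illustrative, and the sample inputs (800 runs scored, 700 allowed, 162 games) are hypothetical.

```python
def false_average(num1, den1, num2, den2):
    """Mediant of num1/den1 and num2/den2: add numerators, add denominators."""
    return (num1 + num2) / (den1 + den2)


def tt_multiplier(y, x, games):
    """False average of M = 1/(y + x) (from BVL2) and M = 1/(10G) (from Q = 1/10)."""
    return false_average(1, y + x, 1, 10 * games)


def tt_estimate(y, x, games):
    """w = 1/2 + (y - x) * 2 / (x + y + 10G)."""
    return 0.5 + (y - x) * tt_multiplier(y, x, games)


if __name__ == "__main__":
    y, x, games = 800, 700, 162   # hypothetical team and season length
    print(tt_multiplier(y, x, games), tt_estimate(y, x, games))
```

Because the mediant weights each fraction by its denominator, the result leans toward whichever of the two M values has the larger denominator, rather than sitting exactly halfway between them.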

"10G":        w =  1/2 + (y - x) / (10G)  ??     rmse = .0265         [I get slightly better accuracy with 1/(9.7G):  rmse =   .0264] 

BVLN:     w = 1/2 + (N/2) (y - x) / (y + x)     OR:   Best Fit  (N = 1.82):   w = 1/2 + 0.91 (y - x) /(y + x)    rmse = .0264

Here, M = 0.91/(y + x) , a "best fit" to the data, and the same as the best fit for PythN exponent, which also is 1.82.   Nice accuracy!

Note:  Recall that for any N, both BVLN and PythN have the same tangent line: namely, DTEN!  However, this is partially true merely by definition:  we define DTEN as "the tangent line of PythN"!  This is in their q = y/x forms, evaluated at q = 1.  But then we note that by that definition, DTEN has exactly the same structure as DTE, but with a different constant N/4 instead of 1/2.  And that it then turns out to be the tangent line of BVLN justifies the "definition" even more, since DTE (using N = 2) is the tangent line of BVL2.  

But right now we're dealing with (y - x) forms, not q-forms.  So let's go to:

DTEN:   w = 1/2 + (y - x) N/ (4y)      OR:  Best Fit:  (N = 1.82):      w = 1/2 + (y - x) /(2.2 y)     XXX RMSE???

Here, M = 1/(2.2y) , for the best fit. 

PythN:    w = 1/2 + (y^N - x^N)/ [2 (y^N + x^N) ]  Oh, darn…  Fixx...UP   use 1.0001y instead of y in  the two factors that could = 0., and maybe even in  the other denom. factor

Well, this is NOT a wpr--it is difficult to get the numerator, y^N - x^N to contain a factor of (y - x).  The factoring out of (y - x) works nicely for N = 2, as we saw above in Pyth2, or for any other whole number N (none of which are accurate), but NOT nicely for N = any NON-integer fraction or decimal, which is the general case for PythN. 

However, it is an IMPLICIT wpr, as  y^N - x^N  DOES "factor" as  (y - x) (…etc…), but the problem is that the remaining factor (…etc…) is an infinite series when N is not a whole number.  The series obviously converges (except when y = x), since Pyth N gives (in its normal form) perfectly good results, but the series will have no simple form for M, and won't converge for quite a while (i.e., will need lots of terms for accuracy.)  So this does not yield any "formula" that is worth working with in a wpr form.  Of course, PythN is still a great wpe in its other, simpler form,  w = y^N/(y^N + x^N).  And the form above shows that it can be formulated as 1/2  + something; but it can't be represented simply as a wpr--as a function of surplus runs, (y - x). 

Xxxx why not use M = 2/(x + y+ 2A), which is false average of 1/ x + y and 1/ 2A

Kross Average!
w =  (1/2) + (y - x) M, where M is a "winning percentage per run" estimator (see section 9), with M =  (y + x) / (4xy)
 

Do BVL@to1/2, add 140 in top, 280 in bottom,

Separate issue:  false average of 1/x + y and 1/1500, IS pretty good!  but with 1625, not 1500 -- BEST??!  Except for BVL a=br

_________________________________

10. Pythagoras to Natural Logarithms via Integration:  a New Run Estimator      

10A)  Let Pyth(n) be    w  = y^n / (y^n + x^n).  Let G = total Games played, and n be fixed at some power we find "best". 

To formulate a Wins/Run function, we fix x, assume y = x has created w = .500, i.e., wG = (1/2)G,  and then

solve the equation   wG = (1/2) G + 1  for  y,  and then for y - x.  
This tells us how many extra runs (y - x)  will create  one extra win, above .500, for the team.

Doing so, we get   y^n / (y^n + x^n)   =  (G/2 + 1) / G,  or , cross multiplying and then collecting and factoring y^n

y^n  (1 - [G + 2]/[2G] )  =  x^n (G + 2)/(2G), or, dividing, simplifying, and taking the nth root,

 y =  x  ( [G+ 2]/[G - 2] ) ^ (1/n).

Therefore,  y - x  =  [ ( [G+ 2]/[G - 2] ) ^ (1/n) - 1 ]  x   =  (extra) runs / additional win .

Taking the reciprocal gives  (extra) Wins/Run  =    1 / (Kx),   where  K = ( [G+ 2]/[G - 2] ) ^ (1/n)  -  1.  
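As a sanity check on the algebra in 10A, the following Python sketch computes K and the implied runs per extra win, then plugs the resulting y back into Pyth(n) to confirm that it yields exactly one win above .500.  The function names are illustrative, and the inputs (x = 700 runs allowed, G = 162, n = 1.82) are assumed values chosen only for the example.

```python
def pyth_k(games, n):
    """K = ((G + 2)/(G - 2))^(1/n) - 1, from section 10A."""
    return ((games + 2) / (games - 2)) ** (1 / n) - 1


def runs_per_extra_win(x, games, n):
    """Extra runs (y - x) needed, starting from y = x, to gain one win: K * x."""
    return pyth_k(games, n) * x


def pyth(y, x, n):
    """Pyth(n): w = y^n / (y^n + x^n)."""
    return y**n / (y**n + x**n)


if __name__ == "__main__":
    x, games, n = 700, 162, 1.82   # assumed illustrative inputs
    extra = runs_per_extra_win(x, games, n)
    wins = pyth(x + extra, x, n) * games
    print(f"runs per extra win: {extra:.2f}")
    print(f"wins at y = x + extra: {wins:.4f} (target {games / 2 + 1})")
```

For inputs like these the runs-per-win figure comes out a bit under 10, in line with the rough 1/9.5 to 1/10 Wins/Run discussed in section 9.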

10B)  Now assume a team has scored y runs and given up x runs.  We ask how it accumulated extra wins during  the process of accumulating the excess (y - x) surplus runs, since the Wins/Run changed at each step of the way.  

That is, our assumption above was that y = x, when we found Wins/Run, but as each successive extra run was scored, there is a higher y, and hence an "assumed" higher x, which will influence the extent to which the next incremental win will come from the (new) level of x and y.

So what we really need to do is INTEGRATE the Wins/Run function at every level of runs, from the x-value that really was what the opponents scored, up to the y that our team actually scored, accumulating "dw"'s (win increments) along the way for the varying Wins/Runs occurring at each different y.  

To do this, since y and x are now fixed total runs scored, and can't play the role of a variable, 
we introduce a new variable p = current runs, and

Integrate  d[Wins/Run ]  = Integral of  [ 1 / (Kp) ] dp      from   p = "x"  to  p = "y".

Since integral of 1/p = ln(p), and substituting the limits of integration x and y

We get: total "extra" Wins =  (1/K) [ ln y - ln x ], with K as above.  Divide these wins by G to convert to winning percentage, and 

add this "extra w" to the w = .500 level when y started (at x), and we get a new run estimator:  

      LNW:    w = (1/2)  +  [ ln y - ln x ] / (KG)  ,

where KG, for n = 1.82 (best Pyth n) and G around 160, yields  2.2 or so.    

       Best  LNW:     w = (1/2)  +  [ ln y - ln x ] / 2.2          Rmse = .0259

[Note:  4 / (KG) is approximately the Pythagorean exponent n we started with (about 1.81 here, against n = 1.82).  This is not surprising when we look at tangent lines, as below.]

This is quite an accurate estimator, according to my sample data.  
Of course, since ln (y/x) = ln y - ln x, one could also write it using u = y/x  or q = x/y  as parameters:     

     w = (1/2)  +  [ ln u ] / 2.2  =   (1/2)  -  [ ln q ] / 2.2 
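Here is a minimal Python sketch of LNW that computes K from G and n rather than hard-coding the 2.2, so the two can be compared.  The function names, the default G = 162, and the 800/700 sample team are assumptions for illustration only.

```python
import math


def lnw_constant(games=162, n=1.82):
    """K*G, where K = ((G + 2)/(G - 2))^(1/n) - 1."""
    k = ((games + 2) / (games - 2)) ** (1 / n) - 1
    return k * games


def lnw(y, x, games=162, n=1.82):
    """LNW: w = 1/2 + (ln y - ln x) / (K G)."""
    return 0.5 + math.log(y / x) / lnw_constant(games, n)


def pyth(y, x, n=1.82):
    """Pyth(n), for comparison: w = y^n / (y^n + x^n)."""
    return y**n / (y**n + x**n)


if __name__ == "__main__":
    y, x = 800, 700   # hypothetical team
    print(f"KG       = {lnw_constant():.4f}")
    print(f"4/(KG)   = {4 / lnw_constant():.4f}")
    print(f"LNW      = {lnw(y, x):.4f}")
    print(f"Pyth1.82 = {pyth(y, x):.4f}")
```

One pleasant property worth noticing is that LNW is exactly symmetric: swapping y and x turns w into 1 - w, just as it does for PythN.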

10C)   Tangent Line Considerations for Pythagoras and LNW:

When written as functions of q, and with KG taken as 4/N, both PythN and LNW have the same tangent line, and, indeed, the same Taylor polynomial expansions at q = 1 out to the square term, with further power terms being very similar as well.  This explains why LNW is such a good predictor, and why 4/(KG) comes out so close to the Pythagorean exponent N we started with.
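The agreement of the first two derivatives at q = 1 can be checked numerically.  The Python sketch below uses finite differences and idealizes LNW by setting KG to exactly 4/n (an assumption; the K from section 10A gives a value very close to this, not identical).  All function names are illustrative.

```python
import math


def pyth_q(q, n):
    """PythN in q-form: w = 1 / (1 + q^n), with q = x/y."""
    return 1 / (1 + q**n)


def lnw_q(q, n):
    """LNW in q-form with KG idealized as 4/n: w = 1/2 - (n/4) ln q."""
    return 0.5 - (n / 4) * math.log(q)


def first_derivative(f, p, h=1e-4):
    """Central-difference estimate of f'(p)."""
    return (f(p + h) - f(p - h)) / (2 * h)


def second_derivative(f, p, h=1e-4):
    """Central-difference estimate of f''(p)."""
    return (f(p + h) - 2 * f(p) + f(p - h)) / h**2


if __name__ == "__main__":
    n = 1.82
    for name, f in (("PythN", pyth_q), ("LNW", lnw_q)):
        g = lambda q, f=f: f(q, n)
        print(name, g(1.0), first_derivative(g, 1.0), second_derivative(g, 1.0))
```

Both functions should report w = 1/2, a slope of -n/4 and a second derivative of n/4 at q = 1, so their Taylor polynomials agree through the square term.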

_________________________________

11.  L'Hopital Enters the Fray

In his discussion at this online source,  _______,  Patriot discusses how one might get Runs Per Win (and hence Wins per Run, WPR)  from Pyth(N) , as originally done by David Smyth.  This is, of course, somewhat artificial--PythN is already an estimator for how runs translate into wins, so any result derived from it can't be more accurate than the PythN from which it came.  Still, the mathematics are interesting, and the result could suggest other (simpler, or interesting) forms.

I'll recreate the process that leads to Smyth's result, (using my notation, not his), then analyze it (more simply than Patriot did) by using L'Hopital's rule from Calculus.

Set PythN with exponent N equal to a standard "Linear Run Differential" Estimator, which is of the form  w = (1/2)  + [ (y - x)/G ] z, with z being the Wins Per Run we seek.
 
If  y^N/(y^N + x^N) =   (1/2)  + [ (y - x)/G ] z    is solved for z, we get, equivalently to Smyth:  

WPR = z  =  G(y^N - x^N) / [2 (y - x) (y^N + x^N)  ]
The standard place to evaluate this WPR is when y = x, i.e., at the .500 level.  But the function is 0/0 at that point, because of the (y^N - x^N) / (y - x) portion.

The calculus approach to finding what happens to that part at y = x is to take the Limit as y approaches x of (y^N - x^N) / (y - x), and, when that is 0/0, to use L'Hopital's rule on the fraction.  L'H says the same limit is achieved from a new fraction P/Q, where P and Q are the derivatives of the top and denom. of the fraction, respectively.  Since I am letting y be the variable, here, x is treated as constant, with d/dy (x) = 0.

So we need to take d/dy of the top, (y^N - x^N) , and d/dy of the denom (y - x) .  The former is  Ny^(N-1) by the power rule, and the latter is 1.  (The derivatives of the "x" portions of both are of course 0.) Evaluating the former when y = x,
we get that the limit of that portion is N x ^(N-1), which we can substitute in for the expression (y^N - x^N) / (y - x) in our WPR above.

So, evaluated at y = x,  the WPR becomes:  WPR =  G [ Nx^(N - 1) ] / [2 (y^N + x^N)  ] evaluated at y = x, or G[ Nx^(N - 1) ] / [2 (2x^N)  ] , which simplifies to:
   WPR  =  NG/(4x)    

Substituting in the standard Linear Run Differential Estimator equation above, we get   w  =  (1/2) + [ (y - x)/G ] [ NG/(4x) ]  , or, simplifying,

   w  =  (1/2) + (N/4) (y - x) / x  

HOWEVER, there was nothing in the above process that forced us to use y as the variable and x as the constant when using L'Hopital.  Reversing the roles, we would get
    w  =  (1/2) + (N/4) (y - x) / y  .

Thus, we get 2 different wpe's. Neither is very accurate.  One overestimates w, the other underestimates it.  
BUT, obviously, one thing to try is to note that the first formula was gotten by evaluating at y = x, but if y = x, then x =  y = (y + x)/2 , as well.  And if y does not equal x, since there is no reason to prefer one to the other, we might as well replace both with their average, accomplishing the same thing.  Replacing the denominator x or y in either formula by (x + y)/2 yields the same single formula:    

w  =  (1/2) + (N/2) (y - x) / (y + x) , which is BVLN!  
We have come full circle...as Patriot also did, by a slightly different approach.
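The L'Hopital limit can also be watched happening numerically: evaluate the WPR expression for y just barely above x and compare with NG/(4x).  This is a minimal Python sketch with illustrative function names; the inputs x = 700, G = 162 and N = 1.82 are assumed for the example.

```python
def wpr_from_pyth(y, x, games, n):
    """z = G (y^n - x^n) / [2 (y - x)(y^n + x^n)], defined only for y != x."""
    return games * (y**n - x**n) / (2 * (y - x) * (y**n + x**n))


def wpr_limit(x, games, n):
    """Value of z in the limit y -> x, by L'Hopital: N G / (4 x)."""
    return n * games / (4 * x)


def bvln(y, x, n):
    """BVLN: w = 1/2 + (n/2)(y - x)/(y + x)."""
    return 0.5 + (n / 2) * (y - x) / (y + x)


if __name__ == "__main__":
    x, games, n = 700, 162, 1.82   # assumed illustrative inputs
    for gap in (50, 5, 0.5, 0.001):
        print(gap, wpr_from_pyth(x + gap, x, games, n))
    print("limit:", wpr_limit(x, games, n))
```

As the gap between y and x shrinks, the printed values should settle toward the limit NG/(4x), which is the WPR used in the derivation above.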

Or, even simpler, we can once again use the "false average approach" on the fractional parts of the two fractions that mis-estimate (one overestimating, the other underestimating).  This is what we did in part

 _________________________________

 
12.  Bill James's original derivation of log5 in 1981  
 

First developed in the late 70's, vaguely mentioned in his 1978 Baseball Abstract, and explicitly developed in his 1981 Abstract, BJ's Log5 formula (apparently misnamed, as it seems to have nothing to do with logarithms) is an extremely useful formula, stemming from basic probability theory, describing how often Team Y could be predicted to beat Team X in a series of head-to-head games, if the only info we have about them is their Winning Percentages (or, alternately, W/L odds ratios) against their respective entire leagues of opponents.  

That is to say, if Team Y meets Team X in the World Series, and we know that Team Y went 60-40 in the regular season in its league [winning percentage  = .600 = 60/(60 + 40) = Wins/(Wins + Losses)], while Team X went 55-45 (winning percentage  = .550) , what is the winning percentage  w  that we should "expect" in the World Series for Team Y as it plays Team X?  

Define a team Y's Quality Qy as Odds Ratio Wy/Ly, where both Wins Wy and Losses Ly were compiled against their entire League.  When it plays a different Team X in the World Series, X’s quality Qx is Wx/Lx, similarly defined.  (Though Y and X played against different leagues, we assume for simplicity that each league was of the same overall average quality.)  This is a very plausible definition of quality—the better a team, the better should be its odds of winning against the entire league overall.

Substituting in the Basic Axiom, we get

   w  =  Qy / (Qy + Qx)    =   Odds Ratio for Y /  (Odds Ratio for Y  +  Odds Ratio for X) 

Or:         Log 5 :    w  =  Wy/Ly   /   (Wy/Ly  +  Wx/Lx )    [Odds Ratio Version] 
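Applied to the World Series example above (Team Y at 60-40, Team X at 55-45), the odds-ratio version can be sketched in Python as follows.  The function names are illustrative; the second function is the same formula rewritten in terms of winning percentages, included only as a cross-check.

```python
def quality(wins, losses):
    """A team's quality as its odds ratio W/L against its league."""
    return wins / losses


def log5_from_records(wins_y, losses_y, wins_x, losses_x):
    """Log5, odds-ratio version: w = Qy / (Qy + Qx)."""
    qy = quality(wins_y, losses_y)
    qx = quality(wins_x, losses_x)
    return qy / (qy + qx)


def log5_from_pcts(pct_y, pct_x):
    """Same result from winning percentages: p(1 - r) / [p(1 - r) + r(1 - p)]."""
    return pct_y * (1 - pct_x) / (pct_y * (1 - pct_x) + pct_x * (1 - pct_y))


if __name__ == "__main__":
    print(log5_from_records(60, 40, 55, 45))   # Team Y vs Team X from the example
    print(log5_from_pcts(0.600, 0.550))
```

For this matchup both versions give 27/49, or about .551, so the .600 team is expected to beat the .550 team a little more than 55% of the time.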

BJ went through a method similar to mine to arrive at his conclusions back in 1981.  A discussion of this follows, but can be skipped by those who aren’t interested in his method. 

There are many different plausible models that lead to the same log5 result, which is the result BJ gave in his 1981 abstract.  But his stated method there is almost identical to my "Quality" model method in spirit.  So in a very real sense, Log5 is the original/quintessential "quality" model for all winning percentage estimators.  However, BJ did not seem to explicitly realize that the basic quality model can underpin much more than log5.  

BJ said the following in the 1981 Abstract:  Assume that Y as above has winning percentage j = .600 against its league opponents Z, where Z is "all the opposing teams that Y played against, as instantiated in whomever they chose to play in their games against Y only."  Again, Z is conceptualized as a vast "entire league" team Z that only plays certain of its players against Y in any given game. 

Further assume that an average team (roughly like Z here) has an arbitrary "quality" level Qz of 1/2.  Then ask, "What quality level Qy would Y have to have, in relation to Z's quality level of 1/2, so that Y would win with a winning percentage j = .600 against Z, the league average team, under the assumption that in such a season series of games, Y and Z's wins are in proportion to the ratio of their respective Qualities?"

That is, he essentially asked, what quality level Qy would satisfy the Basic Axiom:  Qy / (Qy + Qz)  = .600,

where  Qz = 1/2 = .500  ?  [He asked this in English, and actually garbled (grammatically) the question, so it is technically inaccurate--but it is clear from the immediately subsequent math what he meant.]

If you substitute and solve   Qy / (Qy  + .500)  =  .600, you get by basic algebra Qy  = .5(.6/.4) .

James called this value the "log5" of team Y.  Note, however, that if he had arbitrarily assigned a "quality level" of 1 (instead of 1/2) to team Z (the "league" opposition), he would have gotten log5 equals 1(.6/.4).  Thus it is clear that it is the .6/.4 that is the crucial result--he has it multiplied by .5 only because he assigned an "average" league team an arbitrary quality level of .5.  In my calculations, I always assign an "average" team an arbitrary  quality measure of "1", if possible, because it simplifies the math.  I do that in what follows.  

So, for me, the "quality" of Y would in this case be simply Qy =  .6/.4 = 1.5, compared to an average team's  stipulated quality level of 1.  The fact that for James his "log5" was half of that was unimportant, and the extra "1/2"s quickly canceled out of log5 as used by others in the future.  Still, it is clear from his language that he explicitly conceptualized "log5" as a measure of "talent" (that is, quality), relative to an average team talent level. 
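The point that the arbitrary "average quality" level cancels out can be made concrete with a short Python sketch.  The function names and the average_quality parameter are made up for this illustration; the .600 and .550 teams are the ones from the World Series example earlier in this section.

```python
def quality_from_pct(pct, average_quality=0.5):
    """Solve Q / (Q + average_quality) = pct for Q (James's "log5" value when average_quality = 1/2)."""
    return average_quality * pct / (1 - pct)


def head_to_head(pct_y, pct_x, average_quality=0.5):
    """Basic Axiom: w = Qy / (Qy + Qx), with both qualities on the same scale."""
    qy = quality_from_pct(pct_y, average_quality)
    qx = quality_from_pct(pct_x, average_quality)
    return qy / (qy + qx)


if __name__ == "__main__":
    print(quality_from_pct(0.600, 0.5))   # James's scale: average team = 1/2
    print(quality_from_pct(0.600, 1.0))   # my scale: average team = 1
    print(head_to_head(0.600, 0.550, 0.5), head_to_head(0.600, 0.550, 1.0))
```

The two quality scales differ by a factor of 2, but the head-to-head prediction is the same on either scale, because the common factor appears in both the numerator and the denominator.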

 _________________________________

13.  List of all wpe’s considered 

1)      Pyth2  (BJ):             w  =  y^2 / (y^2 + x^2)  

 1A)    Pyth(n)

 _________________________________

14)  Credits and Sources

http://saberlibrary.com/principles/converting-runs-to-wins/ 

http://www.baseballprospectus.com/article.php?articleid=169    Kushner?
http://gosu02.tripod.com/id69.html