Tuesday, December 22, 2020

A Taste of Victory

One of the most bitter and contentious presidential elections in U.S. history is finally behind us.  It seemed that American democracy itself was under threat, and the contest left voters on both sides of the political spectrum feeling jaded and demoralized.  Certain features of this election made it particularly unsavory – not the least of which was an incumbent president who was unwilling or unable to accept the outcome, and the cronies and lackeys who supported him in his delusion – and will probably result in its being remembered as one of America’s darker moments.  For me, it brought to mind the fictional election portrayed in the classic American western, The Man Who Shot Liberty Valance.  In that movie, a young attorney from the east, named Ranse Stoddard, played by James Stewart, moves into an unnamed western territory and leads a movement to turn that territory into a state.  He almost immediately runs afoul of a local bully named Liberty Valance, well-portrayed by Lee Marvin, who is at the head of a faction of cattle barons opposed to statehood.  When Stoddard is elected as a delegate to the statehood convention, along with the local newspaper editor, who has published a story about some of Valance’s crimes, Valance beats the editor nearly to death, burns down the newspaper building, and challenges the attorney to a gunfight.  When Stoddard reluctantly faces off against him in a showdown, Valance is shot to death, much to the surprise of the local townsfolk, not to mention Stoddard himself.  Stoddard and his allies ultimately succeed in their drive for statehood, and he becomes a U.S. senator representing that state.

 



But aside from the ugly controversies surrounding it, the election did bring some longstanding criticisms about the voting process to light.  The Electoral College, in particular, in which each state is given a number of votes equivalent to the sum of its senators and representatives in Congress, came under renewed scrutiny.  A perennial complaint about the Electoral College is that it gives disproportionate power to the less populous states.  According to the 2010 Census, for example, California gets roughly one electoral vote for every 700,000 persons in the state, while Wyoming gets roughly one electoral vote for every 200,000 persons in the state.  There have been four elections in U.S. history in which the candidate who won the Electoral College vote would have lost by popular vote, and two of these have occurred in the last twenty years (Al Gore vs. George W. Bush in 2000, and Hillary Clinton vs. Donald Trump in 2016).  Ironically, however, it can be proven mathematically that, under the Electoral College voting system, a single voter in a more populous state has a higher statistical probability of affecting the outcome of the election than a voter in a less populous state.  But in any case, the fact that this method potentially produces outcomes that differ from those which would occur with a simple count of the popular vote is very galling to many.
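The persons-per-electoral-vote comparison is simple division, and a short sketch makes the rounding visible.  The population counts below are the 2010 Census resident totals as I recall them, and the electoral vote counts are those the two states held for the 2012 through 2020 elections; neither set of figures comes from the text above, so treat them as illustrative inputs.

```python
# Illustrative inputs (assumptions, not taken from the post):
# 2010 Census resident populations and 2012-2020 electoral vote counts.
states = {
    "California": {"population": 37_253_956, "electoral_votes": 55},
    "Wyoming": {"population": 563_626, "electoral_votes": 3},
}


def persons_per_electoral_vote(population, electoral_votes):
    """Return how many residents stand behind each electoral vote."""
    return population / electoral_votes


ratios = {
    name: persons_per_electoral_vote(**data) for name, data in states.items()
}

for name, ratio in ratios.items():
    print(f"{name}: about {ratio:,.0f} persons per electoral vote")

# How many times more residents share one electoral vote in California.
disparity = ratios["California"] / ratios["Wyoming"]
print(f"Disparity: about {disparity:.1f} to 1")
```

With these inputs the two ratios land a little under the rounded 700,000 and 200,000 figures quoted above.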

 

There are even more serious issues that arise, however, when more than two candidates are running in an election, issues that leave even the outcome of a simple popular vote open to criticism.  Third-party candidates, if they are sufficiently popular, potentially become “spoilers” in the election, drawing away votes from the major-party candidate with whom their views are most closely aligned.  Many Democrats worried that just such a thing might have happened in the most recent presidential election if Bernie Sanders had decided to run on a third-party ticket.  While there have been many popular third-party candidates in American presidential races over the past century (George Wallace in 1968, John Anderson in 1980, and Ralph Nader in 2000, 2004, and 2008), the one who is generally remembered as possibly having changed the outcome of an election is Ross Perot, who ran as a third-party candidate against incumbent George H.W. Bush and Bill Clinton in 1992.  Perot’s moderately conservative views were generally considered to be more aligned with Bush’s than Clinton’s, and therefore his strong showing (he received 19% of the popular vote) was thought by many to be responsible for President Bush’s failure to be re-elected.  While this conclusion has been debated (in spite of his strong showing in the popular vote, Perot received no votes in the Electoral College), Perot’s performance demonstrated the impact that a strong third-party candidate could have in elections where the winner is determined by a simple plurality of votes.



Some countries have tried to address the problem of selecting among multiple candidates by adopting more sophisticated voting methods, for example by allowing voters to rank candidates from most preferred to least preferred.  In a three-person contest, for example, a voter’s first choice could be given 2 points, the second choice 1 point, and the third choice no points, and then the points would be totaled among all voters to determine the winner.  But even this method, as logical as it sounds, has been demonstrated to sometimes produce strange outcomes that seem to contradict the general will of the majority.  Consider, as a simple example, an election with three candidates, A, B, and C, and five voters.  Suppose that three of the voters, a majority of them, rank the candidates as follows (in descending order): C-A-B.  The other two voters rank the candidates: A-B-C.  Assigning 2 points to each of the first place votes, 1 point to each of the second place votes, no points for third place votes, and totaling, Candidate A gets 3 points for being the second choice of three voters (3×1), and 4 points for being the first choice of two voters (2×2), for a total of 7 points.  Candidate B gets a total of 2 points (0 points from three voters and 1 point from two voters), and Candidate C gets a total of 6 points (2 points from three voters and 0 points from two voters), making Candidate A the winner, with the highest total of 7 points.  But three of the voters, a majority, had preferred Candidate C to Candidate A, which casts doubt on the reasonableness of selecting A as the winner.  Such paradoxes are not uncommon using this method, and in fact Nobel Prize-winning economist Kenneth Arrow proved that rank-ordering voting methods of this sort can never be devised to prevent these strange outcomes from occurring.
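The five-voter example can be checked with a few lines of code.  This is only a sketch of the point-count described above (2, 1, and 0 points for first, second, and third place); the function names are my own, chosen for illustration.

```python
from collections import Counter

# Each ballot lists the candidates from most preferred to least preferred.
# Three voters rank C-A-B and two voters rank A-B-C, as in the example.
ballots = [("C", "A", "B")] * 3 + [("A", "B", "C")] * 2


def point_totals(ballots):
    """Give n-1 points for first place, n-2 for second, ... and 0 for last."""
    n = len(ballots[0])
    totals = Counter()
    for ballot in ballots:
        for position, candidate in enumerate(ballot):
            totals[candidate] += n - 1 - position
    return dict(totals)


def voters_preferring(ballots, x, y):
    """Count the voters who rank candidate x above candidate y."""
    return sum(1 for ballot in ballots if ballot.index(x) < ballot.index(y))


totals = point_totals(ballots)
points_winner = max(totals, key=totals.get)

print(totals)
print("Winner on points:", points_winner)
print("Voters preferring C to A:", voters_preferring(ballots, "C", "A"))
```

Candidate A wins on points even though three of the five voters rank C above A, which is exactly the paradox described.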

 


But two economists, Michel Balinski and Rida Laraki, in their 2007 book, Majority Judgment, make a compelling case for why a slight variant of the rank-order method actually can produce consistently valid outcomes.  Rather than using vote totals based on a preference ranking, the authors contend that a better method is to have each voter evaluate the total slate of candidates, with specific evaluations ranging from positive (approve) to negative (disapprove).  Evaluations for each candidate are then stacked from most favorable to least favorable, and the median (middle) evaluation is assigned as a rating to that candidate.  The candidate with the highest rating wins.  Consider the same example above, with Candidates A, B, and C, and suppose that a top-choice ranking from voters is equivalent to an evaluation of “approve”, a bottom-choice ranking is equivalent to an evaluation of “disapprove”, and a second-choice ranking is considered “neutral” (i.e., neither “approve” nor “disapprove”).  Candidate A then received 3 “neutral” votes and 2 “approves”, giving it a median rating of “neutral”, since if we stacked these votes from most favorable to least favorable, then the evaluation in the middle of the stack would be one of the “neutrals”.  Similarly, Candidate B’s 3 votes of “disapprove” and 2 votes of “neutral” give it a median rating of “disapprove”, and Candidate C’s 3 votes of “approve” and 2 votes of “disapprove” give it a median rating of “approve”.  Candidate C, then, has the highest rating among voters, with a median of “approve”, followed by Candidate A with “neutral” and Candidate B with “disapprove”.  The selection of Candidate C seems a more logical outcome in this election, since the majority of voters preferred Candidate C over both Candidate A and Candidate B.
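The median-rating calculation for the same five ballots can be sketched as follows.  The mapping from rank to evaluation is the one assumed in the paragraph above; the choice to take the lower of the two middle evaluations when the number of voters is even is my own assumption and does not matter for five voters.

```python
# Evaluations ordered from worst to best, so a larger index is more favorable.
GRADES = ["disapprove", "neutral", "approve"]

# Same ballots as before: three voters rank C-A-B, two voters rank A-B-C.
ballots = [("C", "A", "B")] * 3 + [("A", "B", "C")] * 2

# Assumed mapping: first choice -> approve, second -> neutral, third -> disapprove.
rank_to_grade = ["approve", "neutral", "disapprove"]


def median_grade(grades):
    """Stack the evaluations from best to worst and return the middle one."""
    stacked = sorted(grades, key=GRADES.index, reverse=True)
    return stacked[len(stacked) // 2]


evaluations = {candidate: [] for candidate in "ABC"}
for ballot in ballots:
    for position, candidate in enumerate(ballot):
        evaluations[candidate].append(rank_to_grade[position])

ratings = {c: median_grade(g) for c, g in evaluations.items()}
winner = max(ratings, key=lambda c: GRADES.index(ratings[c]))

print(ratings)
print("Winner by median rating:", winner)
```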


A Three-Way Election Outcome Using Majority Judgment Method










It is interesting to consider what would have happened if this method had been used in the most recent presidential election.  Suppose that the method proposed by the authors of Majority Judgment was used with the following five ratings available to voters (from best to worst): “strongly approve”, “approve”, “neutral”, “disapprove”, and “strongly disapprove”.  Given the extreme divisiveness that characterized this election, with voters for one of the candidates generally strongly detesting the other, it is not unlikely that the voters who selected Biden would have given him a “strongly approve” rating, while giving Trump a “strongly disapprove” rating, and Trump voters would have done the reverse: giving Trump a “strongly approve” rating and Biden a “strongly disapprove” rating.  Since Biden was preferred by a majority of the voters, he would have had more “strongly approve” ratings than “strongly disapprove” ratings, and his median rating would therefore be “strongly approve”, while, for the same reason, Trump’s median rating would be “strongly disapprove”.  These results, then, would have mirrored what actually happened in the Electoral College and the popular vote. 

 



But now suppose that Bernie Sanders had decided to run as a third-party candidate.  Under either the Electoral College system or the simple popular vote, the entry of Bernie Sanders into the race would have almost certainly siphoned off a significant number of votes from Joe Biden, and could very likely have resulted in the election victory going over to the incumbent President Trump.  However, the median rating approach would produce a distinctly different result.  Suppose that those who supported Biden over Bernie Sanders gave Biden a “strongly approve” rating and Sanders an “approve” rating, while those who supported Sanders over Biden did the reverse.  Assume that both, however, still gave Trump a “strongly disapprove” rating.  Since Trump’s median rating would still then be “strongly disapprove”, he would again be the clear loser of the election.  The ultimate contest, then, would be between Biden and Sanders.  (Both of these candidates would probably now have a median rating of “approve”, suggesting a tie, but the authors of Majority Judgment provide a simple and elegant method for breaking ties.  In this case, the method would have selected as winner the more popular of these two candidates, Biden and Sanders.)
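A small sketch shows how such a tie might be resolved.  The eleven-voter electorate below is entirely hypothetical, as is the assumption that Trump voters would rate both Biden and Sanders “strongly disapprove”.  The tie-break shown, which sets aside the median evaluation and recomputes the median until the candidates differ, is my understanding of the authors’ rule rather than something spelled out above.

```python
GRADES = ["strongly disapprove", "disapprove", "neutral",
          "approve", "strongly approve"]
SD, D, N, A, SA = range(5)  # numeric codes, worst to best

# Hypothetical blocs: (number of voters, rating given to each candidate).
blocs = [
    (4, {"Biden": SA, "Sanders": A, "Trump": SD}),   # prefer Biden
    (2, {"Biden": A, "Sanders": SA, "Trump": SD}),   # prefer Sanders
    (5, {"Biden": SD, "Sanders": SD, "Trump": SA}),  # prefer Trump
]


def majority_value(grades):
    """Return the sequence of medians obtained by repeatedly removing
    the current median (lower middle when the count is even)."""
    remaining = sorted(grades, reverse=True)  # best to worst
    sequence = []
    while remaining:
        sequence.append(remaining.pop(len(remaining) // 2))
    return sequence


grades = {name: [] for name in ("Biden", "Sanders", "Trump")}
for count, ratings in blocs:
    for name, grade in ratings.items():
        grades[name].extend([grade] * count)

values = {name: majority_value(g) for name, g in grades.items()}
medians = {name: GRADES[v[0]] for name, v in values.items()}

# Lists compare element by element, so the first differing median decides.
winner = max(values, key=values.get)

print(medians)
print("Winner after tie-break:", winner)
```

In this made-up electorate Biden and Sanders share a median of “approve”, and the tie-break favors Biden, who has more first-choice supporters.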

 

The voting method advocated in Majority Judgment is effective and suitable not only for elections, but for competitive activities that involve multiple judges, including sporting events such as the Olympics, and wine-tastings.  And it is to one of these that I would now like to turn, because as with our recent U.S. presidential election, the outcome was controversial, but unlike the election, it is remembered as a great moment in American history.  It was the famous “Judgment of Paris” wine competition of 1976, in which American wines were pitted against French wines in a blind tasting.

 

A little background is necessary in order to highlight the significance of this competition.  Before 1976, French wines were generally regarded as the best in the world.  And more than this, they were considered virtually unrivalled in their quality.  While some other European nations could lay claim to particular wines of excellence (Spain had its sherry, Portugal its port, Italy its Chianti, and Germany its sweet white wines, for example), the idea that any country beyond Europe could produce wine of any type that could even compare to France’s was unthinkable, even heretical, and this was especially true of wines produced in America – particularly in California, where many wines using grape varietals identical to France’s were produced.  And wine producers in California at the time even seemed to believe this themselves, as they often resorted to naming their white wines “Chablis”, red wines “Bordeaux”, and sparkling wines “Champagne”, which are all regions in France.  The French eventually raised a successful protest against this practice, as it was clearly a case of false advertising: what we might today even call “identity theft”.  The practice really did seem to be a tacit acknowledgement on the part of California wine-makers that their products were inferior imitations of the French bona fides.


The 1976 "Judgment of Paris"


A British wine merchant named Steven Spurrier decided to put this belief about the inferiority of California wines to the test.  (He was himself a believer, as he only sold French wines in his own shop.)  He arranged for a blind tasting in Paris involving several acclaimed judges which would include a red wine competition (four French Bordeaux wines against six California Cabernet Sauvignons) and a white wine competition (four French Burgundies against six California Chardonnays).  The tasting occurred on May 24, 1976.  There were eleven judges in all, including eight French, one Swiss, one American, and Spurrier himself.  When the tastings were completed, and the outcomes of the competitions were determined, it was revealed – much to the shock (if not horror) of the French judges – that a California wine had won first prize in both the red and white wine categories.  (The event is entertainingly portrayed in the 2008 movie Bottle Shock.)

 


While news of this competition and its outcome was downplayed in Europe, and particularly in France, the impact of the event was momentous.  By winning first prize, California winemakers had demonstrated that they could produce quality wines on a par with the vaunted French wines that were supposedly incomparable in their excellence.  The event virtually opened the floodgates to a vibrant international wine industry, as not just wineries in California, but others in both North and South America, as well as Australia and New Zealand, not to mention Europe itself, felt emboldened to challenge France’s winemaking dominance, in terms of popularity, or quality, or both.  It seemed that just knowing that such a thing was possible – a non-French wine winning first prize in a blind tasting – had a palpable impact on the industry.  In that first competition, most of the American red wines that had been part of the tasting actually did end up on the bottom of the ranking.  But when the Wine Spectator magazine hosted a France vs. U.S.A. tasting competition just ten years later, five of the six American red wines entered in the contest occupied the top five positions in the rankings.  It is a testament to the power of belief, and reminiscent of the famous story of Roger Bannister, who was the first person in history to run the mile in less than four minutes, in 1954, something which until then had been thought to be humanly impossible.  Within a year of his setting the record, 37 other runners also ran the mile in less than four minutes, and 300 runners within a year after that.  By just demonstrating that the feat could be accomplished, he made it far more achievable for others.

The 1976 “Judgment of Paris” and its outcome was truly legendary, and its legacy can be seen today in just about any store that sells wines, where there are aisles devoted to individual regions, with California Cabernet Sauvignons, Sauvignon Blancs, and Chardonnays proudly displayed, along with similarly-esteemed Argentinian Malbecs and red blends, German Rieslings, and the now globally popular Australian brand with the kangaroo on the label, featuring various varietals at a very affordable price.  French wines are still respected of course, and still popular, but no winery outside of France, in any region of the world, now feels compelled to try to pass off any of its products as a “Bordeaux”, or “Chablis”, or “Champagne” in order to gain respectability, or even popularity.

 


But here is where the story takes a bit of a left turn.  Did a California wine really win 1st Prize in the 1976 Judgment of Paris?  I have to return here to the authors of Majority Judgment, who had demonstrated that their method of determining election outcomes, and outcomes of competitions involving several judges, was superior to traditional methods, and devoid of all of the shortcomings attributed to them.  (Even Kenneth Arrow, the economist who had demonstrated that all traditional methods of rank-ordering candidates were flawed, endorsed the authors’ approach.)  In their book, they turn their attention to the Judgment of Paris, and in particular the red wine competition, and note that the outcome was determined by taking a simple average of each of the judges’ scores.  But by using their method, the authors contend that the American wine which supposedly won 1st prize, the 1973 vintage Stag’s Leap Cabernet Sauvignon, actually should have taken second place in the competition, with the real winner being the 1970 vintage French wine Chateau Mouton Rothschild, which had been given 2nd prize in the official ranking.  (Both the official ranking and the authors’ ranking concur that four of the six American wines entered in the competition occupied the four lowest positions, and that the remaining one came in 5th place.) 
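The gap between a simple average and a median rating is easy to see with a toy example.  The scores below are hypothetical and are not the 1976 judges’ scores; they only illustrate how a couple of enthusiastic judges can carry a wine on the average while the median tells a different story.

```python
from statistics import mean, median

# Hypothetical scores on a 20-point scale from five judges (NOT the 1976 data).
scores = {
    "Wine X": [18, 17, 10, 10, 10],  # two enthusiasts, three lukewarm judges
    "Wine Y": [12, 12, 12, 12, 12],  # uniformly solid
}

winner_by_mean = max(scores, key=lambda wine: mean(scores[wine]))
winner_by_median = max(scores, key=lambda wine: median(scores[wine]))

for wine, marks in scores.items():
    print(f"{wine}: mean {mean(marks):.1f}, median {median(marks)}")

print("Winner by average:", winner_by_mean)
print("Winner by median:", winner_by_median)
```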

 

The "Official" Outcome of the Judgment of Paris Red Wine Competition

This is a jarring conclusion, and leads to a profoundly different outcome for this 1976 event.  In fact, had this been the outcome that was officially observed at the time, it might have diminished or even eliminated the event’s historical significance.  After all, in the red wine category, five of the six American contestants ranked very poorly, or mediocre at best.  Hence, a 2nd place showing for Stag’s Leap might have been considered just an anomaly of no particular consequence.  For Americans, at least, this might make the authors’ voting methodology appear much less attractive.  (Some who dislike this outcome, and who are familiar with the book Majority Judgment, might even be tempted to observe that both of its authors were employed at a French university at the time when their book was published.)  But are the authors correct, nonetheless?

 

I have always been fascinated myself with the problem of how to properly rank order candidates or contestants based on a voting methodology, and have actively explored various approaches to this problem.  Several years ago, I came upon an insight which led to the development of my own methodology.  The insight was this:  In a competition that involves several judges, there are actually two types of information that are revealed in the judges’ scores.  The first, of course, is information about the things being judged, but the second is information about the caliber of the judges themselves.  If the particular scores of an individual judge tend to correlate highly with the average scores of the other judges, for example, then this is probably a good indication that the judge knows what he or she is doing.  For example, suppose three things are being evaluated – call them A, B, and C – by several judges, and Judge #1 determines that C is the best among the three, followed by B, followed by A.  If the collective ratings of the other judges, based on an average of their point scores, also put C at the top, followed by B, and followed by A, then this suggests that Judge #1 has made a competent evaluation.  But if the collective ratings of the other judges do the reverse, putting A over B over C, then this suggests that Judge #1 either lacks the ability to discriminate effectively between the contestants, or has an aesthetic taste that runs counter to the population as a whole, or both.  It is also possible, of course, that Judge #1 is uniquely and exceptionally qualified to perform this role, and it is the rest of the judges that are incompetent rubes, but the former interpretation is far more likely, particularly if several judges are involved.

Hence, a judge whose individual scores are highly and positively correlated with the average scores of the other judges should get a higher weighting in the competition that is being evaluated, while any judge whose individual scores are either uncorrelated with the average of the others, or, worse, negatively correlated with the average, should be given a lower weighting, or perhaps should be disqualified entirely.  I have tested my method, using a technique involving multiple simulations called Monte Carlo analysis, and have found encouraging evidence that mine is superior to both the conventional method of simply averaging judges' scores and that which is proposed by the authors of Majority Judgment.
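One way to turn this idea into a calculation is sketched below.  The scores are hypothetical, and the specific weighting rule (a judge’s weight is the Pearson correlation with the average of the other judges, floored at zero so that a contrarian judge is effectively disqualified) is just one possible choice among several, not a definitive statement of the method.

```python
from math import sqrt

wines = ["A", "B", "C", "D"]

# Hypothetical scores, one list per judge, in the same order as `wines`.
scores = {
    "Judge 1": [10, 12, 14, 16],
    "Judge 2": [11, 13, 13, 17],
    "Judge 3": [9, 12, 15, 15],
    "Judge 4": [16, 14, 12, 10],  # rates the wines in reverse order
}


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either series has no variation."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def judge_weights(scores):
    """Weight each judge by correlation with the average of the OTHER judges."""
    weights = {}
    for judge, own in scores.items():
        others = [s for name, s in scores.items() if name != judge]
        others_avg = [sum(column) / len(column) for column in zip(*others)]
        weights[judge] = max(0.0, pearson(own, others_avg))  # assumed floor at 0
    return weights


def weighted_scores(scores, weights):
    """Weighted average score per wine (assumes at least one positive weight)."""
    total = sum(weights.values())
    return [
        sum(weights[judge] * marks[i] for judge, marks in scores.items()) / total
        for i in range(len(wines))
    ]


weights = judge_weights(scores)
final = weighted_scores(scores, weights)
ranking = [wine for _, wine in sorted(zip(final, wines), reverse=True)]

print({judge: round(w, 2) for judge, w in weights.items()})
print("Ranking, best first:", ranking)
```

Here Judge 4 correlates negatively with the rest of the panel and receives a weight of zero, so the final ranking is driven by the three judges who broadly agree with one another.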

And when looking at how each of the individual judges’ ratings compared with the others at the Paris competition, one can make some interesting observations.  For example, one of the French judges, Pierre Tari, actually did exhibit a negative correlation between his wine ratings and those of the rest of the judges, meaning that he tended to rate highly those wines that the other judges were unimpressed with, and vice versa.  In his case, then, it really does appear that in spite of the fact that he actually owned a winery himself, as a wine connoisseur he was apparently in the wrong profession. 

 

(This may sound rather harsh, but I must confess that my own experience as a wine connoisseur parallels Monsieur Tari’s.  Many years ago I took a wine course, and it was customary to end each class with a blind tasting of the wines that had been featured that day.  I discovered that I invariably rated highly those wines that were disliked by the rest of the class, while panning the wines that were popular with everyone else.  It was very humbling at the time, but there was a consolation for me.  Because the wines that I preferred also tended to be the less expensive ones, I have been able to enjoy my favorite wines in the years since without it ever being a serious drain on my budget.)

 

Michel Dovaz and Pierre Tari


And three of the judges had ratings which, while positively correlated with those of the other judges, were only slightly so, suggesting that their ratings were not much better than randomly assigned scores.  In their case, this would be evidence of a palate that was not very discriminating.  One of these judges was the only American who participated, Patricia Gallagher, another was the British wine merchant who had proposed the contest, Steven Spurrier, and the third was Swiss wine instructor and author of books on wine, Michel Dovaz.  (Spurrier and Gallagher actually recused themselves from the competition and did not include their scores in the final tally, not because they felt that they were less than competent to judge, but because both had played an instrumental role in organizing the event.)

 


Aubert de Villaine and Jean-Claude Vrinat

But there were four judges, all French, who were standouts in a positive way, in that each of their ratings correlated strongly with the averages of the other judges, strongly suggesting that they had both a discriminating palate and a genuine cultivated taste for fine wines.  These were Claude Dubois-Millot, a restaurant sales director who was actually substituting for an absentee judge, economist and winery owner Aubert de Villaine, restaurant owner Jean-Claude Vrinat, and Pierre Brejoux, Inspector General of the Appellation d'Origine Controlee Board, which oversees the production of the finest French wines.  Of these four judges, two put the California Stag’s Leap wine in a tie for second place, and two put it in a tie for third place.  Hence, while all of these judges agreed that Stag’s Leap was among the top four of the wines in the competition, they also were unanimous in determining that it was not the best of the contestants.  (By the way, Patricia Gallagher, the only American judge, rated Stag’s Leap even lower, putting it in a four-way tie for fourth place.)

 

(From left) Patricia Gallagher, Steven Spurrier, and Odette Kahn


The remaining three French judges exhibited positive correlations between their individual scores and those of the rest of the judges, but which were not very strong, indicating that while they had a general ability to discriminate between good wines and mediocre ones, it would perhaps be too flattering to refer to them as “connoisseurs”.  One of these, Odette Kahn, upon discovering to her horror that she had assigned first place to the American Stag’s Leap wine, demanded, unsuccessfully, to have her ballot returned.

 

When I use my own method of weighting the judges’ scores, I find that much depends on exactly how these scores are weighted.  With some weighting methods, my results support the contention that Stag’s Leap was the genuine winner, but with others, they support the 2nd place showing which the authors of Majority Judgment claim it actually deserved.  In any case, the fact that the four apparently most competent judges were in agreement that Stag’s Leap was not the best wine in the competition casts serious doubt on the official outcome of the contest.

 

So the actual outcome of the famous “Judgment of Paris” may very well have been different than what is recorded in the history books.  It reminds me, again, of that movie, The Man Who Shot Liberty Valance, and the fateful gunfight in which the greenhorn, idealistic lawyer from the East Coast brought down a much more skilled opponent.  The showdown is remembered as a great, seminal moment in the history of the fledgling state, but during the movie (and, as a film critic would say at this point to those who haven’t seen the movie, “Spoiler Alert!”) it is revealed to a reporter many years later that it was not Ranse Stoddard’s bullet that actually killed Liberty Valance.

 


If you ever have a chance to come out to Washington, D.C., I recommend that you visit the Smithsonian Institution’s National Museum of American History, which is located on Constitution Avenue, NW, between 12th and 14th Streets.  In a permanent exhibition located on the first floor of the East Wing entitled “Food: Transforming the American Table”, you might still have an opportunity to see proudly on display an actual bottle of the Judgment of Paris “winner”, 1973 Stag’s Leap Cabernet Sauvignon, which was donated to the museum by the winery’s owner, Warren Winiarski.  I sometimes imagine myself standing there, admiring the exhibit, while some group is standing next to me, listening to someone describe the historic David vs. Goliath contest that allowed Stag’s Leap to bring respectability to American wines, and usher in a new era for wine in the entire world.  If that ever happens, I will simply smile silently, and nod in agreement, while remembering the most famous line in that movie, The Man Who Shot Liberty Valance, uttered by the reporter when he learns the truth about what actually happened during the climactic showdown between Stoddard and Valance:

 

“When the legend becomes fact, print the legend.”

 

I know that in these days of real and imagined incursions and accusations of “fake news”, this attitude of mine might be controversial, but I do believe that there will always be a place for myth in history, if the myth heals, unites, and inspires, rather than injures, divides, and enervates.  And so, in these divisive times, I offer a toast to the American ideal of unity in diversity, accomplishment in spite of adversity, and hope that in this ideal, “legend” will always become “fact”, and vice versa.  Or as the French would say, so simply and elegantly:

 

Je lève mon verre à la liberté!