Saturday, May 4, 2013

The Upset, Part I: Why Do We Rank?

In derby, as in most other sports, there are multiple ranking schemes.  US College Football, or NCAA Football as it's commonly known, has 3 official rankings and nearly 150 unofficial ones.  European derby, with its 4, is tame by comparison.

Why so many?  One word: upsets.

Upsets, in the American usage, are games in which the expected winner loses to the expected loser.  They're games where the "underdog" wins, and to many sports fans they are one of the joys of watching sports.

We would expect Arsenal to win, but every so often Bradford City walk away with the victory.  It's a major source of excitement in any sport!

But what does that have to do with rankings?  Well, upsets mean that no ranking system can be perfect.  There will always be upsets, so there will always be errors in the ranking.  That is why ranking schemes need to be designed with priorities in mind.

That is, a ranking scheme needs a purpose, a question to answer.  There are three such questions:
  1. Who did the best?  Who deserves the crown for the best performance over a given past period?
  2. Who will do the best?  Whom should we expect to win the coming games?
  3. Who is good competition?  Who is most likely to give a given team an exciting bout with minimal risk of a blowout?
These are different questions only because of upsets.  Each deals with that problem differently, because each has different rules governing how the ranking may or may not be calculated.

1. A ranking for the purpose of awarding a crown has some of the more rigid rules.  If the crown is for best performance in a premier league season, for example, that ranking can only consider that season.  All teams start the season on 0 points, and the ranking shifts from there.

A good ranking for this purpose is highly retrodictive.  A retrodictive ranking is one that minimizes the number of upsets over the period it covers: looking back at the games already played, as few results as possible contradict its ordering.
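
To make "retrodictive" concrete, here is a minimal Python sketch, assuming bouts are recorded as simple home/away/score records and a ranking is just a list of team names from best to worst. The `Bout` type and both functions are hypothetical names for illustration, not part of DerbyChart, EuroDerby, or any other ranking's published method, and the brute-force search is only practical for a handful of teams.

```python
from itertools import permutations
from typing import NamedTuple


class Bout(NamedTuple):
    """One bout result (hypothetical record format for illustration)."""
    home: str
    away: str
    home_score: int
    away_score: int


def count_upsets(ranking: list[str], bouts: list[Bout]) -> int:
    """Count bouts won by the lower-ranked team.

    `ranking` lists team names from best (index 0) to worst. Draws and
    bouts involving unranked teams are skipped.
    """
    position = {team: i for i, team in enumerate(ranking)}
    upsets = 0
    for b in bouts:
        if b.home not in position or b.away not in position:
            continue
        if b.home_score == b.away_score:
            continue  # a draw has no winner, so it cannot be an upset
        if b.home_score > b.away_score:
            winner, loser = b.home, b.away
        else:
            winner, loser = b.away, b.home
        # A larger index means a lower ranking, so a lower-ranked winner is an upset.
        if position[winner] > position[loser]:
            upsets += 1
    return upsets


def most_retrodictive(teams: list[str], bouts: list[Bout]) -> tuple[str, ...]:
    """Brute-force the ordering with the fewest past upsets (small leagues only)."""
    return min(permutations(teams), key=lambda order: count_upsets(list(order), bouts))


# A three-team cycle: A beats B, B beats C, C beats A.
bouts = [
    Bout("A", "B", 180, 120),
    Bout("B", "C", 150, 140),
    Bout("C", "A", 160, 155),
]
best = most_retrodictive(["A", "B", "C"], bouts)
print(best, count_upsets(list(best), bouts))  # prints ('A', 'B', 'C') 1
```

With three teams that beat each other in a cycle, every possible ordering contains at least one upset, which is the point made above: no ranking can be perfect, only more or less retrodictive.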

In the European rankings, DerbyChart is entirely retrodictive with a limit of 12 months.  EuroDerby is entirely retrodictive within its divisions, also with a 12-month limit, with divisional placement based on the previous year's retrodictive ranking.  Thus, both seem designed to produce "the best performance for derby year xxxx."

2. A ranking for the purpose of prediction is much freer in its structure.  As the goal is only to forecast the future, rather than to reward performance over a given period, a predictive ranking can use scores from any previous period.

In fact, a predictive ranking can use any factor, as long as the predictions do well.  Some predictive rankings in baseball take transfers, market size, stadium size, team value, and all manner of other factors into account.  If a scheme's predictions hold up, then it's a good ranking.  Simple enough.
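
One way to check whether the predictions "do well" is a backtest: build ratings only from bouts before a cutoff date, then score them against bouts played afterwards. The sketch below assumes results are stored as (date, winner, loser) tuples and uses plain win percentage as a deliberately crude stand-in rating; it is not the method of any ranking discussed in this post. Any other factor could be swapped into the hypothetical `win_rate_ratings` function and judged the same way.

```python
from datetime import date

# Each result is (date played, winner, loser): a hypothetical record format.
Result = tuple[date, str, str]


def win_rate_ratings(results: list[Result]) -> dict[str, float]:
    """Toy predictive rating: each team's fraction of games won."""
    wins: dict[str, int] = {}
    games: dict[str, int] = {}
    for _, winner, loser in results:
        wins[winner] = wins.get(winner, 0) + 1
        for team in (winner, loser):
            games[team] = games.get(team, 0) + 1
    return {team: wins.get(team, 0) / n for team, n in games.items()}


def prediction_accuracy(ratings: dict[str, float], results: list[Result]) -> float:
    """Fraction of results where the higher-rated team won.

    Bouts between equally rated or unrated teams make no prediction and are skipped.
    """
    scored = correct = 0
    for _, winner, loser in results:
        if winner not in ratings or loser not in ratings:
            continue
        if ratings[winner] == ratings[loser]:
            continue
        scored += 1
        if ratings[winner] > ratings[loser]:
            correct += 1
    return correct / scored if scored else float("nan")


def backtest(results: list[Result], cutoff: date) -> float:
    """Rate teams on bouts before `cutoff`, then score bouts on or after it."""
    past = [r for r in results if r[0] < cutoff]
    future = [r for r in results if r[0] >= cutoff]
    return prediction_accuracy(win_rate_ratings(past), future)


results = [
    (date(2012, 3, 1), "A", "B"),
    (date(2012, 5, 1), "A", "C"),
    (date(2012, 7, 1), "B", "C"),
    (date(2013, 1, 10), "A", "C"),  # predicted correctly
    (date(2013, 2, 10), "C", "B"),  # an upset the toy ratings missed
]
print(backtest(results, cutoff=date(2013, 1, 1)))  # prints 0.5
```

Unlike the retrodictive check, the bouts used to score the ranking are never the bouts used to build it.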

In the European rankings, the European Roller Derby Rankings and Flat Track Stats are predictive in nature.  Both consider all bouts since a team's debut, and the latter's algorithm is explicitly built around prediction.

3. A ranking for the purpose of finding similarly competitive teams is as free in structure as a predictive ranking, and often uses similar math.

In fact, the only difference between 2 and 3 is how the teams reading the rankings use them.  For an impartial observer reading the algorithms, it is often difficult to determine whether a ranking is designed for prediction or for competitiveness.

In the European rankings, the European Roller Derby Rankings' stated purpose is to allow teams to find opponents of similar skill.  EuroDerby can easily be used for this purpose as well, thanks to its divisional system.
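
For this third purpose, the question is "who is near us?" rather than "who is above us?" A minimal sketch, assuming each team has a numeric rating (the teams, numbers, and the `window` threshold below are made up for illustration), is to list the teams within some rating distance, closest first. A divisional system answers roughly the same question more coarsely: everyone in your division.

```python
def similar_opponents(ratings: dict[str, float], team: str, window: float) -> list[str]:
    """Teams rated within `window` points of `team`, closest first.

    Ratings and the window size are hypothetical inputs; any ranking that
    produces numbers (rather than just an order) could supply them.
    """
    own = ratings[team]
    nearby = [
        (abs(rating - own), other)
        for other, rating in ratings.items()
        if other != team and abs(rating - own) <= window
    ]
    # Sort by rating gap; equal gaps fall back to alphabetical order of team name.
    return [other for _, other in sorted(nearby)]


ratings = {"A": 100.0, "B": 92.0, "C": 80.0, "D": 105.0}
print(similar_opponents(ratings, "A", window=10.0))  # prints ['D', 'B']
```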

Back to upsets.  Were it not for upsets, the three rankings would be identical.  If no team ever improved, every expectation of victory or defeat would be met.  This would be boring.

Instead, rankings have to deal with upsets.  For a retrodictive ranking system, an upset is not always a problem; however, a retrodictive system should seek to minimize past upsets.  For a competitiveness ranking, an upset may not be a problem either: if the ranking predicted a close bout and it was one, the ranking has done its job even if it picked the wrong winner.
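
The difference in how the two kinds of ranking treat the same result can be shown with a single bout. This sketch assumes numeric ratings for both teams and uses hypothetical thresholds for what counts as a "close" rating gap and a "close" final margin; with those assumptions, a bout can be an upset while the competitiveness call is still correct.

```python
def judge_bout(
    rating_a: float,
    rating_b: float,
    score_a: int,
    score_b: int,
    close_gap: float,
    close_margin: int,
) -> tuple[bool, bool]:
    """Return (was_upset, close_call_correct) for one bout.

    `close_gap` and `close_margin` are hypothetical thresholds: a rating gap
    at or below `close_gap` predicts a close bout, and a final score margin
    at or below `close_margin` counts as one.
    """
    # A negative product means the lower-rated team outscored the higher-rated one.
    # Equal ratings or a drawn score give zero, which is never an upset.
    was_upset = (rating_a - rating_b) * (score_a - score_b) < 0
    predicted_close = abs(rating_a - rating_b) <= close_gap
    actually_close = abs(score_a - score_b) <= close_margin
    return was_upset, predicted_close == actually_close


# Slight favourite loses by 10: an upset, but the "close bout" call was right.
print(judge_bout(100, 97, 140, 150, close_gap=5, close_margin=20))  # prints (True, True)
# Heavy favourite wins big: no upset, and the "blowout" call was right too.
print(judge_bout(100, 60, 250, 50, close_gap=5, close_margin=20))  # prints (False, True)
```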

A predictive system has the biggest problem with upsets, as they indicate that the original ranking was wrong.  Thus, a predictive system must react to upsets with some sort of correction to the ordering of teams.
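
One common way to build in that correction is an Elo-style update, shown here as a swapped-in illustration rather than the method of any ranking named in this post. Each bout moves rating points from the loser to the winner, and the amount moved grows with how unexpected the result was, so an upset shifts the ordering more than an expected win does. The scale of 400 and step size `k = 32` are values often seen in chess Elo implementations, assumed here purely for illustration.

```python
def expected_win(r_a: float, r_b: float, scale: float = 400.0) -> float:
    """Elo-style probability that team A beats team B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / scale))


def elo_update(r_a: float, r_b: float, a_won: bool, k: float = 32.0) -> tuple[float, float]:
    """Return updated (r_a, r_b) after one bout.

    The shift is k times (actual result - expected result), so a surprising
    result moves both ratings further than an expected one.
    """
    shift = k * ((1.0 if a_won else 0.0) - expected_win(r_a, r_b))
    return r_a + shift, r_b - shift


# Expected result: the 1600-rated favourite beats a 1400-rated team.
print(elo_update(1600, 1400, a_won=True))  # favourite gains about 7.7 points
# Upset: the 1400-rated underdog beats the 1600-rated favourite.
print(elo_update(1400, 1600, a_won=True))  # underdog gains about 24.3 points
```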

So, how good are the various systems at being predictive and retrodictive?  How accurate are they?  Stay tuned for a detailed analysis of their performance, followed by a possible way of minimizing the number of upsets and maximizing the "correctness" of the ranking scheme.
