Welcome to the URS™

Welcome to the homepage of the Universal Rating System (URS™).

The URS™ rating system and the launch of this website represent the culmination of more than two years of detailed research and development, and we are very pleased to present the very first URS™ rating list, for the month of January 2017.

Kindly explore the “About Us” and “FAQ’s” for a full explanation of what we do and why we are doing it. We welcome all constructive comments and suggestions that can help us improve our service offering moving forward.


The Universal Rating System – A Performance Rating Across All Time Controls

The URS™ rating algorithm was designed and developed by our research team, which consists of Mr. Maxime Rischard, Dr. J. Isaac Miller, Dr. Mark Glickman, and Mr. Jeff Sonas. The work has been carried out over the last two years as a collaborative research project funded by the Grand Chess Tour, the Kasparov Chess Foundation, and the Chess Club and Scholastic Center of Saint Louis.

There are many differences between the URS™ and the FIDE Elo systems. The most striking difference is that the URS™ calculates only one rating for each player, informed by their results at all rates of play from Classical to Blitz (5 minutes per game). This published rating is the URS™ system's assessment of each player's strength at Classical chess (defined as a rate of play where each player has at least 2 hours for their first 60 moves).

Furthermore, the URS™ is a weighted performance rating, calculated across several years of previous game results for all players. Older games are given less importance than recent games, by applying an exponential decay rate. URS™ Ratings are calibrated so that they use a scale comparable to traditional Elo ratings. It is critical to note, however, that the URS™ does not incorporate Elo ratings anywhere within its actual calculation.


Comparison to Elo-based approach

A URS™ Rating is more like a performance rating than it is like an Elo rating.

In Elo systems, each player retains an Elo rating that is incrementally adjusted based on the results of any new games played. It is essential to know what the Elo ratings of the two players were at the time that a game between them was played, since each player’s Elo rating is used to calculate their expected score. This in turn directly impacts how their Elo rating is increased or decreased due to the actual result of each game.
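The incremental adjustment described above can be sketched in a few lines. This is a minimal illustration of the standard logistic Elo formula, not FIDE's exact regulations: the function names are our own, and the K-factor of 20 is just an illustrative default.

```python
def elo_expected_score(rating: float, opponent_rating: float) -> float:
    """Expected score (between 0 and 1) under the standard logistic Elo curve."""
    return 1.0 / (1.0 + 10.0 ** ((opponent_rating - rating) / 400.0))


def elo_update(rating: float, opponent_rating: float, score: float,
               k: float = 20.0) -> float:
    """Return the new rating after one game.

    score is 1 for a win, 0.5 for a draw and 0 for a loss.
    k (the K-factor) is an illustrative default, not an official value.
    """
    expected = elo_expected_score(rating, opponent_rating)
    return rating + k * (score - expected)


# The favourite gains less for a win than the underdog would,
# because the favourite's win was already "expected".
print(round(elo_update(2700, 2600, 1.0), 1))
print(round(elo_update(2600, 2700, 1.0), 1))
```

Note that both functions need the players' ratings at the time of the game, which is exactly the dependency that the URS™ avoids.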

In comparison, the URS™ involves computing ratings simultaneously over a substantial period. The only game information included in the URS™ calculation is who had White and Black, what the outcome was (win/loss/draw), when the game was played, and what the time controls were. It doesn't matter what each player’s rating was in the past, because there is no concept of a rating that is incrementally adjusted up or down.  

Instead of adjusting an existing rating, the URS™ simply includes the new games within the large pool of existing data that it analyses whenever it is time to calculate a rating list. The ratings of every single player in the pool are then recalculated, in what is essentially a complex performance rating calculation, using their entire pool of games within the database. In Statistics terminology, the URS™ involves a time-weighted regularized maximum likelihood calculation, an approach that has solid statistical foundations.

What is a Simultaneous Performance Rating?

Anyone who understands how performance ratings are calculated may now be wondering how it is possible to calculate a performance rating across a large pool of games while ignoring the opponents' pre-event ratings at the time of each game. The solution is to treat all the games as if they were played in one giant tournament and determine the ratings that are simultaneously most consistent with the game outcomes. This cannot be accomplished using a simple formula like the Elo updating formula. Instead, the computation needs to be iterative. This type of iterative procedure is well-established in applied mathematics. In fact, Arpad Elo suggested a particular instance of an iterative procedure decades ago when he developed his rating system.

Under this scenario, we assume initially that everyone has the same rating (we'll call it "R") and we then calculate a Tournament Performance Rating (TPR) for every single player across this hypothetical tournament. Once we have those TPRs for everyone, we recalculate a TPR for everyone, but this time, instead of using "R" as the rating for each opponent, we use each opponent’s latest TPR. We then keep doing this, over and over.

In this way, each iteration of the performance rating calculation yields a more self-consistent set of performance ratings, which in turn makes the next iteration even more self-consistent. If we do this for long enough, we will typically reach a stable equilibrium where the TPRs are changing by only negligible amounts from one iteration to the next. At this point, each player's performance rating is consistent with the performance ratings of all their opponents. This can be called a "simultaneous performance rating".
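The iterative idea can be illustrated with a small sketch. It is not the URS™ algorithm: it uses the simple linear formula "average opponent rating plus 400 times (wins minus losses) divided by games" as the TPR, and the starting value of 2000, the damping factor and the re-centring step are our own assumptions, added so that the loop settles down reliably.

```python
from collections import defaultdict


def simultaneous_performance_ratings(games, start=2000.0, damping=0.5,
                                     tolerance=1e-6, max_iterations=10_000):
    """Iterate performance ratings until they are mutually consistent.

    games: iterable of (player_a, player_b, score_a), score_a in {1, 0.5, 0}.
    start, damping and tolerance are illustrative assumptions.
    """
    opponents = defaultdict(list)   # player -> list of opponents faced
    points = defaultdict(float)     # player -> total points scored
    for a, b, score_a in games:
        opponents[a].append(b)
        opponents[b].append(a)
        points[a] += score_a
        points[b] += 1.0 - score_a

    ratings = {p: start for p in opponents}   # everyone begins at "R"
    for _ in range(max_iterations):
        updated = {}
        for p, opps in opponents.items():
            average_opponent = sum(ratings[o] for o in opps) / len(opps)
            fraction = points[p] / len(opps)
            # Linear TPR: 2 * fraction - 1 equals (wins - losses) / games
            tpr = average_opponent + 400.0 * (2.0 * fraction - 1.0)
            # Move only part of the way to the new TPR to avoid oscillation
            updated[p] = (1.0 - damping) * ratings[p] + damping * tpr
        # Re-centre so the pool average stays at the starting value
        shift = start - sum(updated.values()) / len(updated)
        updated = {p: r + shift for p, r in updated.items()}
        change = max(abs(updated[p] - ratings[p]) for p in ratings)
        ratings = updated
        if change < tolerance:
            break
    return ratings


# Three players: A beats B and C, and B beats C.
print(simultaneous_performance_ratings(
    [("A", "B", 1.0), ("B", "C", 1.0), ("A", "C", 1.0)]))
```

No pre-event ratings are supplied anywhere: the ratings emerge purely from who played whom and what the results were.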

Performance Ratings Explained

This brings us to a major point: what exactly do we mean when we say the URS™ is like a “performance rating”? In fact, there are lots of ways of calculating performance ratings. Some of the ways are simple, and some are far more complex.  

The simplest, most popular, and most easily understood method, is to first calculate the average rating of your opponents across the different games played in a tournament. You then convert your overall percentage score in those games into a rating advantage/disadvantage, and add that positive/negative number to the average rating of the opponents faced, in order to calculate the performance rating.
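As a rough sketch of this simple method, the code below averages the opponents' ratings and converts the percentage score into a rating difference. The conversion here uses the logistic Elo curve, which is an assumption on our part; official bodies typically publish a lookup table instead. The clipping of extreme scores to 1% and 99% is also our own choice, made only to keep the logarithm finite.

```python
import math


def rating_difference_for_score(fraction: float) -> float:
    """Rating advantage implied by a percentage score (logistic assumption)."""
    # Clip perfect and zero scores so the logarithm stays finite
    clipped = min(max(fraction, 0.01), 0.99)
    return 400.0 * math.log10(clipped / (1.0 - clipped))


def simple_performance_rating(opponent_ratings, total_points: float) -> float:
    """Average opponent rating plus the advantage implied by the score."""
    average = sum(opponent_ratings) / len(opponent_ratings)
    fraction = total_points / len(opponent_ratings)
    return average + rating_difference_for_score(fraction)


# Scoring 3 out of 4 against opponents who average 2500
print(round(simple_performance_rating([2450, 2500, 2500, 2550], 3.0)))
```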

There are different variations of this calculation (e.g. removing the game against the weakest opponent and so forth) but they typically involve a relatively simple formula that allows you to calculate the performance rating directly. This is essentially how Jeff Sonas’s Chessmetrics simultaneous performance rating calculation worked, years ago. US Chess has a similar but more complex way of computing performance ratings that are the basis for determining provisional ratings. However, the URS™ took a more thorough approach.

There is another way of viewing performance rating, one that is not as easy to calculate using a direct formula. Under this approach, a performance rating is "the rating that would have led to the performed results". We can then consider several possible ratings for our player, assess what their overall expected score would be (with that rating) across all their games, and then pick the rating that yields an expected result that most closely matches what actually happened.
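This "rating that would have led to the performed results" can be found by a simple search. The sketch below uses bisection and the standard logistic expected-score curve; the search bounds of 0 and 4000 and the function names are assumptions made for illustration.

```python
def expected_score(rating: float, opponent_rating: float) -> float:
    """Expected score under the standard logistic Elo curve."""
    return 1.0 / (1.0 + 10.0 ** ((opponent_rating - rating) / 400.0))


def performance_rating_by_search(opponent_ratings, total_points: float,
                                 low: float = 0.0, high: float = 4000.0,
                                 tolerance: float = 0.01) -> float:
    """Find the rating whose total expected score matches the actual score.

    The expected total rises with the candidate rating, so bisection works.
    """
    while high - low > tolerance:
        middle = (low + high) / 2.0
        expected_total = sum(expected_score(middle, r) for r in opponent_ratings)
        if expected_total < total_points:
            low = middle     # candidate rating is too low to explain the score
        else:
            high = middle    # candidate rating is high enough (or too high)
    return (low + high) / 2.0


# One point from two games against 2400 and 2600 rated opponents
print(round(performance_rating_by_search([2400, 2600], 1.0)))
```

With a perfect or zero score no finite rating matches exactly, so this sketch simply returns a value at the edge of the search range.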

Rather than inventing a specific formula that can be used to calculate ratings directly, like there is for the Elo system and for performance ratings, the URS™ involves a probability model that analyses a large domain of possible ratings for each player, with some ratings being more likely than others (based upon the overall population distribution of chess strength). Across those possible ratings, our system then determines how likely the actual results would have been to occur, and ultimately determines the most likely overall set of ratings, for all players at once, in order to best explain the actual results.
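In general terms, a calculation of this kind can be written as a single optimization problem. The expression below is a generic sketch of a time-weighted regularized maximum likelihood estimate, not the actual URS™ model. All of the symbols are notation introduced here for illustration: r is the full set of ratings, w_g is the weight given to game g, t_g is its time control, P is the modelled probability of the observed outcome, and \pi is the assumed population distribution of chess strength.

```latex
\hat{r} \;=\; \arg\max_{r} \left[
    \sum_{g} w_g \, \log P\!\left(\text{outcome}_g \,\middle|\, r_{\text{White}(g)},\, r_{\text{Black}(g)},\, t_g \right)
    \;+\; \sum_{i} \log \pi(r_i)
\right]
```

The first sum rewards sets of ratings under which the actual results were likely to occur. The second sum is the regularization: it favors ratings that are plausible given the overall population of players.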

Time weighting adjustments

In a traditional TPR, it makes sense to treat each game with equal importance, since all games are played at nearly the same time. When we extend the concept of a TPR to cover a much longer timeframe, as we have done with the URS™, then of course some of the games were played recently and others were played years ago. It is therefore logical to give the older games less weight than the newer games.

In the case of the URS™ we assign reduced importance to older games through the use of exponential-decay game weightings. The actual decay rate is one of the URS™ system’s parameters and affects how sharply or gradually the importance of older games is reduced. We currently calculate ratings across a six-year history of game results.
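A minimal sketch of an exponential-decay game weighting is shown below. The six-year window comes from the description above, but the two-year half-life is purely a placeholder: the actual URS™ decay rate is not stated here.

```python
def game_weight(age_years: float, half_life_years: float = 2.0,
                window_years: float = 6.0) -> float:
    """Weight of a game played age_years before the rating list date.

    half_life_years is a hypothetical value chosen only for illustration.
    Games older than window_years are excluded (weight 0).
    """
    if age_years < 0:
        raise ValueError("age_years must not be negative")
    if age_years > window_years:
        return 0.0
    # The weight halves every half_life_years
    return 0.5 ** (age_years / half_life_years)


for age in (0, 1, 2, 4, 6, 7):
    print(age, round(game_weight(age), 3))
```

A shorter half-life makes the ratings respond more sharply to recent form, while a longer one reduces the importance of older games more gradually.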

Rate of play adjustments

Finally, accounting for different time controls was incorporated into the URS™ in a more universal way than just classifying all time controls as either Blitz, Rapid, or Standard. For each event, we determined the maximum number of minutes each player could spend for their first 60 moves. We call this value "M60", where the "M" can be viewed as being an abbreviation for "Minutes".  

For some time controls it is easy to calculate M60. For example, for "Game in 5 minutes" or "Game in 90 minutes" the corresponding M60 values are 5 minutes and 90 minutes, respectively. This represents the maximum number of minutes each player could take for their first 60 moves (indeed for their whole game).

However, many time controls have delays or increments, where players receive additional thinking time either during, or after completing, each of their moves. In these cases, we assume the maximum time is taken, whether for increment or delay. Since we are looking for the maximum time that can be taken for 60 moves, it is convenient that increments and delays are typically expressed as N seconds per move, since N seconds per move adds up to exactly N minutes over the first 60 moves. Consequently, if we see something like "Game/5 min + 2 sec/move", we can just add 5+2=7, which means that our M60 value for this time control would be 7 minutes.

And finally, where there are time increments linked to a specific number of moves, then these are also counted (provided the increments start before move 60). Consequently, a time control like "40 Moves/90 min + Game/30 min + 30 sec/move" would have an M60 value of 150 minutes. This is calculated as 90 minutes for the first 40 moves, another 30 minutes (maximum) for the next 20 moves, and 30 minutes' worth of increments through to Move 60, so our value of M60 is 90+30+30=150 minutes. However, if the time control were "40 Moves/100 min + 20 Moves/50 min + Game/15 min + 30 sec/move", then the part about "Game/15 min" would not matter for the calculation of M60, since it doesn't apply until Move 60 has already been completed.

Thus you get the same value of M60=5 minutes for "Game/5 min" and for "Game/3 min + 2 sec/move", and our rating system therefore treats these time controls equivalently. Similarly, the two common time controls "Game/90 min + 30 sec/move" and "Game/120 min" are treated equivalently as M60=120 minutes.  
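The M60 rules above can be sketched as a short function. The data layout is our own assumption: each time control is given as a list of (moves, minutes) stages, with moves set to None for a sudden-death "Game/" stage, plus a per-move increment or delay in seconds that is assumed to apply from move 1.

```python
def m60(stages, increment_seconds: float = 0.0) -> float:
    """Maximum minutes a player can use for the first 60 moves.

    stages: list of (moves, minutes); moves is None for a "Game/" stage.
    increment_seconds: increment or delay per move, assumed to apply
    from move 1 (an illustrative simplification).
    """
    total_minutes = 0.0
    moves_covered = 0
    for moves, minutes in stages:
        if moves_covered >= 60:
            break                  # stage only starts after move 60: ignored
        total_minutes += minutes   # a stage starting before move 60 counts in full
        if moves is None:
            break                  # sudden death covers the rest of the game
        moves_covered += moves
    # N seconds per move over 60 moves adds exactly N minutes
    return total_minutes + increment_seconds


print(m60([(None, 5)], 2))                          # Game/5 min + 2 sec/move
print(m60([(40, 90), (None, 30)], 30))              # 40/90 + Game/30 + 30 sec/move
print(m60([(40, 100), (20, 50), (None, 15)], 30))   # Game/15 stage is ignored
```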

The URS™ uses the M60 values within continuous functions that model the variability of chess results across all rates of play from 5 minutes (blitz) up to 120+ minutes (classical). They are also used to calculate the degree to which individual players' quality and consistency of play degrades as the rate of play moves along the spectrum from classical to faster rates. Player-specific degrees of degradation in quality and consistency of play are expressed as each player's Rapid Gap (applying to M60=30 minutes) and Blitz Gap (applying to M60=5 minutes). A larger Rapid/Blitz Gap means that the player’s quality and consistency of play degrades faster as they play at faster rates of play.

While optimizing the rating system, we determined the appropriate continuous functions to use for modelling the variability of results at any value of M60. Thus we treat "Game/41 min" slightly differently from "Game/42 min" and "Game/43 min", but there are no special considerations for these particular rates of play, just as there is nothing special about the treatment of "Game/9 min" versus "Game/10 min" versus "Game/11 min" in our system. They are all treated smoothly across the full spectrum of time controls. By contrast, "Game/9 min" versus "Game/10 min" versus "Game/11 min" are handled completely differently in the Elo system, as some of these games go into the Rapid rating system while others go into the Blitz rating system.

From the perspective of the URS™, the only special points on the spectrum of time controls are at M60=5 minutes and at M60=120 minutes. The URS™ currently does not rate any lightning/bullet results played at faster than "Game/5 min". And all games played at time controls of game in 120 minutes or more (i.e. all classical games) are treated as M60=120. Any slower time control, where each player can take longer than 2 hours, is thus treated equivalently to "Game/120 min" in the rating calculation.
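The two special points can be expressed as a small helper. The function name and the use of None to mean "not rated" are our own conventions for this sketch.

```python
def effective_m60(m60_minutes: float):
    """Map a raw M60 value onto the range the rating calculation uses.

    Returns None for games faster than Game/5 min (not rated), and
    caps everything at 120 minutes (all classical games are equivalent).
    """
    if m60_minutes < 5:
        return None
    return min(m60_minutes, 120)


for raw in (3, 5, 42, 120, 150):
    print(raw, effective_m60(raw))
```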

On the surface, it might seem like a bad idea to mix together a large number of rapid and blitz results with a relatively small number of classical results, when the ultimate goal is to calculate a rating that accurately measures classical chess skill. Indeed, the greater unpredictability of faster chess does mean there is less information to be learned from one rapid or blitz result than from one classical result.

Nevertheless, the URS™ recognizes that there is useful information about a player’s over-the-board strength in all game results regardless of the time limit, and can therefore more effectively estimate a player’s classical chess strength by also considering their results in games played at faster time controls. As the speed of play increases, the URS™ assigns less and less importance to the game results relative to games played at slower time controls. In this way, we gain useful information about players’ classical chess skill without being overwhelmed by the volatility or volume of rapid and blitz games.

We will be providing more details about the rating system in due course, but hopefully this provides an interesting and informative introduction for now.

We welcome all constructive comments that can help us improve moving forward.

Download the full press release here.

URS™: Universally Better Than Elo

We expect some people to challenge the notion that games played at slow time controls can be mixed together with faster games within a single rating system. One commonly held (though admittedly subjective) belief is that classical chess is categorically different from rapid chess and even more different from blitz chess, and that the three types of chess ought to be kept separate.

There is another way to think about this, however. What if classical and rapid and blitz aren’t that different from each other? What if they all reveal information about a player’s universal chess ability, with the understanding that games become more chaotic and less informative as the rate of play speeds up?


If you accept this concept, then perhaps there is a way to effectively combine over-the-board games from all time controls into a single rating system, to use a single pool of data for analysis, and to create a single “universal” rating for each player. How could we tell, objectively rather than subjectively, whether this is a step in the right direction, or a step in the wrong direction?

If we believe that having three separate rating systems (and hence three separate ratings for each player) is a better approach than having one universal rating system (and one universal rating for each player), then wouldn’t that suggest that the FIDE Elo Standard ratings, calculated only from games played at slow time controls, are a purer and superior measure of playing strength at classical chess than a Universal Rating that has been tainted by games at faster time controls? Similarly, would we not expect that the FIDE Elo Rapid Ratings (calculated only from rapid games) are better at measuring players’ skill at rapid chess than that same Universal Rating which mixes the faster games with the slower games that supposedly require different skills for success? And the same for Blitz? How should we decide which ratings work better?

There are several ways to assess the accuracy of a rating system, but we propose as simple and straightforward a method as you could imagine. We asked one simple question…

"When a game ends in a decisive result (not a draw), did the higher-rated player or the lower-rated player win?"

If players’ ratings were completely random and bore absolutely no relationship to true chess strength, then on average 50% of decisive games would be won by the higher-rated player. If, on the other hand, players’ ratings were perfectly accurate, then theoretically 100% of all decisive games would be won by the higher-rated player. While this is clearly an unattainable standard, 75% to 80% is a more reasonable goal, and we believed it was possible to design a rating system that would accurately predict the results of decisive games (discarding drawn results) at a better prediction rate than existing rating systems.

Once the models underlying the URS™ were built, we decided to put our theory to the test. We started by retroactively calculating URS™ Ratings for the past several years on a month-by-month basis. This generated results which could be directly compared against the three sets of monthly FIDE Elo ratings to see which ratings (from the start of the month when the game was played) better predicted the outcome of decisive games.

We used the same set of URS™ ratings to determine the URS™ rating favorite in all games. On the other hand, we used the FIDE Standard ratings to determine the FIDE Elo rating favorite in standard games, and the FIDE Rapid ratings to determine the FIDE Elo rating favorite in rapid games, and the FIDE Blitz ratings to determine the FIDE Elo rating favorite in blitz games. Since the FIDE Rapid and Blitz rating systems only came into effect in 2012, we decided to give these ratings a one year grace period to settle, and we therefore started comparing results for all months between January 2013 and December 2016.

An illustrative example of the process that was followed is recreated below. This illustration is based on the results at the recently completed World Blitz Championships that were held in Doha from 29 – 30 December 2016.

For the sake of simplicity, we can look at just a partial cross-table which includes just the nine players who were rated 2800+ on the 1 December 2016 FIDE Blitz rating list. We would then sort these players both by their FIDE Blitz ratings and by their URS™ Ratings as of 1 December 2016. This generates the following two tables:

Comparing the tables shows clear differences. For example, GM Vladislav Artemiev was seeded ahead of GM Hikaru Nakamura based on their FIDE Elo Blitz ratings before the event but well behind Nakamura on the URS™ rating list.

Once the actual game results are available, we populate the cross-tables and compare the results. We simply ignore everything below and to the left of the diagonal line since this is a mirror image of the information in the top right. We also ignore drawn games and matchups where the players have identical ratings, since in these rare cases there are no “higher-rated” or “lower-rated” players.

This generates a table where anything shown as 1 in the area to the right and above the diagonal reflects a correct prediction, where the higher-rated player won. Anything that is a zero in this same area is a missed prediction. All of the cells we are disregarding, we have shown in gray, including the decisive results shown to the left and below the diagonal. The correct predictions (the “1” values) are shown in blue and the missed predictions (the “0” values) in red:

So when we use the FIDE Blitz Elo ratings, Magnus Carlsen’s two wins (against the lower-rated Maxime Vachier-Lagrave and Teimour Radjabov) were correct predictions while his loss to Sergey Karjakin (also lower-rated) represents a missed prediction. Overall there were four correct predictions and five misses, for an overall prediction rate (across this tiny sample of nine games) of 44%. Of particular note were Artemiev’s loss to the lower-rated Nakamura and Mamedyarov’s loss to the lower-rated Karjakin. Also note the extra “X” marks to remind us to disregard any Aronian-Nepomniachtchi and Karjakin-Radjabov results, where the players had the same FIDE ratings, or Nepomniachtchi-Vachier-Lagrave results, where the players had the same URS™ ratings.

When we do the same analysis using the URS™ ratings, the results are as follows:

From the URS™ perspective the Nakamura win over Artemiev represents a correct prediction, as does the win by Karjakin over Mamedyarov. So for this portion of the cross-table, the URS™ more successfully categorized the players, with a 67% prediction rate. While the dataset is clearly far too small to be drawing conclusions from, the example above should serve to illustrate how we can objectively compare the accuracy of two different rating lists that apply to the same games.
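The counting procedure used in this comparison can be sketched as follows. The function name and the data layout are our own, and the ratings in the example are made-up numbers rather than any player's actual rating.

```python
def prediction_rate(games):
    """Share of decisive games won by the higher-rated player.

    games: iterable of (rating_a, rating_b, score_a), score_a in {1, 0.5, 0}.
    Draws and games between equally rated players are ignored.
    Returns None if no game qualifies.
    """
    correct = 0
    counted = 0
    for rating_a, rating_b, score_a in games:
        if score_a == 0.5 or rating_a == rating_b:
            continue                     # no winner, or no favourite
        counted += 1
        a_is_favourite = rating_a > rating_b
        a_won = score_a == 1
        if a_is_favourite == a_won:      # the higher-rated player won
            correct += 1
    return correct / counted if counted else None


# Made-up example: two correct predictions, one miss, one draw (ignored)
sample = [(2850, 2800, 1), (2790, 2820, 0), (2840, 2810, 0), (2830, 2805, 0.5)]
print(prediction_rate(sample))
```

Running the same games through this function twice, once with each rating list, gives two prediction rates that can be compared directly.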

The results clearly only start having significance once we start looking at far larger datasets. We consequently applied the same methodology to all four groups, and all players, at those recently completed World Rapid and Blitz Championships (Open Rapid, Open Blitz, Women’s Rapid and Women’s Blitz). We found that the URS™ ratings worked better than the FIDE Blitz ratings at predicting the blitz game results and also worked better than the FIDE Rapid ratings at predicting the rapid games.

Below is a high level summary of the results:

In the table above, the rightmost column has a color gradient applied so that numbers near zero are white, while more positive numbers are a deeper / darker blue, and negative numbers (had there been any) would have been red. The deeper blue colors illustrate where the superiority of the URS™ is more pronounced.

Still, that is only 1,667 decisive games. What if we cast a wider net and looked at more games? What if we looked at all blitz games, and all rapid games and all classical games, across the entire four-year period stretching from 2013 to 2016?

We did that and here are the results:

From year to year, and across all three rating categories, the URS™ rating engine consistently predicted the results better.

By now, you can probably see where we are going with this. Our findings indicate that URS™ Ratings are better at identifying who is going to win a classical chess game than the FIDE Standard ratings. The (same) URS™ Ratings are better at identifying who is going to win a rapid chess game than the FIDE Rapid ratings, and the (same) URS™ Ratings are better at identifying who is going to win a blitz chess game than the FIDE Blitz ratings.

What does this say about the argument that the three types of chess should be kept in isolation within separate rating systems?

These results suggest that URS™ Ratings are, in fact, universally better than Elo ratings at identifying who is going to win a given game of chess. We would further consider this to be objective evidence in favor of the conclusion that ratings from the URS™ are more accurate across the spectrum of time controls than the Elo ratings from the separate rating lists maintained by FIDE.

From a statistical point of view, it is important to notice whether the results from 2016 were just as successful as those from 2013 - 2015. This is important, because when we optimized the inner workings of the URS™ in 2016, we adjusted a very small number of system parameters (approximately ten) to appropriate values. We did this using a statistical methodology that involved predicting the results of actual games played in the period from 2013 to 2015 and then seeing how well our rating system did at making the relevant predictions. The game result data from 2016 was only used as “out-of-sample” data, meaning that it was never run as part of any comparison exercise until we had completed our full and final rating system design. The behavior and results in 2016 can thus be viewed as being the final test. We will of course continue to monitor the behavior of the URS™ into 2017 and beyond.

The analysis above has only looked at overall numbers across the entire pool of players. However, perhaps the URS™ works well for one segment of the rating pool but not for all of it? For instance, the Elo system is known to work much better when players have a large game history, face each other often, and play more consistently. It therefore tends to function better for the top of the rating pool when compared to the entire pool.

Of course, the top of the rating pool includes only a tiny portion of the games played today. This is illustrated by the pie chart below which indicates the relative frequency of games played between players of different strengths, based on the FIDE standard rating of the lower-rated player in each game.

During the four-year period under consideration, there were barely 4,000 decisive games played where both players were rated 2600+. In fact, there were more than 600 decisive games played by lower-rated players, for every 1 decisive game played between 2600+ rated players. The slice is so small that you can barely see the blue slice marked as “a) Both players FIDE 2600+” in the upper-right of the chart.

We checked each of these ten groups of games, ranging from the elite games played among players 2600+, all the way down to games involving at least one player rated below 1400. We then compared how well the URS™ system did at predicting the winners of all the decisive games played when compared to the same players’ FIDE Standard ratings.

Regardless of whether you analyze the small slice representing the elite games, or the larger slice with the weakest players, or anywhere in between, the cells are all blue across the board. This means that at every level of player strength the URS™ better predicted the results than the applicable Elo ratings. At times, the results were only a little better, at other times they were significantly better, but they were never worse. Not in one single case.

And even though the URS™ is specifically optimized to measure a player’s strength at classical chess, it is in fact at rapid and blitz chess that the URS™ truly shows off its superiority. By including classical results within the ratings that are used to predict rapid and blitz games, we enable our rating system to make better predictions, up and down the rating list:

You may observe that even across four years of results, some of the columns are sparsely populated, having only a few thousand games. This is not actually that surprising when we consider how small the slices were for the highest-strength games, in the pie chart presented earlier in this article.

It may also prove interesting to do a more detailed check of player strength versus more specific rates of play, to see if there were any areas where in fact the FIDE Elo ratings were working better than the URS™ at predicting game results. To get sufficient data to look at this in two dimensions, we combined the strongest categories into one larger “Both rated 2000+” category so that we would have five roughly equal-sized groups of games. We could then see if there were any overall groups of players and particular time controls (or ranges of time controls) where the universal ratings were indeed inferior. The most obvious target would be the slowest time controls, for the strongest players, as that is generally the place where the Elo system works best. Games played at this level are typically less random and most players have stable strengths and face each other a lot. It was hence not surprising when it proved that this was indeed the place where the FIDE Elo ratings held up relatively best. Nevertheless, the cells remained consistently blue, with some areas deeper than others, suggesting strongly that the URS™ ratings are in fact universally superior to the FIDE Elo ratings at predicting game results:

Download the full press release here.

GCT announces 2017 wildcard selections

The Grand Chess Tour today announced their 2017 tour wildcard selections and confirmed that the following three players have been offered wildcards for the 2017 GCT Tour:

  1. GM Ian Nepomniachtchi (RUS)
  2. GM Sergey Karjakin (RUS)
  3. GM Viswanathan Anand (IND)

GM Ian Nepomniachtchi earns his place due to his consistency across all time formats which sees him placed 5th on the URS™ rating list as at 1 January 2017. This earned him selection as the highest ranked player on the URS™ not already picked.

GM Sergey Karjakin was selected as the second highest ranked player on the URS™ not already picked after a year that saw him compete in the 2016 World Championship match and secure the title of World Blitz Champion.

The final wildcard was awarded to former World Champion Viswanathan Anand who is ranked 10th on the URS™ rating list as at 1 January 2017. He also tied for 4th place in the 2016 GCT tour despite only competing in three of the four events in 2016.

GM Levon Aronian was selected as the first alternate and will be invited to join the 2017 tour as a full tour member if any player declines to participate for any reason.

The three wildcard selections join the six automatic qualifiers who secured their spots based on their 2016 GCT results or through their average FIDE classical ratings over the course of the 2016 calendar year. The six automatic qualifiers for the 2017 GCT Tour are:

  1. GM Wesley So (USA) – Winner, 2016 GCT
  2. GM Hikaru Nakamura (USA) – Runner-Up, 2016 GCT
  3. GM Fabiano Caruana (USA) – 3rd place, 2016 GCT
  4. GM Magnus Carlsen (NOR) – 1st place, 2016 FIDE Average Rating
  5. GM Vladimir Kramnik (RUS) – 2nd place, 2016 FIDE Average Rating
  6. GM Maxime Vachier-Lagrave (FRA) – 3rd place, 2016 FIDE Average Rating

There will be fourteen event wildcards in the 2017 Grand Chess Tour, with four in each of the Rapid & Blitz events and one each in the Sinquefield Cup and London Chess Classic. The recipients will be announced in due course.

Download the full press release here.