Welcome to the URS™

Welcome to the homepage of the Universal Rating System (URS™).

The URS™ rating system and the launch of this website represent the culmination of more than two years of detailed research and development, and we are very pleased to present the very first URS™ rating list, for the month of January 2017.

Kindly explore the “About Us” and “FAQs” sections for a full explanation of what we do and why we are doing it. We welcome all constructive comments and suggestions that can help us improve our service moving forward.

Magnus leads by 59 points as Grischuk rises to #3 on August URS™ list

Magnus Carlsen’s dominant performance at the YNM Rapid in Leuven has seen his lead at the top of the URS™ rating list grow to a massive 59 points! After a busy month, GM Hikaru Nakamura is left as the only other player with a URS™ rating of 2800 or more, while GM Alexander Grischuk has risen to a URS™ high of number 3!


Magnus saw his URS™ rating rise to 2859 after he dominated the GCT Rapid event in Leuven, winning it by a comfortable 3-point margin. His nearest rival, Wesley So, can also be satisfied with his performance in Leuven, and he has improved his ranking as a result, returning to the top 5.

The most notable performance in July, however, came from Alexander Grischuk. He placed second in the Moscow Grand Prix before scoring 3.5/4 in the Chinese League and defeating GM Yu Yangyi 3-1 in their individual match. His combined score of +7 over 17 games in the month of July has seen him continue his rise, and he is now ranked as the 3rd best player in the world according to the URS™ list.

Teimour Radjabov, who won the FIDE Grand Prix event in Moscow, will also be satisfied this month, having returned to the top 20 on the strength of that excellent result.

It is also notable that GM Sergey Karjakin and GM Viswanathan Anand both remain just outside the URS™ top 10 for the moment, and both will be looking for strong performances in the 2017 Sinquefield Cup to help them get their year back on track.

Norway Chess and Paris GCT impact July URS Ratings

The Norway Chess tournament in Stavanger and the Paris GCT Rapid and Blitz tournament in France were both rated in June and have had a significant impact on the rankings of the top players according to the URS™ system. GM Wesley So has seen his ranking drop from #3 to #8 after he failed to live up to his normal high standards at these two events.


GM Levon Aronian and GM Alexander Grischuk, however, saw their rankings improve after strong performances in Norway and Paris respectively. Both rose by 3 positions and now find themselves ranked #5 and #6 on the URS™ list.

The importance of rapid and blitz results for the URS™ is illustrated by the fact that GM Veselin Topalov and GM Etienne Bacrot fell by 9 and 12 places after finishing 9th and 10th respectively in the Paris GCT tournament. Further changes can certainly be expected next month once the results of the ongoing YNM GCT tournament in Leuven are taken into account.

Mamedyarov 5th on June URS™ list after recent impressive form

GM Shakhriyar Mamedyarov has risen to 5th place on the URS™ rating list for June 2017 following his impressive recent results in the 2017 Shamkir Chess tournament and the Moscow Grand Prix. With a June URS™ rating of 2784, Shakhriyar now finds himself just 5 rating points short of the top 4, which still consists of Magnus Carlsen (2851), Hikaru Nakamura (2792), Wesley So (2789) and Vladimir Kramnik (2789).

The recent changes to the URS™ rating algorithm have significantly improved the results for younger players, and we are now nearing a point where we will be able to open the URS™ rating system to the direct submission of results for rating purposes. Over the course of the next few months we also intend to provide enhanced access to the underlying database so that players can access the actual game results on which their URS™ ratings are based.


We have been receiving a number of enquiries from all over the world, and we thank everyone for their ongoing interest in the development of the URS™ rating system. We know that the system will bring major benefits to both club and national-level organisers once fully functional, and we greatly appreciate the feedback we are receiving, which is helping us make ongoing enhancements both to the database and to the algorithm itself.

URS™ improvements implemented as part of May rating release

Since the URS™ was launched in early January 2017, our monthly rating lists have been scrutinized by interested members of the chess public. Some concerns were raised that junior players, playing largely in isolation from the broader rating pool, were receiving ratings that were clearly too high. Based on this feedback, our research team has examined the issue and incorporated improvements into the URS™ rating algorithm. These improvements are incorporated within the May 2017 rating lists.


The URS™ rating calculation uses all available game results from the past 6 years to estimate the playing strength of all members of the population in one combined calculation. Although the procedure calculates ratings for all players (even if they are not connected by any game history), the algorithm was designed to work best for players whose playing history is well-connected within the overall rating pool. This allows the system to make better inferences about relative playing strengths.

To stabilize rating estimates of all players, the initial URS™ algorithm incorporated assumptions about an average player's strength. These assumptions about "typical playing strengths" are particularly important for players without many games or without much connection to the rest of the playing pool. The algorithm did not explicitly account for the fact that the typical playing strength of junior players is very different from that of adult players. To be specific, the original version of the URS™ rating algorithm did not specifically address player age as a factor within the rating calculation.

In the revised algorithm, the typical playing strength is now assumed to be progressively lower for younger players. For example, a typical 7-year-old is now assumed to be about 500 points weaker than a typical 13-year-old, who in turn is assumed to be about 200 points weaker than a typical 19-year-old. The relationship with age flattens out for players older than about 30 years. These updated assumptions were numerically determined from past games played among players of various ages and they do not have much effect upon the ratings of senior players or well-connected junior players. The improvements have a significant impact, however, upon the ratings of many junior players and these ratings are now determined more accurately than before. Most junior players have significantly lower ratings due to the changes that have been adopted.
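
To make the age adjustment more concrete, here is a minimal Python sketch of how an age-dependent "typical strength" could be interpolated from the anchor points quoted above. The piecewise-linear shape, the numbers expressed relative to a typical 19-year-old, and the function name are illustrative assumptions only; the actual URS™ curve was determined numerically from game data and continues to rise gently until about age 30.

```python
def typical_strength_offset(age: float) -> float:
    """Assumed offset (in rating points) of a 'typical' player of the given age,
    relative to a typical 19-year-old. Anchor points follow the paragraph above
    (a 7-year-old roughly 700 points lower, a 13-year-old roughly 200 points lower);
    the linear interpolation between them is purely illustrative."""
    anchors = [(7.0, -700.0), (13.0, -200.0), (19.0, 0.0)]
    if age <= anchors[0][0]:
        return anchors[0][1]                      # very young players clamped at the first anchor
    for (a0, v0), (a1, v1) in zip(anchors, anchors[1:]):
        if age <= a1:
            return v0 + (v1 - v0) * (age - a0) / (a1 - a0)
    return 0.0                                    # ages 19 and over treated as the adult baseline here

# Example: typical_strength_offset(10) -> -450.0
```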

To enable a more complete historical comparison, we have now recalculated all monthly ratings since July 2016 using the revised algorithm and made these ratings available on the website for inspection and comment. We will also shortly be increasing the level of data that is available on each player's personal URS™ profile page, so that better comparisons can be made over time and between players.

The forthcoming Grand Chess Tour events in Paris and Leuven will feature 14 of the top 15 players on the May 2017 URS™ rating list. GM Magnus Carlsen remains more than 50 points ahead of his closest rivals, but there are now seven players rated 2780 or higher on the URS™ scale. The top 15 players as at 1 May 2017 are as follows:

Top Open - May 2017
#    Player                    Fed   Rating
1    Carlsen, Magnus           NOR   2852
2    Nakamura, Hikaru          USA   2793
3    Kramnik, Vladimir         RUS   2791
4    So, Wesley                USA   2790
5    Vachier-Lagrave, Maxime   FRA   2782
6    Caruana, Fabiano          USA   2781
7    Aronian, Levon            ARM   2780
8    Mamedyarov, Shakhriyar    AZE   2777
9    Grischuk, Alexander       RUS   2777
10   Karjakin, Sergey          RUS   2776
11   Anand, Viswanathan        IND   2776
12   Nepomniachtchi, Ian       RUS   2775
13   Ding, Liren               CHN   2771
14   Giri, Anish               NED   2768
15   Ivanchuk, Vassily         UKR   2758

We appreciate the ongoing feedback from chess enthusiasts worldwide, and look forward to the opportunity to make further improvements to our rating system across the remainder of the year.

April URS list sees few changes at the top

March was a very quiet month for most of the world's top players, and the April 2017 URS™ rating list has consequently seen very few changes at the top. Our database continues to grow, however, and we now have more than 250,000 players with live URS™ ratings.


The URS™ team used the month of March to further optimise our rating algorithm and we will shortly be publishing details of some important changes that have been made as a result of the analysis that has been performed. We will continue to analyse and optimise the formula as more data becomes available throughout the year.

The changes will be applied to all future rating periods as well as to our historical ratings to ensure that they remain comparable over time. We believe that these changes represent another significant step forward and we will release full details within the next 2 weeks.

MVL 5th while Grischuk and Mamedyarov claim top 10 spots in March URS rating list

The URS™ rating list for March 2017 has been released and sees GM Maxime Vachier-Lagrave rise to number 5! This follows his tie for 1st place at the FIDE Grand-Prix tournament in Sharjah and his 4th place finish in Gibraltar.

GM Alexander Grischuk (who won the Sharjah tournament on tie-break) has also risen to number 8 on the URS™ system while GM Shakhriyar Mamedyarov finds himself at number 10 after his bronze medal finish.


GM Vladislav Artemiev replaced GM Wei Yi as the number 1 ranked junior player following his two good results in the Moscow Open and Moscow Blitz tournaments.

There were few movements amongst the top female players this month, but this is likely to change in April once the results of the Women’s World Cup are taken into account.

The URS™ team appreciates the feedback received to date and has taken specific note of comments with reference to the rating of inactive players and of emerging juniors. We are assessing a number of options that will help us further optimise the URS™ rating algorithm, and we expect to incorporate these adjustments in the April 2017 rating list.

Work is also progressing well to enable the submission of games directly to the URS system for rating purposes and we will make further announcements in this regard in the near future.

Wesley So rises to 4th in the February 2017 URS Rating list

The February 2017 Universal Rating List has been released and sees GM Wesley So rise to 4th place on the URS™ rating list after his impressive victory at the 2017 Tata Steel Masters tournament. The other major beneficiary amongst the top 15 players was GM Levon Aronian, who rose from 11th to 8th place following a solid +2 performance in the same event. GM Wei Yi also finished on +2 at the Tata Steel Masters, and this result sees him replace GM Vladislav Artemiev as the number 1 ranked junior player on the list. January was a quiet month for the top women and saw little change amongst the top-ranked female players on the URS™ rating list.


The top 15 open and female players per the February 2017 rating list are now as follows. The change column reflects individual movements from the January 2017 URS rating list.

Commentators with a keen eye will note that GM Vladimir Kramnik has lost 3 rating points on the February list despite not having played any rated games during the preceding month. This is one of the key differences between the URS™ and Elo rating systems: the URS™ continuously re-rates all players in the database regardless of individual activity during the month, so movements of this kind are normal for the system. A more detailed explanation of some of the notable changes between the January and February rating lists will be released in due course.

The Universal Rating System – A Performance Rating Across All Time Controls

The URS™ rating algorithm was designed and developed by our research team, which consists of Mr. Maxime Rischard, Dr. J. Isaac Miller, Dr. Mark Glickman, and Mr. Jeff Sonas. The work has been carried out over the last two years as a collaborative research project funded by the Grand Chess Tour, the Kasparov Chess Foundation, and the Chess Club and Scholastic Center of Saint Louis.

There are many differences between the URS™ and the FIDE Elo systems. The most striking difference is that the URS™ calculates only one rating for each player, informed by their results at all rates of play from Classical to Blitz (5 minutes per game). This published rating is the URS™ system's assessment of each player's strength at Classical chess (defined as a rate of play where each player has at least 2 hours for their first 60 moves).

Furthermore, the URS™ is a weighted performance rating, calculated across several years of previous game results for all players. Older games are given less importance than recent games, by applying an exponential decay rate. URS™ Ratings are calibrated so that they use a scale comparable to traditional Elo ratings. It is critical to note, however, that the URS™ does not incorporate Elo ratings anywhere within its actual calculation.


Comparison to Elo-based approach

A URS™ Rating is more like a performance rating than it is like an Elo rating.

In Elo systems, each player retains an Elo rating that is incrementally adjusted based on the results of any new games played. It is essential to know what the Elo ratings of the two players were at the time that a game between them was played, since each player’s Elo rating is used to calculate their expected score. This in turn directly impacts how their Elo rating is increased or decreased due to the actual result of each game.
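
For readers who prefer code to prose, the incremental Elo-style update just described can be written in a few lines of Python. This is a generic textbook-style sketch for contrast with the URS™ approach below; the K-factor of 20 is an arbitrary illustrative value rather than a FIDE or URS™ parameter.

```python
def elo_update(rating: float, opp_rating: float, score: float, k: float = 20.0) -> float:
    """One incremental Elo-style update after a single game.
    `score` is 1 for a win, 0.5 for a draw and 0 for a loss."""
    expected = 1.0 / (1.0 + 10.0 ** ((opp_rating - rating) / 400.0))   # expected score vs this opponent
    return rating + k * (score - expected)                             # adjust up or down from the old rating
```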

In comparison, the URS™ involves computing ratings simultaneously over a substantial period. The only game information included in the URS™ calculation is who had White and Black, what the outcome was (win/loss/draw), when the game was played, and what the time controls were. It doesn't matter what each player’s rating was in the past, because there is no concept of a rating that is incrementally adjusted up or down.  

Instead of adjusting an existing rating, the URS™ simply includes the new games within the large pool of existing data that it analyses whenever it is time to calculate a rating list. The ratings of every single player in the pool are then recalculated, in what is essentially a complex performance rating calculation, using their entire pool of games within the database. In statistical terms, the URS™ involves a time-weighted regularized maximum likelihood calculation, an approach that has solid statistical foundations.

What is a Simultaneous Performance Rating?

Anyone who understands how performance ratings are calculated may now be wondering how it is possible to calculate a performance rating across a large pool of games while ignoring the opponents' pre-event ratings at the time of each game. The solution is to treat all the games as if they were played in one giant tournament and to determine the ratings that are simultaneously most consistent with the game outcomes. This cannot be accomplished using a simple formula like the Elo updating formula; instead, the computation needs to be iterative. This type of iterative procedure is well established in applied mathematics, and in fact Arpad Elo suggested a particular instance of an iterative procedure decades ago when he developed his rating system. Under this scenario, we initially assume that everyone has the same rating (call it "R") and we then calculate a Tournament Performance Rating (TPR) for every single player across this hypothetical tournament. Once you have those TPRs for everyone, you recalculate a TPR for everyone, but this time, instead of using "R" as the rating for each opponent, you use each opponent’s latest TPR. And then you keep doing this, over and over.

In this way, each iteration of the performance rating calculation produces a more self-consistent set of performance ratings, which in turn makes the next iteration of TPRs even more self-consistent. If you do this for long enough, you will typically reach a stable equilibrium where the TPRs change by only negligible amounts from one iteration to the next. At this point, each player's performance rating is consistent with the performance ratings of all their opponents. This can be called a "simultaneous performance rating".
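
The iterative procedure described above can be sketched in a few lines of Python. This illustrates only the plain "simultaneous performance rating" idea, not the actual URS™ calculation (which also applies priors, time weights and rate-of-play adjustments); the logistic conversion of a percentage score to a rating difference, the clamping of 0% and 100% scores, and all function names are assumptions made for the sketch.

```python
import math

def tpr(opp_ratings, score):
    """Classical tournament performance rating: average opponent rating plus the
    rating difference implied by the percentage score (Elo-style logistic curve)."""
    p = min(max(score / len(opp_ratings), 0.01), 0.99)   # clamp 0%/100% to keep the formula finite
    return sum(opp_ratings) / len(opp_ratings) - 400.0 * math.log10(1.0 / p - 1.0)

def simultaneous_performance_ratings(games, players, start=2000.0, max_iter=200, tol=0.01):
    """Treat all games as one giant tournament and iterate TPRs until they stabilise.
    `games` is a list of (white, black, white_score) with white_score in {1, 0.5, 0}."""
    ratings = {p: start for p in players}                 # everyone starts at the same rating "R"
    for _ in range(max_iter):
        new = {}
        for p in players:
            opps, score = [], 0.0
            for w, b, ws in games:
                if w == p:
                    opps.append(ratings[b]); score += ws
                elif b == p:
                    opps.append(ratings[w]); score += 1.0 - ws
            new[p] = tpr(opps, score) if opps else ratings[p]
        converged = max(abs(new[p] - ratings[p]) for p in players) < tol
        ratings = new                                      # next pass uses everyone's latest TPR
        if converged:
            break
    return ratings
```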

Performance Ratings Explained

This brings us to a major point: what exactly do we mean when we say the URS™ is like a “performance rating”? In fact, there are lots of ways of calculating performance ratings. Some of the ways are simple, and some are far more complex.  

The simplest, most popular, and most easily understood method is to first calculate the average rating of your opponents across the games played in a tournament. You then convert your overall percentage score in those games into a rating advantage or disadvantage, and add that positive or negative number to the average rating of the opponents faced in order to obtain the performance rating.

There are different variations of this calculation (e.g. removing the game against the weakest opponent and so forth) but they typically involve a relatively simple formula that allows you to calculate the performance rating directly. This is essentially how Jeff Sonas’s Chessmetrics simultaneous performance rating calculation worked, years ago. US Chess has a similar but more complex way of computing performance ratings that are the basis for determining provisional ratings. However, the URS™ took a more thorough approach.

There is another way of viewing performance rating, one that is not as easy to calculate using a direct formula. Under this approach, a performance rating is "the rating that would have led to the performed results". We can then consider several possible ratings for our player, assess what their overall expected score would be (with that rating) across all their games, and then pick the rating that yields an expected result that most closely matches what actually happened.

Rather than inventing a specific formula that can be used to calculate ratings directly, like there is for the Elo system and for performance ratings, the URS™ involves a probability model that analyses a large domain of possible ratings for each player, with some ratings being more likely than others (based upon the overall population distribution of chess strength). Across those possible ratings, our system then determines how likely the actual results would have been to occur, and ultimately determines the most likely overall set of ratings, for all players at once, in order to best explain the actual results.
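
As an illustration of this "pick the rating that best explains the results" idea, here is a small sketch for a single player with fixed opponent ratings. The Elo-style logistic win probability, the Gaussian prior standing in for the population distribution of chess strength, and every numeric parameter are assumptions made purely for the example; the real URS™ model is considerably richer and solves for all players at once.

```python
import math

def game_logprob(rating, opp_rating, score):
    """Log-likelihood of one game under an illustrative Elo-style logistic curve.
    `score` is 1 for a win and 0 for a loss; a draw can be passed as 0.5."""
    p_win = 1.0 / (1.0 + 10.0 ** ((opp_rating - rating) / 400.0))
    return score * math.log(p_win) + (1.0 - score) * math.log(1.0 - p_win)

def most_likely_rating(results, prior_mean=1800.0, prior_sd=350.0):
    """Scan a grid of candidate ratings and return the one that best explains the
    observed results, weighted by a Gaussian 'typical strength' prior."""
    best_rating, best_ll = None, -math.inf
    for r in range(1000, 2801, 5):                          # candidate ratings in 5-point steps
        ll = -0.5 * ((r - prior_mean) / prior_sd) ** 2       # log of the Gaussian prior (up to a constant)
        ll += sum(game_logprob(r, opp, s) for opp, s in results)
        if ll > best_ll:
            best_rating, best_ll = r, ll
    return best_rating

# Example: most_likely_rating([(2100, 1), (2200, 0.5), (2300, 0)])
```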

Time weighting adjustments

In a traditional TPR, it makes sense to treat each game with equal importance, since all games are played at nearly the same time. When we extend the concept of a TPR to cover a much longer timeframe, as we have done with the URS™, then of course some of the games were played recently and others were played years ago. It is therefore logical to give the older games less weight than the newer games.

In the case of the URS™ we assign reduced importance to older games through the use of exponential-decay game weightings. The actual decay rate is one of the URS™ system’s parameters and affects how sharply or gradually the importance of older games is reduced. We currently calculate ratings across a six-year history of game results.
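
A minimal sketch of such an exponential-decay weighting is shown below. The half-life of two years is a made-up number chosen purely for illustration; the actual URS™ decay rate is an internal system parameter, and only the six-year window is taken from the text above.

```python
def game_weight(age_in_years: float, half_life: float = 2.0, window: float = 6.0) -> float:
    """Exponential-decay weight for a game played `age_in_years` ago.
    Games older than the six-year history window receive zero weight."""
    if age_in_years > window:
        return 0.0
    return 0.5 ** (age_in_years / half_life)

# With this illustrative half-life, a two-year-old game counts half as much as a
# game played today, and a four-year-old game a quarter as much.
```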

Rate of play adjustments

Finally, accounting for different time controls was incorporated into the URS™ in a more universal way than just classifying all time controls as either Blitz, Rapid, or Standard. For each event, we determined the maximum number of minutes each player could spend for their first 60 moves. We call this value "M60", where the "M" can be viewed as being an abbreviation for "Minutes".  

For some time controls it is easy to calculate M60. For example, for "Game in 5 minutes" or "Game in 90 minutes" the corresponding M60 values are simply 5 minutes and 90 minutes, respectively. This represents the maximum number of minutes each player could take for their first 60 moves (indeed, for their whole game).

However, many time controls have delays or increments, where players receive additional thinking time either during, or after completing, each of their moves. In these cases, we assume the maximum time is taken, whether for increment or delay. Since we are looking for the maximum time that can be taken for 60 moves, it is convenient that increments and delays are typically expressed as N seconds per move, since this also means that it takes N minutes for the first 60 moves. Consequently, if we see something like "Game/5 min + 2 sec/move", we can just add 5+2=7, which means that our M60 value for this time control would be 7 minutes.

And finally, where there are time increments linked to a specific number of moves, then these are also counted (provided the increments start before move 60). Consequently, a time control like "40 Moves/90 min + Game/30 min + 30 sec/move" would have an M60 value of 150 minutes. This is calculated as 90 minutes for the first 40 moves, another 30 minutes (maximum) for the next 20 moves, and 30 minutes' worth of increments through to Move 60, so our value of M60 is 90+30+30=150 minutes. However, if the time control were "40 Moves/100 min + 20 Moves/50 min + Game/15 min + 30 sec/move", then the part about "Game/15 min" would not matter for the calculation of M60, since it doesn't apply until Move 60 has already been completed.

Thus you get the same value of M60=5 minutes for "Game/5 min" and for "Game/3 min + 2 sec/move", and our rating system therefore treats these time controls equivalently. Similarly, the two common time controls "Game/90 min + 30 sec/move" and "Game/120 min" are treated equivalently as M60=120 minutes.  
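
The M60 rules described above are straightforward to express in code. The following sketch uses an ad hoc representation of a time control as a list of (moves, minutes) stages plus a per-move increment in seconds; this data format, and the assumption that the increment applies from move one (as in the examples above), are ours rather than any official URS™ specification.

```python
def m60(stages, increment_sec=0.0):
    """Maximum number of minutes available for the first 60 moves.
    `stages` is a list of (moves, minutes) pairs, with moves=None meaning
    'rest of the game'."""
    total_minutes, moves_covered = 0.0, 0
    for moves, minutes in stages:
        if moves_covered >= 60:
            break                                   # stages that begin after move 60 never matter
        total_minutes += minutes
        moves_covered += 60 if moves is None else moves
    # N seconds of increment per move adds N minutes over the first 60 moves
    return total_minutes + increment_sec

# Examples from the text:
# m60([(None, 5)])                              ->   5   "Game/5 min"
# m60([(None, 5)], increment_sec=2)             ->   7   "Game/5 min + 2 sec/move"
# m60([(40, 90), (None, 30)], 30)               -> 150   "40/90 + Game/30 + 30 sec/move"
# m60([(40, 100), (20, 50), (None, 15)], 30)    -> 180   the "Game/15 min" stage never applies
```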

The URS™ uses the M60 values within continuous functions that model the variability of chess results across all rates of play from 5 minutes (blitz) up to 120+ minutes (classical). They are also used to calculate the degree to which individual players' quality and consistency of play degrades as the rate of play moves along the spectrum from classical to faster rates. Player-specific degrees of degradation in quality and consistency of play are expressed as a Rapid Gap (applying to M60=30 minutes) and a Blitz Gap (applying to M60=5 minutes) for each player. A larger Rapid/Blitz Gap means that the player’s quality and consistency of play degrades more quickly at faster rates of play.

While optimizing the rating system, we determined the appropriate continuous functions to use for modelling the variability of results at any value of M60. Thus we treat "Game/41 min" slightly differently from "Game/42 min" and "Game/43 min", but there are no special considerations for these particular rates of play, just as there is nothing special about the treatment of "Game/9 min" versus "Game/10 min" versus "Game/11 min" in our system. They are all treated smoothly across the full spectrum of time controls. By contrast, "Game/9 min" versus "Game/10 min" versus "Game/11 min" are handled completely differently in the Elo system, as some of these games go into the Rapid rating system while others go into the Blitz rating system.

From the perspective of the URS™, the only special points on the spectrum of time controls are at M60=5 minutes and at M60=120 minutes. The URS™ currently does not rate any lightning/bullet results played at faster than "Game/5 min". And all games played at time controls of game in 120 minutes or more (i.e. all classical games) are treated as M60=120. Any slower time controls of play for games that can take longer than 2 hours for each player are thus treated equivalently to "Game/120 min" in the rating calculation.  

On the surface, it might seem like a bad idea to mix together a large number of rapid and blitz results with a relatively small number of classical results, when the ultimate goal is to calculate a rating that accurately measures classical chess skill. Indeed, the greater unpredictability of faster chess does mean there is less information to be learned from one rapid or blitz result than from one classical result.

Nevertheless, the URS™ recognizes that there is useful information about a player’s over-the-board strength in all game results regardless of the time limit, and can therefore more effectively estimate a player’s classical chess strength by also considering their results in games played at faster time controls. As the speed of play increases, the URS™ assigns less and less importance to the game results relative to games played at slower time controls. In this way, we gain useful information about players’ classical chess skill without being overwhelmed by the volatility or volume of rapid and blitz games.

We will be providing more details about the rating system in due course, but hopefully this provides an interesting and informative introduction for now.

We welcome all constructive comments that can help us improve moving forward.

Download the full press release here.

URS™: Universally Better Than Elo

We expect some people to challenge the notion that games played at slow time controls can be mixed together with faster games within a single rating system. One commonly-held (though admittedly subjective) belief is that classical chess is categorically different from rapid chess and even more different from blitz chess and the three types of chess ought to be kept separate.

There is another way to think about this, however. What if classical and rapid and blitz aren’t that different from each other? What if they all reveal information about a player’s universal chess ability, with the understanding that games become more chaotic and less informative as the rate of play speeds up?

+

If you accept this concept, then perhaps there is a way to effectively combine over-the-board games from all time controls into a single rating system, to use a single pool of data for analysis, and to create a single “universal” rating for each player. How could we tell, objectively rather than subjectively, whether this is a step in the right direction, or a step in the wrong direction?

If we believe that having three separate rating systems (and hence three separate ratings for each player) is a better approach than having one universal rating system (and one universal rating for each player), then wouldn’t that suggest that the FIDE Elo Standard ratings, calculated only from games played at slow time controls, are a purer and superior measure of playing strength at classical chess than a Universal Rating that has been tainted by games at faster time controls? Similarly, would we not expect that the FIDE Elo Rapid Ratings (calculated only from rapid games) are better at measuring players’ skill at rapid chess than that same Universal Rating which mixes the faster games with the slower games that supposedly require different skills for success? And the same for Blitz? How should we decide which ratings work better?

There are several ways to assess the accuracy of a rating system, but we propose as simple and straightforward a method as you could imagine. We asked one simple question…

"When a game ends in a decisive result (not a draw), did the higher-rated player or the lower-rated player win?"

If players’ ratings were completely random and bore absolutely no relationship to true chess strength, then exactly 50% of decisive games would be won by the higher-rated player. If, on the other hand, players’ ratings were perfectly accurate, then theoretically 100% of all decisive games would be won by the higher-rated player. While this is clearly an unattainable standard, 75% - 80% is a more reasonable goal, and we believed it was possible to design a rating system that would accurately predict the results of decisive games (discarding drawn results) at a better prediction rate than existing rating systems.
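
The question above translates directly into a simple accuracy measure, sketched below for a list of games. The data layout (white, black, result from White's point of view) and the skipping of draws and identically rated pairings follow the description in this article, while the function name is ours.

```python
def prediction_rate(games, ratings):
    """Fraction of decisive games won by the higher-rated player.
    `games` is a list of (white, black, result) with result 1.0, 0.5 or 0.0 for White;
    `ratings` maps each player to their rating on the relevant list.
    Draws and games between identically rated players are skipped."""
    hits, total = 0, 0
    for white, black, result in games:
        if result == 0.5 or ratings[white] == ratings[black]:
            continue                                # no prediction to score for this game
        favourite_won = (ratings[white] > ratings[black]) == (result == 1.0)
        hits += int(favourite_won)
        total += 1
    return hits / total if total else float("nan")
```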

Once the models underlying the URS™ were built, we then decided to put our theory to the test. We started by retroactively calculating URS™ Ratings for the past several years on a month by month basis. This generated results which could be directly compared against the three sets of monthly FIDE Elo ratings to see which ratings (from the start of the month when the game was played) better predicted the outcome of decisive games.

We used the same set of URS™ ratings to determine the URS™ rating favorite in all games. On the other hand, we used the FIDE Standard ratings to determine the FIDE Elo rating favorite in standard games, and the FIDE Rapid ratings to determine the FIDE Elo rating favorite in rapid games, and the FIDE Blitz ratings to determine the FIDE Elo rating favorite in blitz games. Since the FIDE Rapid and Blitz rating systems only came into effect in 2012, we decided to give these ratings a one year grace period to settle, and we therefore started comparing results for all months between January 2013 and December 2016.

An illustrative example of the process that was followed is recreated below. This illustration is based on the results at the recently completed World Blitz Championships that were held in Doha from 29 – 30 December 2016.

For the sake of simplicity, we can look at just a partial cross-table which includes just the nine players who were rated 2800+ on the 1 December 2016 FIDE Blitz rating list. We would then sort these players both by their FIDE Blitz ratings and by their URS™ Ratings as of 1 December 2016. This generates the following two tables:

Comparing the tables shows clear differences. For example, GM Vladislav Artemiev was seeded ahead of GM Hikaru Nakamura based on their FIDE Elo Blitz ratings before the event but well behind Nakamura on the URS™ rating list.

Once the actual game results are available, we populate the cross-tables and compare the results. We simply ignore everything below and to the left of the diagonal line since this is a mirror image of the information in the top right. We also ignore drawn games and matchups where the players have identical ratings, since in these rare cases there are no “higher-rated” or “lower-rated” players.

This generates a table where anything shown as 1 in the area to the right and above the diagonal reflects a correct prediction, where the higher-rated player won. Anything that is a zero in this same area is a missed prediction. All of the cells we are disregarding, we have shown in gray, including the decisive results shown to the left and below the diagonal. The correct predictions (the “1” values) are shown in blue and the missed predictions (the “0” values) in red:

So when we use the FIDE Blitz Elo ratings, Magnus Carlsen’s two wins (against the lower-rated Maxime Vachier-Lagrave and Teimour Radjabov) were correct predictions while his loss to Sergey Karjakin (also lower-rated) represents a missed prediction. Overall there were four correct predictions and five misses, for an overall prediction rate (across this tiny sample of nine games) of 44%. Of particular note were Artemiev’s loss to the lower-rated Nakamura and Mamedyarov’s loss to the lower-rated Karjakin. Also note the extra “X” marks to remind us to disregard any Aronian-Nepomniachtchi and Karjakin-Radjabov results, where the players had the same FIDE ratings, or Nepomniachtchi-Vachier-Lagrave results, where the players had the same URS™ ratings.

When we do the same analysis using the URS™ ratings, the results are as follows:

From the URS™ perspective the Nakamura win over Artemiev represents a correct prediction, as does the win by Karjakin over Mamedyarov. So for this portion of the cross-table, the URS™ more successfully predicted the results, with a 67% prediction rate. While the dataset is clearly far too small to draw conclusions from, the example above should serve to illustrate how we can objectively compare the accuracy of two different rating lists that apply to the same games.

The results clearly only start having significance once we start looking at far larger data-sets. We consequently applied the same methodology to all four groups, and all players, at those recently completed World Rapid and Blitz Championships (Open Rapid, Open Blitz, Women’s Rapid and Women’s Blitz). We found that the URS™ ratings worked better than the FIDE Blitz ratings at predicting the blitz game results and also worked better than the FIDE Rapid ratings at predicting the rapid games.

Below is a high level summary of the results:

In the table above, the rightmost column has a color gradient applied so that numbers near zero are white, while more positive numbers are a deeper / darker blue, and negative numbers (had there been any) would have been red. The deeper blue colors illustrate where the superiority of the URS™ is more pronounced.

Still, that is only 1,667 decisive games. What if we cast a wider net and looked at more games? What if we looked at all blitz games, and all rapid games and all classical games, across the entire four-year period stretching from 2013 to 2016?

We did that and here are the results:

From year to year, and across all three rating categories, the URS™ rating engine consistently predicted the results better.

By now, you can probably see where we are going with this. Our findings indicate that URS™ Ratings are better at identifying who is going to win a classical chess game than the FIDE Standard ratings. The (same) URS™ Ratings are better at identifying who is going to win a rapid chess game than the FIDE Rapid ratings, and the (same) URS™ Ratings are better at identifying who is going to win a blitz chess game than the FIDE Blitz ratings.

What does this say about the argument that the three types of chess should be kept in isolation within separate rating systems?

These results suggest that URS™ Ratings are, in fact, universally better than Elo ratings at identifying who is going to win a given game of chess. We would further consider this to be objective evidence in favor of the conclusion that ratings from the URS™ are more accurate across the spectrum of time controls than the Elo ratings from the separate rating lists maintained by FIDE.

From a statistical point of view, it is important to notice whether the results from 2016 were just as successful as those from 2013 - 2015. This is important, because when we optimized the inner workings of the URS™ in 2016, we adjusted a very small number of system parameters (approximately ten) to appropriate values. We did this using a statistical methodology that involved predicting the results of actual games played in the period from 2013 to 2015 and then seeing how well our rating system did at making the relevant predictions. The game result data from 2016 was only used as “out-of-sample” data, meaning that it was never run as part of any comparison exercise until we had completed our full and final rating system design. The behavior and results in 2016 can thus be viewed as being the final test. We will of course continue to monitor the behavior of the URS™ into 2017 and beyond.

The analysis above has only looked at overall numbers across the entire pool of players. However, perhaps the URS™ works well for one segment of the rating pool but not for all of it? For instance, the Elo system is known to work much better when players have a large game history, face each other often, and play more consistently. It therefore tends to function better for the top of the rating pool when compared to the entire pool.

Of course, the top of the rating pool includes only a tiny portion of the games played today. This is illustrated by the pie chart below which indicates the relative frequency of games played between players of different strengths, based on the FIDE standard rating of the lower-rated player in each game.

During the four-year period under consideration, there were barely 4,000 decisive games played where both players were rated 2600+. In fact, there were more than 600 decisive games played by lower-rated players for every 1 decisive game played between 2600+ rated players. This slice is so small that you can barely see it, marked as “a) Both players FIDE 2600+”, in the upper right of the chart.

We checked each of these ten groups of games, ranging from the elite games played among players 2600+, all the way down to games involving at least one player rated below 1400. We then compared how well the URS™ system did at predicting the winners of all the decisive games played when compared to the same players’ FIDE Standard ratings.

Regardless of whether you analyze the small slice representing the elite games, or the larger slice with the weakest players, or anywhere in between, the cells are all blue across the board. This means that at every level of player strength the URS™ better predicted the results than the applicable Elo ratings. At times, the results were only a little better, at other times they were significantly better, but they were never worse. Not in one single case.

And even though the URS™ is specifically optimized to measure a player’s strength at classical chess, it is in fact at rapid and blitz chess that the URS™ truly shows off its superiority. By including classical results within the ratings that are used to predict rapid and blitz games, we enable our rating system to make better predictions, up and down the rating list:

You may observe that even across four years of results, some of the columns are sparsely populated, having only a few thousand games. This is not actually that surprising when we consider how small the slices were for the highest-strength games, in the pie chart presented earlier in this article.

It may also prove interesting to do a more detailed check of player strength against more specific rates of play, to see whether there were any areas where the FIDE Elo ratings were in fact working better than the URS™ at predicting game results. To get sufficient data to look at this in two dimensions, we combined the strongest categories into one larger “Both rated 2000+” category, giving us five roughly equal-sized groups of games. We could then see whether there were any groups of players and particular time controls (or ranges of time controls) where the universal ratings were indeed inferior. The most obvious candidate would be the slowest time controls for the strongest players, as that is generally where the Elo system works best: games at this level are typically less random, and most players have stable strengths and face each other frequently. It was therefore not surprising that this proved to be the area where the FIDE Elo ratings held up best. Nevertheless, the cells remained consistently blue, with some areas deeper than others, suggesting strongly that the URS™ ratings are in fact universally superior to the FIDE Elo ratings at predicting game results:

Download the full press release here.

GCT announces 2017 wildcard selections

The Grand Chess Tour today announced their 2017 tour wildcard selections and confirmed that the following three players have been offered wildcards for the 2017 GCT Tour:

  1. GM Ian Nepomniachtchi (RUS)
  2. GM Sergey Karjakin (RUS)
  3. GM Viswanathan Anand (IND)

GM Ian Nepomniachtchi earned his place thanks to his consistency across all time formats, which sees him placed 5th on the URS™ rating list as at 1 January 2017, making him the highest-ranked player on the URS™ not already picked.

GM Sergey Karjakin was selected as the second-highest-ranked player on the URS™ not already picked, after a year that saw him compete in the 2016 World Championship match and secure the title of World Blitz Champion.

The final wildcard was awarded to former World Champion Viswanathan Anand who is ranked 10th on the URS™ rating list as at 1 January 2017. He also tied for 4th place in the 2016 GCT tour despite only competing in three of the four events in 2016.

GM Levon Aronian was selected as the first alternate and will be invited to join the 2017 tour as a full tour member if any player declines to participate for any reason.

The three wildcard selections join the six automatic qualifiers who secured their spots based on their 2016 GCT results or on their average FIDE classical ratings over the 2016 calendar year. The six automatic qualifiers for the 2017 Grand Chess Tour are:

  1. GM Wesley So (USA) – Winner, 2016 GCT
  2. GM Hikaru Nakamura (USA) – Runner-Up, 2016 GCT
  3. GM Fabiano Caruana (USA) – 3rd place, 2016 GCT
  4. GM Magnus Carlsen (NOR) – 1st place, 2016 FIDE Average Rating
  5. GM Vladimir Kramnik (RUS) – 2nd place, 2016 FIDE Average Rating
  6. GM Maxime Vachier-Lagrave (FRA) – 3rd place, 2016 FIDE Average Rating

There will be fourteen event wildcards in the 2017 Grand Chess Tour: four in each of the Rapid & Blitz events and one each in the Sinquefield Cup and London Chess Classic. The recipients will be announced in due course.

Download the full press release here.