Federer vs Nadal vs Djokovic
Over the past few years I have repeatedly had the "Who is the tennis GOAT?" debate with a friend of mine, Joe. A lifelong tennis fan, Joe prefers Federer while I, a relatively new convert to the sport, have been most impressed by Djokovic. With little tennis experience at my disposal I don't have a lot of evidence to support either player. However, thanks to Jeff Sackmann, who maintains a publicly available tennis database, what I do have is the result of every ATP Tour match since 1968. So here goes my attempt to leave bias at the door and answer this question once and for all with my good friend (better than Joe), statistics.
Once I had an ELO rating for each player that updated after every game I could run the same tournament difficulty analysis as for the ranking points. This time all 3 players faced an approximately average level of opponent on their run to Grand Slam victories.
Stuff we already know
Engaging in this debate generally starts with a list of the Grand Slams each player has won.
This isn't enough to compare relative abilities, though, because these impressive tallies were racked up across different time periods, against different crops of players and on different surfaces.
Nadal's dominance on clay and Djokovic's streak in Australia are well documented. We also see why Federer is a fan favourite at Wimbledon with 8 wins, but Djokovic is only one behind. I am sure the friendly patrons at Wimbledon would be only delighted to see history made if Djokovic were to surpass Federer's record.
We also see that, being 5-6 years older than the other two, Federer raced into an early lead in the Grand Slam race, winning most of his titles by 2012 before being caught in 2020.
Head to head performance
Nadal tops the H2H table between the big 3 at Grand Slams due to his near invincibility on clay. Federer has the worst record, with a win rate below 50% against both Djokovic and Nadal. He would have been better off suggesting a coin toss rather than playing against either of his big 3 opponents in a Grand Slam.
The interesting part
The issue with Grand Slam and head-to-head statistics is that they are point-in-time measurements. Comparing achievements across time is one of sport's most notoriously difficult problems because so many variables change. Jordan vs LeBron, Maradona vs Messi, Schumacher vs Hamilton: this unquantifiable question exists across nearly every sport. However, as tennis is an individual head-to-head sport, it lends itself well to the calculation of metrics that can track relative ability over time.
ATP Ranking Points
The ATP have a points system for ranking current players, which has been in place since 1973. You can read all about it here. A player's performance at a tournament earns them a certain number of points, with winning obviously earning the most and Grand Slams having the highest weighting. Points earned expire after 52 weeks, so the number of points a player has is a rolling, weighted measure of his ability over the past year.
By comparing the ranking points of the opponents a player faced on the way to a Grand Slam victory we can get an idea about how difficult his route was.
- Sum ranking points of the losers each year
- Take the average of this sum across all years. This is the average ranking points overcome by a Grand Slam victor.
- Divide the sum each year by the overall average.
- A result of 1.0 means the route was average difficulty compared to the time period studied.
- A result greater than 1 means it was more difficult and less than 1 means less difficult.
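The steps above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the code behind the charts: the function name `route_difficulty`, the input layout and the sample numbers are all made up for the example.

```python
from collections import defaultdict
from statistics import mean


def route_difficulty(wins):
    """Relative difficulty of a champion's route, per year.

    `wins` is an iterable of (year, loser_ranking_points) pairs, one pair for
    each match the eventual Grand Slam champion won (hypothetical layout).
    """
    # Step 1: sum the ranking points of the beaten opponents each year.
    totals = defaultdict(float)
    for year, loser_points in wins:
        totals[year] += loser_points

    # Step 2: average those yearly sums over the whole period studied.
    overall_average = mean(totals.values())

    # Step 3: divide each yearly sum by the overall average.
    # 1.0 = average route, above 1 = harder, below 1 = easier.
    return {year: total / overall_average for year, total in totals.items()}


# Made-up numbers, purely to show the shape of the calculation.
sample = [(2010, 1000), (2010, 3000), (2011, 2000), (2011, 6000)]
ratios = route_difficulty(sample)
```

With these invented numbers the 2011 route scores above 1 and the 2010 route below 1, because the 2011 opponents held twice as many points in total.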
If we average the ranking points from 2002 onwards (to get all of the big 3 wins), then Djokovic had the most difficult routes to overcome and Federer had the easiest.
However, there is some recency bias in this method because the ranking points system changed in 2009 introducing more points into the system.
Following the same methodology from 2009 to 2023 yields a similar result but now all players have a difficulty slightly below average. This is because the result is skewed by players like Stan Wawrinka and Juan Martin Del Potro who had to beat at least one of the big 3, who have the most ranking points, to win.
ELO Rating
Jeff Sackmann maintains an ELO rating for all tennis players which everyone should check out. The ELO rating system, best known for its use in chess, is a dynamic rating system that updates a player's score after every match based on their performance relative to the rating of their opponent.
Before a game, an "expected score" for each player is calculated based on the difference between their ELO and their opponent's. For a player A:
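In the standard ELO formulation (assumed here to be the one used), the expected score of player A against player B is:

$$E_A = \frac{1}{1 + 10^{(R_B - R_A)/400}}$$

where $R_A$ and $R_B$ are the two players' ratings before the match. The symbols are my own labels; equal ratings give $E_A = 0.5$, and a 400-point advantage gives roughly $0.91$.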
Their new ELO is then calculated by multiplying the difference between the actual score (1 for win, 0 for loss) and the expected score by a K coefficient and adding the result to the old ELO.
K has an exponential decay, which means it is high when a player has played few games and lower for players with a large bank of games in the database. If a player has only just joined the ATP tour we will be less certain of his ability relative to his peers and so should be willing to update his score drastically after each game. As we see more games we have a more confident reading on a player's ability and can reduce how much we update his score. For example, Djokovic's loss to Alexei Popyrin in the 2024 US Open was a shock, but we shouldn't dock too many ELO points as we do not realistically think he is now only a top 25 player.
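A minimal sketch of that update rule is below. The expected score uses the standard ELO formula; the shape and constants of `k_factor` (`k_max`, `k_min`, `decay`) are hypothetical values chosen only to show an exponentially decaying K, not the ones used in this analysis.

```python
import math


def expected_score(rating_a, rating_b):
    """Standard ELO expected score for player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def k_factor(matches_played, k_max=40.0, k_min=10.0, decay=0.02):
    """Hypothetical K: starts at k_max and decays exponentially towards k_min."""
    return k_min + (k_max - k_min) * math.exp(-decay * matches_played)


def update_rating(rating, opponent_rating, won, matches_played):
    """New ELO = old ELO + K * (actual score - expected score)."""
    actual = 1.0 if won else 0.0
    expected = expected_score(rating, opponent_rating)
    return rating + k_factor(matches_played) * (actual - expected)


# The same upset costs a veteran far fewer points than a newcomer.
veteran_after_loss = update_rating(2200, 1900, won=False, matches_played=1000)
newcomer_after_loss = update_rating(2200, 1900, won=False, matches_played=0)
```

With these assumed constants the veteran's K has decayed to roughly a quarter of the newcomer's, so a single shock defeat barely moves an established rating.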
Algorithm edge cases
There were a few tweaks to make to the model.
- For Grand Slams I multiplied K by 1.1 to reflect the fact they hold greater importance in the tennis world.
- When a player takes a break from the sport due to injury it does not affect their ELO. However, players rarely come back improved from a break and it generally harms their performance in the short run. This particularly impacts highly rated players: their ELO becomes very slow to decay even in the latter years of their careers, when they take many breaks and no longer have the same prowess. Based on Sackmann's analysis I docked 100 points for an 8 week absence and 150 for any absence longer than 5 months.
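Both tweaks can be sketched as small helpers. The multiplier and the two penalties come from the list above; the exact day thresholds (56 days for 8 weeks, 150 days for roughly 5 months) and the treatment of breaks between those two lengths are my assumptions for this example.

```python
from datetime import date

GRAND_SLAM_K_MULTIPLIER = 1.1  # from the text above


def adjusted_k(base_k, is_grand_slam):
    """Scale K up slightly for Grand Slam matches."""
    return base_k * GRAND_SLAM_K_MULTIPLIER if is_grand_slam else base_k


def absence_penalty(last_match, next_match, short_break_days=56, long_break_days=150):
    """ELO points to dock when a player returns from a break.

    Assumed thresholds: 8 weeks = 56 days, 5 months = about 150 days.
    Breaks between the two are assumed to cost the smaller penalty.
    """
    gap_days = (next_match - last_match).days
    if gap_days > long_break_days:
        return 150
    if gap_days >= short_break_days:
        return 100
    return 0


# A normal two-week gap, a two-month layoff and an eight-month layoff.
no_penalty = absence_penalty(date(2020, 1, 1), date(2020, 1, 15))
short_penalty = absence_penalty(date(2020, 1, 1), date(2020, 3, 1))
long_penalty = absence_penalty(date(2020, 1, 1), date(2020, 9, 1))
```

The penalty would be subtracted from the player's rating before the first match back is scored.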
The ELO-based result actually differs from Jeff's 2022 findings, in which Djokovic had the hardest run and Federer the easiest. Admittedly he has spent many years longer than me on this. Off the top of my head I could do a lot more model validation such as:
- Surface specific ELO
- Statistical testing of the model, particularly the optimal K coefficient
- Optimally accounting for breaks in players' careers (how many ELO points they cost)
But my motivation for spending time on this project is dwindling so I think I will move on.
Who had the highest peak?
The highest ELO in my system, 2410, was reached by Djokovic in February 2016 after defeating Malek Jaziri in R16 of the Dubai Tennis Championship.
At the end of the day it's the end of the day
Statistically, based on titles won, the highest ELO measured, and performance across surfaces, Djokovic is the most successful player of all time. This much is indisputable. But being the greatest is a superlative that is judged as much by sentiment as numbers. How many ELO points are nice hair and a graceful backhand worth? Or total dominance on one surface?
I could end by going full Ted Lasso on how sport is not about results but how the journey makes us feel. But I won't.
Djokovic is the greatest of all time.
Sorry Joe.