Ultimate Tennis Statistics aims to become the ultimate tennis statistics destination for die-hard tennis fans. It tries to provide all kinds of tennis statistics for Open Era men's tennis through a simple yet effective web GUI. If you have any suggestions for additional features or tweaks to the current features, please e-mail me at firstname.lastname@example.org or log a GitHub issue.
The statistics are based on data from the open source tennis data repository by Jeff Sackmann, with some corrections and additions where the data is wrong or lacking.
Even with these corrections and additions, there are still small errors and missing data, most notably for many tournaments between 1968 and 1972, as well as for full rankings between 1981 and 1983.
Rankings before the official ATP rankings started in the 1973 season are estimated and are also still incomplete.
Please provide feedback on the data as well, at email@example.com or on GitHub.
A lot of the content on this site is based on the 'GOAT' formula, which quantifies tennis players' achievements throughout their careers and allows players from different eras to be compared. The 'GOAT' formula is based on assigning 'GOAT' points to players for tournament results, ATP and Elo rankings and various important achievements. For a visual description of the 'GOAT' formula please click:
Tennis Crystal Ball and Ultimate Tennis Statistics source code is licensed under the Apache 2.0 License.
'GOAT' Formula, customizations of Elo Ratings for tennis, Match Prediction, Tournament Simulation and other algorithms by Ultimate Tennis Statistics are licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
In short: Attribution is required. Non-commercial use only.
If you like this website and want to support it, please consider a small donation.
All donations will be used only for paying the web hosting bill.
About Tournament Simulation
Tournament Simulation is driven by individual Match Prediction. In each round, the probabilities for each match in the draw are calculated using the Neural Network Match Prediction Algorithm.
Based on these probabilities, the chances of the probable matchups in later tournament rounds are calculated. Finally, the probability of a player winning the title is calculated as the product of the probabilities of winning each of the rounds.
If the round is far ahead, like the semi-final or final, there are many potential opponents, and the probabilities of the player beating each of them are calculated.
For example, the probability of winning the title depends on the probability of the player reaching the final, as well as on the probabilities of all players in the other half of the draw reaching the final, multiplied by the probabilities of the player beating each of them in the final.
Tracking Tournament Progress
As the tournament progresses, the outcomes of some matches become known, so the match probabilities are set to 100% for the winner and 0% for the loser.
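One way to picture this step is a thin wrapper around the prediction function that overrides it for finished matches. This is only a minimal sketch: the names `with_known_results`, `coin_flip` and the `winners` mapping are hypothetical, and the coin-flip model merely stands in for the real Match Prediction.

```python
from typing import Callable, Dict, FrozenSet

MatchModel = Callable[[str, str], float]


def with_known_results(
    p_match: MatchModel, winners: Dict[FrozenSet[str], str]
) -> MatchModel:
    """Wrap a match model so that finished matches return 1.0 or 0.0.

    `winners` maps an unordered pair of players to the name of the winner.
    """

    def pinned(a: str, b: str) -> float:
        winner = winners.get(frozenset((a, b)))
        if winner is None:
            return p_match(a, b)  # match not played yet: use the model
        return 1.0 if winner == a else 0.0

    return pinned


def coin_flip(a: str, b: str) -> float:
    """Placeholder model (an assumption): every match is 50/50."""
    return 0.5


# Hypothetical state: A has already beaten B, other matches are still open.
pinned = with_known_results(coin_flip, {frozenset(("A", "B")): "A"})
```

Here `pinned("A", "B")` is fixed at 1.0 and `pinned("B", "A")` at 0.0, while any pair without a recorded result still falls back to the underlying model.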
Elo Ratings are recalculated after each round, and the Elo rating points earned or lost through wins and losses in all previous rounds (including the current round if the match is finished) are shown in brackets.
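For readers unfamiliar with Elo, the bracketed value can be illustrated with the textbook Elo update. Note that this is not the customized tennis Elo used by this site: the constant `k = 32`, the 400-point scale and the sample ratings are all assumptions made for the sketch.

```python
def expected_score(rating: float, opponent_rating: float) -> float:
    """Textbook Elo expectation of winning against the opponent."""
    return 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / 400.0))


def elo_gain(winner_rating: float, loser_rating: float, k: float = 32.0) -> float:
    """Points the winner earns (and the loser drops) in textbook Elo."""
    return k * (1.0 - expected_score(winner_rating, loser_rating))


# Hypothetical player who has beaten two opponents so far in the tournament.
start = rating = 2000.0
for opponent in (1800.0, 1950.0):
    rating += elo_gain(rating, opponent)

# Rating followed by the points earned so far, shown in brackets.
print(f"{rating:.0f} ({rating - start:+.0f})")
```

Beating a higher-rated opponent earns more points than beating a lower-rated one, which is why the bracketed change can differ a lot between players who have won the same number of matches.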
Sometimes, when initial tournament draws come out, they include unknown qualifiers. The probability of a player beating an unknown qualifier is determined by a variation of the Match Prediction algorithm that uses the average Elo Rating and ATP ranking points of the qualifiers, as well as the player's winning percentage vs qualifiers, overall and by surface, level, etc.
Let's denote the match probability that player A wins over player B as $P_m(A \text{ vs } B)$.
These probabilities determine the probability of each player passing to the second round: $P_r^A(R2) = P_m(A \text{ vs } B)$.
The probability of player A reaching the next round $R+1$ is calculated this way:
$P_r^A(R+1) = P_r^A(R) \cdot \sum_{N=1}^{n} P_r^N(R) \cdot P_m(A \text{ vs } N)$
This means that the probability of player A reaching the next round R+1 is the probability of player A reaching round R, multiplied by the weighted sum of the probabilities of player A beating each of his potential opponents in round R. The weights of the potential opponents are the probabilities of each opponent reaching round R.
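The recursion described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the site's actual implementation: the function `reach_probabilities`, the `STRENGTH` table and the ratio-based `toy_match_probability` are hypothetical stand-ins, and the draw is assumed to be a power of two listed in bracket order.

```python
from typing import Callable, Dict, List


def reach_probabilities(
    draw: List[str], p_match: Callable[[str, str], float]
) -> List[Dict[str, float]]:
    """Return one dict per stage: rounds[k][player] is the probability that
    the player has won k matches (rounds[-1] holds the title probabilities)."""
    n = len(draw)
    assert n >= 2 and n & (n - 1) == 0, "draw size must be a power of two"
    current = {player: 1.0 for player in draw}  # everyone starts in round 1
    rounds = [current]
    block = 1  # size of the draw section a player has already come through
    while block < n:
        nxt = {}
        for i, player in enumerate(draw):
            start = (i // (2 * block)) * 2 * block
            # Potential opponents sit in the other half of the merged section.
            if i < start + block:
                rivals = draw[start + block:start + 2 * block]
            else:
                rivals = draw[start:start + block]
            # Weighted sum: each rival's chance to be there times the
            # chance of beating that rival.
            weighted = sum(current[r] * p_match(player, r) for r in rivals)
            nxt[player] = current[player] * weighted
        rounds.append(nxt)
        current = nxt
        block *= 2
    return rounds


# Hypothetical strengths; the ratio model stands in for Match Prediction.
STRENGTH = {"A": 4.0, "B": 1.0, "C": 3.0, "D": 2.0}


def toy_match_probability(a: str, b: str) -> float:
    return STRENGTH[a] / (STRENGTH[a] + STRENGTH[b])


rounds = reach_probabilities(["A", "B", "C", "D"], toy_match_probability)
title = rounds[-1]
for player, p in sorted(title.items(), key=lambda kv: -kv[1]):
    print(f"{player}: {p:.3f}")
```

As long as the match model satisfies P(A beats B) + P(B beats A) = 1, the title probabilities produced this way sum to 1 across the draw, which is a handy sanity check.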
About Match Prediction Algorithm
Match Prediction is based on players' previous results and track records.
Previous results are analyzed by a Neural Network Algorithm with ~60 neurons for different features of the players, such as Elo Rating, Surface Elo Rating, ATP Points, Recent Form, Head-to-Head ratios and Winning Percentages varied by surface, tournament level, tournament, round, recency, match or set ratios, vs rank, vs hand, vs backhand...
The match win probabilities given by each of the features (neurons) are then combined by the neural network using different weights.
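As a rough illustration of combining per-feature probabilities, the sketch below uses a normalized weighted average. The exact combination rule used by the network is not described here, so the weighted average, the function `combine` and the feature names and numbers are all assumptions.

```python
from typing import Dict


def combine(feature_probs: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of the win probabilities suggested by each feature."""
    total = sum(weights[name] for name in feature_probs)
    if total <= 0:
        raise ValueError("weights of the used features must sum to a positive value")
    return sum(weights[name] * p for name, p in feature_probs.items()) / total


# Hypothetical per-feature win probabilities for player A in one match.
feature_probs = {"elo": 0.70, "surface_elo": 0.65, "recent_form": 0.55, "h2h": 0.40}
# Hypothetical weights, as training might produce them for one surface.
weights = {"elo": 3.0, "surface_elo": 3.0, "recent_form": 1.0, "h2h": 1.0}

print(f"P(A wins) = {combine(feature_probs, weights):.3f}")
```

Because the weights are normalized, the result always stays between the lowest and highest per-feature probability, and features with a larger weight pull the combined value toward their own estimate.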
Training and Tuning
The Neural Network is trained on historical data to achieve the highest prediction rates and to determine the optimal feature weights.
To further increase prediction accuracy, the Neural Network is trained separately for different surfaces, resulting in different feature weights per surface.
During training, some neurons are found to be useless and are removed from the network, so about 40 neurons remain.
Primary and Secondary Probability Contributors
Elo Ratings, overall and by surface, are the primary contributors to the match prediction, followed by recent form, winning percentages and H2H percentages.
Elo Rating neurons individually give high prediction rates, but when they are combined with recent form, H2H and various winning percentages, the prediction accuracy increases even further.
However, the importance of the secondary contributors is very surface dependent. For example, on clay and grass, recent form is largely irrelevant, because the momentum of form is often disrupted by the surface adaptation and because the clay and grass seasons are relatively short.
While winning percentages and H2H are more or less equally important on clay, H2H is mostly irrelevant on grass (H2H patterns that are observable on hard and clay do not stand out on grass).
On hard court, recent form is quite relevant (due to the length of the hard-court season), alongside the winning percentages and H2H.