Formula 1 and Machine Learning
The use, accuracy, and pitfalls of machine learning in Formula 1 racing.
The movie Moneyball is a prime example of data-driven performance optimization in sports. For those who haven’t watched the movie or read the book it is based on, it tells the story of how the Oakland Athletics’ general manager, Billy Beane, used statistical data and analytics to build a competitive team despite the team’s small budget.
Few domains lend themselves to prediction as readily as sport. The world of sports is rich in quantifiable features, making it well suited to artificial intelligence, and AI systems have become commonplace in sports in recent years. Given the positive influence they’ve had as their capabilities have grown, they are likely to keep breaking down barriers in the world of sport.
Machine learning depends on the amount and quality of data. Every F1 car carries an electronic control unit (ECU), essentially a small but very powerful computer that controls, processes, and transmits vast quantities of data from the car to the team.
The ECU controls a variety of systems, including the engine, gearbox, differential, throttle, clutch, energy recovery system (ERS), and drag reduction system (DRS). It is also the primary data logger, feeding live data via telemetry to the teams and race control. This allows teams to monitor the condition and performance of their cars in real time, including engine health, tire degradation, and fuel consumption.
With help from over 300 sensors on each car, the ECU handles over 1,500 input parameters and transmits more than 3 GB of live data back to the team during an average 300 km Grand Prix. During a two-hour race, the ECU will receive and send over 750 million data points.
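To put those totals in perspective, the short calculation below converts them into per-second and per-kilometre rates. It is only a rough sketch: it uses the round figures quoted above and assumes that "3 GB" means three decimal gigabytes and that the data flows evenly over the two hours, which a real race would not guarantee.

```python
# Round figures quoted in the text; the even spread over the race is an assumption.
DATA_POINTS_PER_RACE = 750_000_000   # data points received and sent in a two-hour race
TELEMETRY_BYTES = 3 * 10**9          # "more than 3 GB", taken here as 3 decimal gigabytes
RACE_SECONDS = 2 * 60 * 60           # two hours
RACE_DISTANCE_KM = 300               # average Grand Prix distance


def per_second(total: float, seconds: float = RACE_SECONDS) -> float:
    """Average rate, assuming the total is spread evenly over the race."""
    return total / seconds


points_per_second = per_second(DATA_POINTS_PER_RACE)
bytes_per_second = per_second(TELEMETRY_BYTES)
points_per_km = DATA_POINTS_PER_RACE / RACE_DISTANCE_KM

print(f"{points_per_second:,.0f} data points per second")
print(f"{bytes_per_second / 1e6:.2f} MB of telemetry per second")
print(f"{points_per_km:,.0f} data points per kilometre")
```

On these figures the car handles on the order of a hundred thousand data points every second, which is why the quality of the data pipeline matters as much as the models trained on it.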
Formula 1’s data scientists are using Amazon SageMaker to train deep-learning models on 65 years of historical race data, extracting key race statistics, making forecasts, and giving fans insight into the split-second decisions and strategies used by teams and drivers.
According to Zoe Chilton, head of Technical Partnerships at Aston Martin Red Bull Racing, succeeding in Formula 1 is now all about a cycle of racing, calculating, analysing, designing, and then repeating the process. The team makes about 1,000 new prototypes between races, or 30,000 over the season.
The weather is arguably the most volatile aspect of a race. Even if teams have access to live weather reports, it is difficult to determine precisely what will happen. At the 2020 Hungarian Grand Prix it was expected to rain, but it did not, affecting tire and pit stop plans for each driver.
The reliability of the predictions is often called into question for several reasons: mechanical failures and pit stop timing, changes to the race calendar, and changes to the tracks themselves.
Deep learning can be used to predict when mechanical failures will occur. But how accurate is it? Pit stops often take 20 to 25 seconds, which means that a mistimed or misjudged stop may cost the driver a podium and valuable championship points, so the predictions must be as accurate as possible.
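A back-of-the-envelope sketch shows why the timing matters so much. A stop that costs 20 to 25 seconds only pays off if fresh tires win that time back over the following laps. The function name and the 1.5 seconds-per-lap gain below are hypothetical inputs chosen for illustration, not figures from any team.

```python
import math


def laps_to_recover(pit_loss_s: float, gain_per_lap_s: float) -> int:
    """Whole laps needed for fresher tires to repay the time lost in the pits.

    gain_per_lap_s is a hypothetical, constant per-lap advantage; in reality
    the advantage shrinks as the new tires wear.
    """
    if gain_per_lap_s <= 0:
        raise ValueError("fresh tires must be faster for the stop to pay off")
    return math.ceil(pit_loss_s / gain_per_lap_s)


# The 20 and 25 second pit losses come from the text; 1.5 s/lap is assumed.
for pit_loss in (20.0, 25.0):
    laps = laps_to_recover(pit_loss, gain_per_lap_s=1.5)
    print(f"A {pit_loss:.0f} s stop needs {laps} laps at 1.5 s/lap to break even")
```

Even in this simplified picture, a stop takes a large share of a race to repay, so a prediction that is off by a few laps can undo the whole strategy.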
Since races are added to and removed from the schedule each year, historical data for a given Grand Prix will not always be applicable. The Dutch Grand Prix at Zandvoort is returning after 35 years, so its data would be significantly out of date, particularly because the track is being reconstructed and the cars have changed drastically since.
Changes to tracks between seasons don’t help either: even the slightest change makes the track effectively “different”, particularly if the lap distance is affected. Teams need to build a model of each circuit that accounts for the addition or removal of different elements, as well as an algorithm that estimates an average lap time.
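One simple way to picture such a circuit model is as a list of segments, each with a length and an average speed, where the lap time is the sum of the time spent in each segment. The sketch below follows that idea. The Segment class, the segment names, and every number in it are invented for illustration, and a real team model would be far more detailed.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Segment:
    """One piece of a circuit (hypothetical representation)."""
    name: str
    length_m: float
    avg_speed_kmh: float


def lap_distance_m(circuit: List[Segment]) -> float:
    return sum(seg.length_m for seg in circuit)


def lap_time_s(circuit: List[Segment]) -> float:
    """Estimated lap time: time = distance / speed, summed over segments."""
    return sum(seg.length_m / (seg.avg_speed_kmh / 3.6) for seg in circuit)


# Invented layout; none of these values describe a real track.
baseline = [
    Segment("main straight", 1000.0, 300.0),
    Segment("chicane", 200.0, 90.0),
    Segment("sweepers", 1500.0, 180.0),
    Segment("hairpin", 300.0, 72.0),
]

# A between-season change: the chicane is replaced by a shorter, faster kink.
revised = [seg for seg in baseline if seg.name != "chicane"]
revised.insert(1, Segment("fast kink", 150.0, 216.0))

for label, circuit in (("baseline", baseline), ("revised", revised)):
    print(f"{label}: {lap_distance_m(circuit):.0f} m, {lap_time_s(circuit):.1f} s")
```

Swapping a single segment changes both the lap distance and the estimated lap time, which is why lap times recorded on the old layout cannot simply be reused for the new one.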
Some fans are using machine learning to make their own predictions, and others are building visual dashboards to see which factors are most likely to influence the results. Qualifying position is one such factor, so the likelihood of winning from each starting place must be examined, all other factors being equal (for example, assuming the driver who qualifies first incurs no grid penalties).
The Baku circuit in Azerbaijan is the least predictable, which is not surprising given the race’s short history and its outcomes. The driver on pole position has won only once. In 2017, the winner started 10th on the grid, demonstrating how chaotic F1 races can be and that winning from outside the front row is possible. This shows how much the circuit matters when predicting results: not all circuits are equally straightforward to forecast.
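The kind of analysis described here can be sketched in a few lines: for each circuit, count how often the race was won from pole position. The records below are made up purely to show the calculation and are not real race results; the pole_win_rate helper is likewise a hypothetical name.

```python
from typing import List, Optional, Tuple

Record = Tuple[str, int]  # (circuit name, grid position of the race winner)

# Made-up records for illustration only; these are NOT real race results.
results: List[Record] = [
    ("Circuit A", 1), ("Circuit A", 1), ("Circuit A", 1), ("Circuit A", 1),
    ("Circuit A", 2),
    ("Circuit B", 1), ("Circuit B", 2), ("Circuit B", 3), ("Circuit B", 10),
]


def pole_win_rate(records: List[Record], circuit: Optional[str] = None) -> float:
    """Share of races won from pole, optionally restricted to one circuit."""
    grids = [grid for name, grid in records if circuit is None or name == circuit]
    if not grids:
        raise ValueError(f"no races recorded for {circuit!r}")
    return sum(1 for grid in grids if grid == 1) / len(grids)


for circuit in sorted({name for name, _ in results}):
    rate = pole_win_rate(results, circuit)
    print(f"{circuit}: won from pole in {rate:.0%} of races")
print(f"All circuits: {pole_win_rate(results):.0%}")
```

A circuit with a short history, like the hypothetical Circuit B here, gives a rate based on only a handful of races, so the estimate itself is unreliable. That is the same problem the article describes for Baku.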
Overall, the large quantities of data available to teams enable them to study different facets of a race separately, but the complexity of the variables that make up a race means that, for the time being, machine learning predictions of race strategy remain inaccurate. Despite these issues, there is little doubt that ML and AI will soon take over the sport; the only questions are how long it will take and how accurate it will be.