How A YouTuber Spent Years Teaching AI How To Beat Him At Virtual Racing


Trackmania: Nations Forever is a weird kind of racing game. It’s a time trial, arcade-style game where players compete solely against the clock in a stadium setting. A YouTuber by the name of Yosh is a longtime fan of the game, honing his skills for years in pursuit of ever faster times. He then set out on a quest to see if he could train an AI capable of beating him at his own favorite game. It wasn’t an easy road, but in the battle between human and machine, Yosh would eventually come off the lesser.

Yosh set out to train an AI to play the game using a neural network. As he explains in his YouTube video, a neural network takes in numerical input from the game in the form of the car’s speed, acceleration, distance to the track walls, and so on. It then passes these numbers through a network of artificial “neurons” which effectively perform a series of calculations to generate acceleration and steering outputs for the AI’s vehicle.
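To make that idea concrete, here is a minimal sketch of numbers going in and steering and throttle coming out. It is not Yosh’s actual network: the layer sizes, the tanh activation, the random weights, and the five example inputs are all assumptions chosen for illustration.

```python
import math
import random

def make_layer(n_in, n_out, rng):
    """Random weights and zero biases for one fully connected layer."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def forward(layer, inputs):
    """Each neuron takes a weighted sum of its inputs plus a bias, squashed by tanh."""
    weights, biases = layer
    return [math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def drive(network, observation):
    """Pass the observation through every layer; the final layer has two outputs."""
    values = observation
    for layer in network:
        values = forward(layer, values)
    steering, throttle = values
    return steering, throttle

rng = random.Random(0)
# Hypothetical inputs: speed, acceleration, and three wall-distance readings,
# each assumed to be scaled to roughly the 0..1 range.
observation = [0.8, 0.1, 0.5, 0.9, 0.4]
network = [make_layer(5, 8, rng), make_layer(8, 2, rng)]
steering, throttle = drive(network, observation)
print(f"steering={steering:+.3f} throttle={throttle:+.3f}")
```

With random weights the outputs are meaningless, which is roughly where an untrained AI starts. Training is the process of nudging those weights until the outputs become useful.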


It might sound difficult to combine all these inputs from the car and mathematically turn them into steering and acceleration outputs. You’d be correct in that assessment. Working out the maths for all those neurons by hand is too hard. Here’s the trick – Yosh didn’t have to program the neural network to do the maths directly. Instead, using a technique called reinforcement learning, he was able to train the neural network to figure out the maths for itself. With enough training, it would figure out how to make the right decisions to drive the car well.

 


Reinforcement learning is a straightforward concept. It’s similar to how you might train a pet to stop peeing in the house, by offering a reward for good behavior. The AI is instructed to maximize “reward,” and is sent out to drive its car on the track with no prior knowledge. When it does good things, like keeping its speed up or completing more of the course, it gets reward points. The mathematical connections between neurons that generated this behavior are then strengthened to encourage it in future. For example, if the AI finds out it gets rewarded for accelerating flat out on straights, it will modify its neural network to reinforce that behavior. Thus, when it senses a straight, it will generate the relevant accelerator output.
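The accelerate-on-straights example can be sketched in a few lines. This is a toy, not the algorithm Yosh used: it swaps in a simple gradient-bandit style update with only two actions, and the action names, reward values, and learning rate are all invented for illustration.

```python
import math
import random

ACTIONS = ["full_throttle", "lift"]
# Hypothetical rewards: on a straight, flat out covers more track, so it scores higher.
REWARD = {"full_throttle": 1.0, "lift": 0.2}

def softmax(prefs):
    """Turn raw preferences into action probabilities."""
    exps = [math.exp(p) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

def train(episodes=500, learning_rate=0.1, seed=0):
    rng = random.Random(seed)
    prefs = [0.0, 0.0]   # no prior knowledge: both actions start equally likely
    baseline = 0.0       # running average of the reward seen so far
    for step in range(1, episodes + 1):
        probs = softmax(prefs)
        choice = rng.choices(range(len(ACTIONS)), weights=probs)[0]
        reward = REWARD[ACTIONS[choice]]
        baseline += (reward - baseline) / step
        # Strengthen the chosen action if it beat the average, weaken it otherwise.
        for i in range(len(prefs)):
            indicator = 1.0 if i == choice else 0.0
            prefs[i] += learning_rate * (reward - baseline) * (indicator - probs[i])
    return softmax(prefs)

probs = train()
print(dict(zip(ACTIONS, (round(p, 3) for p in probs))))
```

The agent begins with a coin flip between the two actions and ends up preferring full throttle, purely because that choice kept earning more reward than average. A real driving agent does the same thing across thousands of network weights rather than two preferences.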

Yosh was able to get the AI driving the track with this methodology, but eventually hit roadblocks. His AI drivers kept hitting the walls, costing them time, and more training wasn’t weeding out the problem. After a great deal of tinkering with learning algorithms, reward weights, and the like, he was eventually able to help his AI move past this roadblock, and it began getting faster and faster. On the simple, twisting track he’d selected, the AI eventually beat his time. Getting to this point took him three years.

An illustration of the neural network that drives the car.

He quickly realized that he would never beat the AI on this track, by virtue of the AI’s uncanny consistency. Human players make mistakes, whereas the AI tends to operate consistently according to the instructions of its neural network. The AI was able to deftly carve the corners with the narrowest margin, something a human player would struggle to accomplish over a whole run, let alone repeat on demand.

However, Yosh hasn’t built an all-dominating AI that can beat every human at Trackmania ever. That’s because of a concept known as generalization. It’s relevant to everything from large language models like ChatGPT, to self-driving cars in development by major automakers. It’s one thing to train an AI to drive on a winding, curvy course. Put that same AI in a completely different environment, though, without any further training, and it may not be able to handle the difference very well at all. This is because, using this project as an example, the lessons learned on one track don’t necessarily transfer to others.

The AI’s early runs are shown in red, when it has no idea how to drive the track. Its actions are near-random. As it learns how to interpret signals from the environment and respond appropriately, it drives better, as seen in the later yellow and green runs.

Yosh demonstrates this lesson by explaining his efforts to teach the AI to handle a more challenging course with harsh right-angle turns and drop-offs on the side of the track. He had to give the AI more sensory help so that it had a good idea of the track ahead. Without appropriate input from the environment, it’s impossible for the neural network to make good decisions, after all. He also had to give the AI information about the car’s pitch, roll, and wheel contact. This allowed the AI to understand when the car was tipping over the edge of the track, for example, and work to avoid driving off the edges.
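As a rough picture of what “more sensory help” means in practice, the sketch below bundles the readings mentioned above into a single list of numbers for the network. Every field name, unit, and value here is hypothetical. The video does not spell out the exact format, so treat this as one plausible layout rather than Yosh’s.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Observation:
    """One frame of what the network sees. All field names are hypothetical."""
    speed: float
    wall_distances: List[float]   # distance readings to the track walls
    track_ahead: List[float]      # assumed look-ahead samples of the upcoming track
    pitch: float                  # nose up/down angle, assumed radians
    roll: float                   # side-to-side tilt, assumed radians
    wheel_contact: List[bool]     # True for each wheel touching the ground

    def to_vector(self) -> List[float]:
        """Flatten everything into the plain list of numbers a network expects."""
        contact = [1.0 if touching else 0.0 for touching in self.wheel_contact]
        return [self.speed, *self.wall_distances, *self.track_ahead,
                self.pitch, self.roll, *contact]

# A car leaning over the edge: rolled to one side with two wheels in the air.
obs = Observation(speed=0.7, wall_distances=[0.4, 0.9, 0.5], track_ahead=[0.0, 0.3, 1.0],
                  pitch=-0.05, roll=0.2, wheel_contact=[True, True, False, False])
vector = obs.to_vector()
print(len(vector), vector)
```

Nothing in this code tells the AI that two airborne wheels is bad news. It only makes the information available, and the network has to learn through reward what those numbers mean.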


After 35 hours of training, the AI was able to beat its human creator in the more complex realm, as well. Yosh had left out brake use from the AI’s abilities during training, and so he was eventually able to set a faster time himself by using the brakes. However, the AI would overcome this after around 100 hours of training, handicap notwithstanding. Eventually, he went on to teach the AI to use the brake, and even drift using a special in-game trick and some heavy reinforcement. It’s hilarious to watch the AI snaking back and forth down the straights as it eagerly aims to maximize reward.
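The snaking makes sense once you look at it from the AI’s side of the scoreboard. Here is a small sketch with invented numbers, assuming the reward is simply progress along the track plus a bonus for every step spent drifting. The real reward function is more involved, so this only illustrates the incentive.

```python
def total_reward(steps, drift_bonus):
    """Sum per-step reward: progress along the track plus a bonus while drifting."""
    return sum(progress + (drift_bonus if drifting else 0.0)
               for progress, drifting in steps)

# Each step is (progress made, whether the car is drifting). Values are invented.
straight_line = [(1.0, False)] * 10   # full progress, never drifting
snaking = [(0.8, True)] * 10          # slower progress, always drifting

for drift_bonus in (0.0, 1.0):
    straight = total_reward(straight_line, drift_bonus)
    snake = total_reward(snaking, drift_bonus)
    winner = "snaking" if snake > straight else "straight line"
    print(f"drift_bonus={drift_bonus}: straight={straight:.1f} snaking={snake:.1f} -> {winner}")
```

Without a drift bonus, driving straight scores higher. Add a generous bonus and weaving down the straight becomes the better-paying plan, even though it is slower. The AI isn’t showing off. It is doing exactly what it was paid to do.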

You can’t beat the joy of teaching your robot child to drift for the first time.

It’s amazing to see the AI racer develop over time. It starts out as a bumbling fool of a thing that can’t get round the first bend. By the end of the training, it’s carving perfect drifts on a tight course at speeds the average human player could never hope to match. Yosh has done a great job, not just in training his AI better, but in explaining these complex machine learning topics to a broader population. That, my friends, is worthy of applause.

18 Comments
Harvey Park Bench
1 year ago

That’s a very good and approachable explanation of how this kind of ML works!

Jack Langelaan
1 year ago

It is really hard to come up with a good reward function and to define good constraints. AI and machine learning researchers have been discovering some of the things that optimization researchers have known since the 1980s… your learning (or optimization) algorithm will: (1) exploit quirks and not-well-modelled parts of your mathematical models; (2) exploit your reward function; (3) exploit your constraints; (4) find a local optimum but not a global optimum.

You’ll get solutions that are optimal in the context of your dynamic model but in real life have really strange behaviors (even though they are physically possible). Or solutions that are physically impossible because of unmodelled dynamics. Or solutions that exploit a local behavior to crank up the reward.

A really good description of this is here: https://www.alexirpan.com/2018/02/14/rl-hard.html

My current ML/RL “favorite” is researchers using machine learning to develop control systems for things that are really well understood using physics based methods (meaning, the machine learning is being used to learn Newton’s Law). In most cases the “right” answer will likely be to use physics based models together with a machine learning model… the physics based model captures all the stuff that you can describe using differential equations and the ML part models the things that are really hard to describe that way.

Mantis Toboggan, MD
1 year ago

There’s got to be a way to run a program like this in a real vehicle. Something with full motion sensors and fast response electronic or hydraulic controls. Train it virtually first so it doesn’t smash up the doubtless expensive RC vehicle and then see what it can do in the real world. On a well maintained track it might not have issues but potholes, dips, humps, cracks and loose debris could give it a hard time. Plus rain or changes in light if it uses a camera based system.

What I’d really be interested to see is how they would learn to behave racing other cars. Could it be taught to bluff and intimidate others, to block cars behind it while threading its way through the ones ahead? It would be game theory, a constant evaluation of risk versus reward with the consequences ranging from first place to DNF. I know for a lot of people the draw of racing is apparently the driver. I don’t see how you get more personality from a human wrapped in logos speaking pure Corporatese than a computer but this would not be for them. More of a BattleBots, machine to machine competition. I’d like it and I think there are probably more people who would enjoy AI racing.

Max Headbolts
1 year ago

This exists!
https://www.donkeycar.com/

I built one, but kept breaking cheap R/C parts and then the pandemic supply chain priced me out of ever actually getting to compete.

Harvey Park Bench
1 year ago
Reply to Max Headbolts

It is also known as Tesla Full Self-Driving

Ben
1 year ago

The question I have with any of these machine learning systems is whether they’re actually better than a purpose-built simulation system. I don’t want to see this AI compete against humans, I want to see it compete against the normal AI in a racing game that was written with domain-specific logic. Is this faster or slower? Was it more or less time-consuming to write? These are the questions we should be asking around machine learning.

My guess is that the answer in some cases will be that it’s better and some cases it’s worse. I wouldn’t be shocked if it’s actually worse in a closed, relatively simple system like a racing game, but in a massively complex system with more inputs than you could possibly code for manually it might be better, or indeed the only way to do it.

Stef Schrader
1 year ago

Heh, those early-in-training cars remind me of when the Lemons iRacing series slaps my name on the bot-cars. There isn’t a skill setting low enough to replicate my performance in a video game.

Mark Tucker
1 year ago

He should try to get a sponsorship from Falken.

10001010
1 year ago
Reply to Mark Tucker

Professor Falken mostly focuses on AI that plays tic-tac-toe, chess, and Global Thermonuclear War.

TOSSABL
1 year ago

That’s really cool. Gotta love that the ai started drifting back & forth on the straights when Yosh set a reward of 10 attached to drifting. Way too easy to anthropomorphize it : ‘Yeah! Ai likes to hoon, too!’

I just wonder how setting up demerits or ‘punishments’ for unwanted behavior would affect it. If it was punished for driving into/off the wall, would it learn the better route more quickly?

10001010
1 year ago

Now if only someone could train AI to teach my dryer how to fold the laundry.

MATTinMKE
1 year ago
Reply to 10001010

And put it away.

Jack Langelaan
1 year ago
Reply to 10001010

Turns out robotics researchers are working on this… it’s amazingly hard to fold laundry (really complicated dynamics). https://www.npr.org/2022/10/22/1130552239/robot-folding-laundry

10001010
1 year ago
Reply to Jack Langelaan

Yeah I’ve seen those reports but I think they’re just not trying hard enough. If half the effort being put into self driving cars by several companies could be applied to laundry I think we’d have it licked by now.

I know that I personally would MUCH rather have a laundry robot than a semi-self-driving car that can’t tell the difference between a stop sign and a full moon.

Jack Langelaan
1 year ago
Reply to 10001010

And if we spent half the money spent on self-driving cars on high speed rail we’d have…

half a mile of track somewhere halfway between LA and San Francisco.

unfortunately. It’d be great to have a high speed rail link (or any kind of reliable rail link) between Pittsburgh and Manhattan.

10001010
1 year ago
Reply to Jack Langelaan

I might have agreed with you a few years ago but Sammy’s Famous Corned Beef closed down and there’s simply no reason for anyone in Manhattan to visit Pittsburgh anymore.
