Reinforcement learning - JUNE 29TH 2024
One of the most important parts of reinforcement learning is the reward function, which is pretty obvious: how else would we feed the lil robot treats for winning?
Anyway, in working with a CS:S AI I realized 2 things:
1) I'm an idiot
2) the gymnasium library really makes RL look easy!!
I mean, if the game already had a wrapper for it, I'd be living wonderfully. Unfortunately, it doesn't!
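For context, "a wrapper" here means writing a custom gymnasium Env for the game yourself. A bare-bones sketch of what that looks like; everything game-facing (screenshots, input injection, reward, death detection) is stubbed out below because that's exactly the part that doesn't exist:

```python
# Rough skeleton of a custom gymnasium environment for CS:S surf.
# All the _underscore helpers are placeholders for the game-specific plumbing.
import gymnasium as gym
import numpy as np


class SurfEnv(gym.Env):
    def __init__(self):
        super().__init__()
        # Observation: a 512x512 greyscale frame plus 6 transform values (xyz pos/angle).
        self.observation_space = gym.spaces.Dict({
            "frame": gym.spaces.Box(0, 255, shape=(1, 512, 512), dtype=np.uint8),
            "transform": gym.spaces.Box(-np.inf, np.inf, shape=(6,), dtype=np.float32),
        })
        # Action: strafe left / right / do nothing. The real action set would be richer.
        self.action_space = gym.spaces.Discrete(3)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        # In the real thing: teleport the player back to the start of the map.
        return self._get_observation(), {}

    def step(self, action):
        # In the real thing: send the input to the game, then read the new state.
        self._send_input(action)
        obs = self._get_observation()
        reward = self._compute_reward(obs)
        terminated = self._player_died(obs)
        return obs, reward, terminated, False, {}

    # --- Stubs standing in for screenshots, rcon reads and input injection ---
    def _get_observation(self):
        return {"frame": np.zeros((1, 512, 512), dtype=np.uint8),
                "transform": np.zeros(6, dtype=np.float32)}

    def _send_input(self, action):
        pass

    def _compute_reward(self, obs):
        return 0.0

    def _player_died(self, obs):
        return False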
How do you even define a reward for CS:S surf?? Obviously we want to go fast, and go fast in the forward direction, but what about when we turn to go sideways? And after all of that, how do we let the agent know which paths ISN'T a drop to its death without just handing it the path to follow?
While an experienced engineer would probably recognize all of these issues quickly and work to fix them, I'm an idiot!
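For what it's worth, the "go fast in the forward direction" part is the easy bit: reward the component of the player's velocity along some forward direction, and penalise dying. The hard part is deciding what "forward" even means on a ramp. Everything in this sketch (the velocity, the forward direction, the death flag) is hand-waved, not my actual reward function:

```python
import numpy as np


def surf_reward(velocity, forward_dir, died):
    """Toy reward: speed along a 'forward' direction, big penalty for falling to your death.

    velocity    - player velocity vector (vx, vy, vz)
    forward_dir - unit vector pointing "along the ramp"; defining this per-ramp is the
                  hard part, and hard-coding it is dangerously close to just giving
                  the agent the path
    died        - whether the player fell off / hit a kill trigger
    """
    if died:
        return -100.0
    forward_speed = float(np.dot(velocity, forward_dir))
    return 0.01 * forward_speed  # scale raw units/sec down to something sane
```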
I began by simply feeding the position and angle of the player into the model. While this kind of worked when my reward function only wanted it to move fast, I realized as I moved on that I should probably
1) read info directly from memory, instead of from a console log
2) Give it eyes!
I mean what kind of ‘observation’ is it if the poor guy can’t even see??
So I need to pack both the transform data and a screenshot into one observation at each step, then act on that. Probably easy to do, right?
It was pretty easy to add this, I just needed to move some stuff around. The model now consists of conv2d layers for processing the image, which is just a 512x512 greyscale screenshot, and some linear layers for the positional data, which is just 6 values (xyz pos/angle).
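Roughly what that looks like in PyTorch. The layer sizes here are placeholders, not the exact network I trained:

```python
import torch
import torch.nn as nn


class SurfNet(nn.Module):
    """Conv stack for the 512x512 greyscale frame + small MLP for the 6 transform values."""

    def __init__(self, num_actions: int = 3):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=8, stride=4), nn.ReLU(),   # 512 -> 127
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),  # 127 -> 62
            nn.Conv2d(32, 32, kernel_size=3, stride=2), nn.ReLU(),  # 62 -> 30
            nn.Flatten(),
        )
        self.pos = nn.Sequential(nn.Linear(6, 32), nn.ReLU())
        conv_out = 32 * 30 * 30
        self.head = nn.Sequential(
            nn.Linear(conv_out + 32, 256), nn.ReLU(),
            nn.Linear(256, num_actions),  # one value/logit per action
        )

    def forward(self, frame: torch.Tensor, transform: torch.Tensor) -> torch.Tensor:
        # frame: (B, 1, 512, 512) greyscale screenshot, transform: (B, 6) xyz pos/angle
        return self.head(torch.cat([self.conv(frame), self.pos(transform)], dim=1))
```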
This, along with a revamped reward function, seemed to improve the model, and the time-to-first-surf dropped. I also reworked how I grabbed the positional data and made a SourceMod plugin that lets me grab all the info at any moment over rcon, which proved to be very reliable.
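On the Python side the rcon part is pretty small. This sketch assumes the third-party `rcon` package, and `surf_dump` is a made-up command name standing in for whatever the plugin actually exposes:

```python
from rcon.source import Client  # third-party "rcon" package


def get_player_state(host: str = "127.0.0.1", port: int = 27015, password: str = "hunter2"):
    """Ask the (hypothetical) SourceMod plugin for the player's transform over rcon."""
    with Client(host, port, passwd=password) as client:
        # The plugin would reply with something like "x y z pitch yaw roll".
        raw = client.run("surf_dump")
    return [float(v) for v in raw.split()]
```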
Another thing: I'm using an epsilon-greedy agent to train this model, but perhaps some other agent architecture would be more suitable. I'll look into it if I return to this project.
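Epsilon-greedy just means "mostly take the model's best action, sometimes take a random one". Reusing the hypothetical SurfNet shapes from above:

```python
import random
import torch


def select_action(model, frame, transform, epsilon: float, num_actions: int) -> int:
    """Epsilon-greedy: random action with probability epsilon, else the model's argmax."""
    if random.random() < epsilon:
        return random.randrange(num_actions)
    with torch.no_grad():
        values = model(frame.unsqueeze(0), transform.unsqueeze(0))  # add batch dim
    return int(values.argmax(dim=1).item())
```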
Things I learnt from this:
RL is more about designing a good reward function than anything else
It can be slow and painful
Starting the model with some supervised learning is FINE, especially when you need to teach it some crucial mechanics (rough sketch after this list). Obviously, for a situation like this it's not as critical, but if we look at something like a self-driving car, you wouldn't want to risk this kind of stuff.
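By "some supervised learning" I mean behaviour cloning: record a few of your own runs as (observation, action) pairs and fit the network to them before any RL. A rough sketch, again reusing the hypothetical SurfNet and assuming the demos are already tensors:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def pretrain_on_demos(model, frames, transforms, actions, epochs: int = 5):
    """Behaviour cloning: fit the network to recorded human (observation, action) pairs."""
    loader = DataLoader(TensorDataset(frames, transforms, actions), batch_size=32, shuffle=True)
    optimiser = torch.optim.Adam(model.parameters(), lr=1e-4)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for frame, transform, action in loader:
            loss = loss_fn(model(frame, transform), action)
            optimiser.zero_grad()
            loss.backward()
            optimiser.step()
```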
This was a fun experiment, and I will probably return to it one day. However, I think my current approach isn't the right one for this problem, which is fine.