neat-python
Unable to train a car to traverse a track
Car specs - I've modeled the car in Unity and interfaced it with Python using Unity ML Agents (https://github.com/Unity-Technologies/ml-agents). The car has 5 radar sensors mounted on the front bumper, each of which measures the distance to the nearest obstacle. The car has 2 controls: accelerate (range [-1, 1]) and steer (range [-1, 1]). I've set the configuration file to 5 inputs and 2 [EDIT] outputs, and set the activation to tanh.
Fitness function - The fitness for a genome is calculated by running the car in the track and recording a distance metric based on acceleration values. (More detail about how fitness is calculated at the end)
Problem - During training, the car learns not to go backward, but it does not learn to avoid obstacles or traverse the path. The fitness values of all genomes are always drawn from a fixed set of float values. After enough iterations, all the genomes tend to have the same fitness value and training goes on without stopping.
I can't tell what the problem is: is it the modelling of the car, or are the configuration parameters substandard?
How fitness is calculated, in detail - Consider 5 time steps of the environment with acceleration values [a1, a2, a3, a4, a5], where -1 <= ai <= 1 for all i, and initial velocity v0 = 0. Each time step has t = 1.

First time step:
Distance covered: s1 = v0 * t + (1/2) * a1 * t^2 = a1 / 2
Velocity at the end: v1 = v0 + a1 * t = a1

Second time step:
Distance covered: s2 = v1 * t + (1/2) * a2 * t^2 = v1 + a2 / 2 = a1 + a2 / 2
Velocity at the end: v2 = v1 + a2 * t = a1 + a2

Third time step:
Distance covered: s3 = v2 * t + (1/2) * a3 * t^2 = (a1 + a2) + a3 / 2 = a1 + a2 + a3 / 2
Velocity at the end: v3 = v2 + a3 * t = a1 + a2 + a3

... and so on for all time steps.

This uses the following equations of motion:
s = ut + (1/2)at^2
v = u + at
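
For reference, the calculation above can be sketched in a few lines of Python. This is only a minimal sketch of the formulas as written, not the actual training code; the function name `distance_fitness` and its parameters are made up for illustration.

```python
def distance_fitness(accelerations, dt=1.0, v0=0.0):
    """Accumulate distance with s = u*t + (1/2)*a*t^2 and v = u + a*t.

    `accelerations` is the list [a1, a2, ...] recorded until the car crashes.
    The name and signature are hypothetical, chosen for this illustration.
    """
    velocity = v0
    distance = 0.0
    for a in accelerations:
        # Distance covered in this step, using the velocity at its start.
        distance += velocity * dt + 0.5 * a * dt ** 2
        # Velocity carried over into the next step.
        velocity += a * dt
    return distance


# Three steps at full acceleration: 0.5 + 1.5 + 2.5
print(distance_fitness([1.0, 1.0, 1.0]))  # 4.5
```

Note that with t = 1, any run in which every ai sits at exactly +1 or -1 gives a total that is a multiple of 0.5, so it may be worth checking whether saturated outputs are behind the fixed set of fitness values.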
It looks like your issue is with the fitness function. I would try something simpler, like the max distance traveled in each generation.
@puneets2811, I am also interested to hear whether the change in fitness function affects how well your cars learn to solve the track over the long term. Keep us in the loop!
@abrahamrhoffman I'd like to know what kind of change in the fitness function you are referring to.
The only fitness function that I've used calculates a metric directly proportional to actual distance traveled by the car before it crashes. I'd like to mention that the car only learns that it doesn't have to go backward but it does not learn to steer.
I've tried the tanh and clamped (between -1 and 1) activation functions here. The clamped one performs totally random actions.
@evolvingfridge I don't fully understand how max distance traveled per generation can be modeled. The better genomes already propagate to the next generations automatically.
@puneets2811 Each genome (car) will travel N meters forward (+) or backward (-) before it crashes; the genomes with the highest positive distance traveled are your best genomes, and they are the ones selected for reproduction from the current population. Also double-check your sensor and steering code for bugs (this has happened to me more than once) :smile:
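
A rough sketch of that idea, assuming the simulator can report a signed displacement for each step (forward positive, backward negative). `run_episode`, `signed_distance` and the fake data are hypothetical stand-ins, not part of neat-python or ML-Agents; the only neat-python convention relied on is that the fitness function receives `(genomes, config)` and sets `genome.fitness`.

```python
from types import SimpleNamespace


def signed_distance(step_displacements):
    """Net distance before the crash: forward steps > 0, backward steps < 0."""
    return sum(step_displacements)


def eval_genomes(genomes, config, run_episode):
    """Assign each genome's fitness from a single episode.

    `run_episode` is a hypothetical callable that drives the car with the
    given genome and returns the per-step signed displacements until it crashes.
    """
    for genome_id, genome in genomes:
        genome.fitness = signed_distance(run_episode(genome, config))


# Stand-in genomes and a fake episode, only to show the data flow.
fake_runs = {1: [0.5, 1.0, 1.5], 2: [-0.5, -1.0], 3: [0.5, 0.5]}
genomes = [(gid, SimpleNamespace(key=gid, fitness=None)) for gid in fake_runs]
eval_genomes(genomes, config=None, run_episode=lambda g, c: fake_runs[g.key])

# The genome with the largest positive distance is the best of the generation.
best_id = max(genomes, key=lambda pair: pair[1].fitness)[0]
```

Since neat-python calls the fitness function with two arguments, the extra `run_episode` argument would have to be bound first (for example with `functools.partial`) before passing it to `Population.run`. Selection itself is still handled by the library based on `genome.fitness`.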