baselines

Driving around objects with PPO

Open DavidS32 opened this issue 5 years ago • 0 comments

Hi there,

I am working on driving industrial robots with neural networks, and so far it's working well. I use the PPO algorithm from OpenAI Baselines, and so far I can easily drive from point to point using the following reward strategy:

I calculate the normalized distance between the target and the current position, then compute the distance reward as:

rd = 1-(d/dmax)^a

At each time step, I give the agent a time penalty calculated as:

yt = 1-(t/tmax)*b

a and b are hyperparameters to tune. As I said, this works really well when driving from point to point. But what if I want to drive around something? For my work I need to avoid collisions, so the agent needs to drive around objects. If the object is not directly in the way of the shortest path, it works OK: the robot adapts and drives around it. But it becomes increasingly difficult, up to impossible, to drive around objects that lie directly in the path. At the moment, the agent gets a slight penalty for hitting something and the episode is terminated.
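To make the setup concrete, here is a minimal sketch of the reward terms described above. The function names, the default values, and the collision penalty of 0.1 are placeholders for illustration, not the values from my actual code. The sketch deliberately returns the terms separately and does not show how they are combined into a single reward.

```python
def distance_reward(d, d_max, a):
    """Distance reward rd = 1 - (d / d_max) ** a, for 0 <= d <= d_max."""
    return 1.0 - (d / d_max) ** a


def time_term(t, t_max, b):
    """Time term yt = 1 - (t / t_max) * b, exactly as written above."""
    return 1.0 - (t / t_max) * b


def step_terms(d, t, collided, d_max=1.0, t_max=200, a=0.5, b=0.1,
               collision_penalty=0.1):
    """Return the individual reward terms and the termination flag.

    All default values are hypothetical placeholders.
    """
    return {
        "distance_reward": distance_reward(d, d_max, a),
        "time_term": time_term(t, t_max, b),
        # A slight penalty on collision; the episode also ends.
        "collision_penalty": -collision_penalty if collided else 0.0,
        "done": collided,
    }
```

With this shape, an obstacle directly on the straight line to the target means every detour first lowers the distance reward, which may be part of why the agent struggles there.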

I have already read a paper that combines PPO with NES to add Gaussian noise to the parameters of the neural network, but I can't implement it myself. Does anyone have experience with adding more exploration to the PPO algorithm? Or does anyone have general ideas on how I can improve my reward strategy to get the job done?
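For reference, the basic idea of Gaussian parameter noise, as far as I understand it, can be sketched in plain NumPy as below. This is not the method from the paper and not part of Baselines; `perturb_parameters`, the parameter dictionary layout, and `sigma=0.05` are all hypothetical and only show where the noise would enter.

```python
import numpy as np


def perturb_parameters(params, sigma, rng):
    """Return a copy of `params` with i.i.d. N(0, sigma^2) noise added.

    `params` maps a layer name to a NumPy weight array; the originals
    are left unchanged.
    """
    return {
        name: w + sigma * rng.standard_normal(w.shape)
        for name, w in params.items()
    }


# Toy stand-in for the policy network weights (hypothetical shapes).
rng = np.random.default_rng(0)
params = {"w": np.zeros((4, 3)), "b": np.zeros(3)}
noisy = perturb_parameters(params, sigma=0.05, rng=rng)
```

What I am missing is how to connect something like this to the PPO update so that the perturbed policy is used for collecting rollouts.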

Greetings, David

DavidS32, Jul 23 '20 06:07