PPO-algo-with-custom-Unity-environment

Agent not training well

Open · surajitsaikia27 opened this issue 4 years ago • 8 comments

Similar to this work, I was building my own Unity platform and training it through the Python API. Somehow my agents are not training well, so I ended up here. I tried running your code, but I get a mean score of 0.001 every time. Did this happen to you as well?

surajitsaikia27 · Dec 01 '20

Can you tell me which TensorFlow version you are using? I think this happens because the calculated losses are not updating the actor network. I am facing the same issue with TensorFlow 2.x versions. One solution I came across is to update the network weights manually with GradientTape in TensorFlow.
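For reference, a minimal sketch of what I mean by updating the actor manually with GradientTape — the network layers, sizes, and hyperparameters below are placeholders, not code from this repo, and the loss is the standard PPO clipped surrogate:

```python
import tensorflow as tf

# Placeholder actor network and optimizer (shapes/sizes are illustrative only).
actor = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(4, activation="softmax"),  # 4 = number of discrete actions
])
optimizer = tf.keras.optimizers.Adam(learning_rate=3e-4)

def ppo_actor_update(states, actions, old_log_probs, advantages, clip_eps=0.2):
    """One manual policy update with GradientTape.

    states: float32 tensor (batch, obs_size)
    actions: int32 tensor (batch,) of chosen action indices
    old_log_probs, advantages: float32 tensors (batch,)
    """
    with tf.GradientTape() as tape:
        probs = actor(states)                                         # (batch, n_actions)
        idx = tf.stack([tf.range(tf.shape(actions)[0]), actions], axis=1)
        log_probs = tf.math.log(tf.gather_nd(probs, idx) + 1e-10)     # log pi_new(a|s)
        ratio = tf.exp(log_probs - old_log_probs)                     # pi_new / pi_old
        clipped = tf.clip_by_value(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
        # Negative clipped surrogate objective, minimized by gradient descent.
        loss = -tf.reduce_mean(tf.minimum(ratio * advantages, clipped * advantages))
    grads = tape.gradient(loss, actor.trainable_variables)
    optimizer.apply_gradients(zip(grads, actor.trainable_variables))
    return loss
```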

dhyeythumar · Dec 01 '20

I am using TensorFlow 2.3. So after updating the weights with GradientTape, is your agent training well?

surajitsaikia27 · Dec 01 '20

I haven't applied it yet, because I didn't find a concrete GradientTape-based solution for the PPO implementation. So currently I am using ML-Agents to train the environment, but I am planning to update the code.

dhyeythumar · Dec 01 '20

Another required enhancement is handling multiple agents in a single environment and training across parallel environments; a rough sketch of what that involves is below.
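The low-level Python API already returns batched steps for every agent under a behavior, so multi-agent support mostly means acting on that whole batch. Something along these lines (the build name is a placeholder, random actions stand in for the policy, and the exact action API differs between ML-Agents releases):

```python
import numpy as np
from mlagents_envs.environment import UnityEnvironment
from mlagents_envs.base_env import ActionTuple

# Placeholder build name; point this at your compiled Unity player.
env = UnityEnvironment(file_name="ReacherBuild")
env.reset()
behavior_name = list(env.behavior_specs)[0]
spec = env.behavior_specs[behavior_name]

for _ in range(100):
    # decision_steps holds every agent requesting a decision this step (batched).
    decision_steps, terminal_steps = env.get_steps(behavior_name)
    n_agents = len(decision_steps)
    if n_agents > 0:
        # Random continuous actions just to show the batch shape: (n_agents, action_size).
        continuous = np.random.uniform(
            -1.0, 1.0, size=(n_agents, spec.action_spec.continuous_size)
        ).astype(np.float32)
        env.set_actions(behavior_name, ActionTuple(continuous=continuous))
    env.step()

env.close()
```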

dhyeythumar · Dec 01 '20

That would be great. I am able to train agents in OpenAI Gym using PPO without issues, but I don't know why the same code is not working in Unity.

surajitsaikia27 · Dec 01 '20

Great to hear that it's working with Gym environments. Can you tell me more about the Unity environment? For example, is it multi-agent, or are you trying to train multiple copies of the environment?

dhyeythumar · Dec 02 '20

It is the Reacher environment in Unity 3D. It has multiple agents. You can try it out :)

surajitsaikia27 · Dec 02 '20

Currently, this repo doesn't support multi-agent environments, so I think that might be the issue. I will create a TODO section in the README listing all the enhancements required for this repo.

dhyeythumar · Dec 02 '20