Is there a way to disable "Asynchronous Environments"?
In ML-Agents v0.9 the Unity environment seems to have switched to an asynchronous setup, as described by this image:
This works well except when the Unity environment (with multiple agents in the scene) is used with a learning framework that expects a constant number of observations at each step, such as stable-baselines.
As discussed in this issue, the fact that each env.step() call might return a different number of terminal + decision steps makes working with such frameworks very difficult.
I was wondering if there is a proper way to force the Unity environment to synchronize its steps the way it used to.
I am fully aware that this is not functionality Unity ML-Agents claims to support, and I am working with my own (multi-agent) UnityToGymWrapper, but I still wanted to ask for advice on how to solve this problem.
Thank you very much!
I think there is some confusion about what we mean by environment. In the blog post you shared, environment refers to a whole executable. When calling env.step(), there is only one executable (possibly with several agents).
I think you would want the number of decision steps + terminal steps to be constant within a single executable, but in all honesty, I do not know how to make that happen.
In most environments, there is no reason to assume all agents will make decisions at the same time. We allow agents to spawn and be destroyed at any time during the simulation, which is something gym cannot do.
Unless you create your own environment where you can control exactly when the agents request decisions and are reset, you will have a variable number of decision and terminal steps.
Do you have suggestions on how to improve the API?
We are on the same page there. What I wish for is exactly that: "the number of decision steps + terminal steps to be constant within a single executable."
> In most environments, there is no reason to assume all agents will make decisions at the same time. We allow agents to spawn and be destroyed at any time during the simulation, which is something gym cannot do.
I agree that there is no inherent reason why the agents' decisions should be synced up within an executable. Honestly, this is more of an unfortunate side effect of a perfectly valid ML-Agents design decision not playing nicely with the gym paradigm.
Unfortunately, this also makes it difficult to work with other learning systems that expect the gym-style constant number of observations every step.
> Unless you create your own environment where you can control exactly when the agents request decisions and are reset, you will have a variable number of decision and terminal steps.
Now that you mention it, this might be the way to go! I am working on a simple navigation environment and call Done() when the agent reaches a goal. I can make it so that the agent waits until the next if (academyStepCount % DecisionPeriod == 0) step to call Done(). Meanwhile the agent can just... wait, I guess; it is only a few frames anyway.
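For what it's worth, here is a minimal sketch of that idea, assuming a recent ML-Agents C# API (Unity.MLAgents namespace). The class and member names (SyncedNavigationAgent, OnGoalReached, the DecisionPeriod field) are mine, and DecisionPeriod has to be kept equal to the value on the DecisionRequester component by hand:

```csharp
using Unity.MLAgents;
using UnityEngine;

// Sketch: instead of ending the episode on the exact frame the goal is
// reached, remember it and end the episode on the next decision step.
public class SyncedNavigationAgent : Agent
{
    // Keep this equal to DecisionPeriod on the DecisionRequester component.
    public int DecisionPeriod = 5;

    bool m_GoalReached;

    // Call this from the goal/collision logic instead of calling Done().
    public void OnGoalReached()
    {
        m_GoalReached = true;
    }

    void FixedUpdate()
    {
        // Only mark the agent done on frames where decisions are also
        // requested, so every env.step() sees the same set of agents.
        if (m_GoalReached && Academy.Instance.TotalStepCount % DecisionPeriod == 0)
        {
            m_GoalReached = false;
            SetReward(1f);
            EndEpisode(); // Done() in older ML-Agents C# releases
        }
    }
}
```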
A simple API enhancement (I don't know how many people would really use it; it feels niche) might be something like a bool SyncDonesWithDecisionRequest on the DecisionRequester component that implements the idea above.
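Purely as illustration (no such flag exists in ML-Agents today), the proposal could look roughly like this, written here as a sibling component rather than a change to DecisionRequester itself, with all names hypothetical:

```csharp
using Unity.MLAgents;
using UnityEngine;

// Hypothetical component illustrating the proposed flag. Agents would call
// RequestDone() instead of EndEpisode(); when the flag is enabled, the
// actual done is deferred until the next decision-request step.
public class SyncedDoneRequester : MonoBehaviour
{
    public bool SyncDonesWithDecisionRequest = true;
    public int DecisionPeriod = 5; // keep equal to the DecisionRequester's

    Agent m_Agent;
    bool m_DonePending;

    void Awake()
    {
        // Assumes this component sits on the same GameObject as the Agent.
        m_Agent = GetComponent<Agent>();
    }

    public void RequestDone()
    {
        m_DonePending = true;
    }

    void FixedUpdate()
    {
        bool onDecisionStep =
            Academy.Instance.TotalStepCount % DecisionPeriod == 0;
        if (m_DonePending && (!SyncDonesWithDecisionRequest || onDecisionStep))
        {
            m_DonePending = false;
            m_Agent.EndEpisode();
        }
    }
}
```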
Thank you @vincentpierre, this conversation is very helpful!
PS: I felt like I was going crazy when I was trying to update the C# code. I would make a change then poof! Hah :D