VectorizedMultiAgentSimulator
VectorizedMultiAgentSimulator copied to clipboard
[Feature] different number of actions for each action dimension (discrete actions)
This PR supports specifying different number of actions for each action dimension when the action space is discrete. The motivation is to support action dimensions with different semantics that require different discretizations for each dimension (e.g. combining navigation and "boolean" actions). More in general, this should allow to add any kind of discrete action with arbitrary semantics (also thanks to process_action).
To implement this, I added an action_nvec property to Dynamics, which is "mutually exclusive" with needed_action_size, meaning that if one is overridden the other is computed automatically, and vice-versa (they can't be both overridden). By default, action_nvec will be a list of 3s of length action_size when not specified, to keep compatibility with existing scenarios. I also added action_nvec to the Agent class, which is then used in Environment to build the discrete action space(s) and in Environment._set_action to: 1) convert discrete actions in the interval [0, n) to [-u_max, u_max]; and 2) convert discrete actions to multi-discrete ones when needed.
I also added a Composite subclass of Dynamics that allows to compose multiple Dynamics into one by concatenating their actions, which should make it more convenient (modular) to define complex actions/dynamics.
Let me know if there is anything else that can improve the PR.
Thank you for the comments and clarifications! I will address them and continue working on this as soon as I can.
It makes a lot of sense to keep the dynamics agnostic of the action space. To keep compatibility with the current action spaces, it should be possible to rewrite the multidiscrete -> continuous conversion in Environment._set_action so that the discretization logic stays the same for n=3, I'll look into it.
At this point, the overall changes would be:
- adding the
action_nvecparameter toAgent.__init__(optional and mutually exclusive withaction_size) - building discrete spaces in
Environmentand sampling random actions according toaction_nvec - using
action_nvecinEnvironment._set_action, keeping the current logic when n=3 - adding tests where needed Then, as you said, we can delegate any extra action semantics to process_action. Are these ok?
I will also look into interactive_render (I initially left it untouched because it uses continuous actions). Let me know if there is other functionality that you think these changes might break.
Thank you for the comments and clarifications! I will address them and continue working on this as soon as I can.
It makes a lot of sense to keep the dynamics agnostic of the action space. To keep compatibility with the current action spaces, it should be possible to rewrite the multidiscrete -> continuous conversion in
Environment._set_actionso that the discretization logic stays the same for n=3, I'll look into it.At this point, the overall changes would be:
- adding the
action_nvecparameter toAgent.__init__(optional and mutually exclusive withaction_size)
Yes, you basically have already done this. I would not make them mutually exclusive. They will default to None. If one is specified, we create the other one based on that one. If both are specified, we check they are coherent. If none are specified we default to dynamics (and the old 3 way split)
Also maybe it is worth naming this with something that contains "discrete" like discrete_action_nvec to highlight that is not something that interests users that have continuous actions (which I think is majority)
- building discrete spaces in
Environmentand sampling random actions according toaction_nvec
yes, merge main and this should come almost for free with what you have already done
- using
action_nvecinEnvironment._set_action, keeping the current logic when n=3
so here is where i am still a bit undecided.
what I suggest is that we split the range among the n actions.
When n is odd, we make the first action being 0 for bc-compatibility.
This would make so that n=3 is not a special case but the way we handle odd ns
For example:
n=2leads to[-range,range]n=3leads to[0,-range,range]n=4leads to[-range,-range/3,range/3,range]n=5leads to[0,-range,-range/2,range/2,range]
- adding tests where needed
Yeah thanks. the main tests I would like to see is regarding the logic that will be added to environemnt._set_action in different settings
Then, as you said, we can delegate any extra action semantics to process_action. Are these ok?
yep!
I will also look into interactive_render (I initially left it untouched because it uses continuous actions). Let me know if there is other functionality that you think these changes might break.
Oh yes sorry I forgot interactive_render uses continuous, then we should be good there
I added 2 commits with the suggested changes (and merged main):
action_nvecis now only present inAgentand is calleddiscrete_action_nvec(changes to dynamics are reverted)discrete_action_nvecis not mutually exclusive withaction_size, but we check that they are consistent inAgent.__init__- the logic in
_set_actionhandles oddns by converting the first discrete action tou = 0, and shifting the remaining "negative" actions by -1 so that the previous logic applies - the comment in the discrete to multidiscrete conversion in
_set_actionis now accurate (turns out the logic was already the same as before the generalization)
Overall, this results in the following "multidiscrete a" to "continuous u" conversions (where U = u_max):
- when
nis odd (and leth = (n-1)/2, i.e. the middle point of the discrete possibilities):a == 0 ==> u == 0 a in [1, h] ==> u in [-U, -U/h] a in [h+1, n) ==> u in [+U/h, U] - when
nis evena in [0, n] ==> u in [-U, U]
I am now working on adding tests that check the above logic and the discrete-multidiscrete conversion, and I was wondering if it is possible to build a Scenario object directly instead of passing a string to make_env? This would make it possible to set the nvecs to values other than 3 before constructing the Environment object, so that various values can be tested. Currently I am trying to do something like this in tests/test_vmas.py (where the load(scenario) line is taken from make_env):
from vmas import scenarios
...
@pytest.mark.parametrize("scenario", scenario_names())
def test_discrete_action_nvec_multidiscrete(scenario, num_envs=10, n_steps=10):
scenario = scenarios.load(scenario).Scenario()
random.seed(0)
for agent in scenario.agents:
agent.discrete_action_nvec = [random.randint(2, 6) for _ in range(agent.action_size)]
env = make_env(
scenario=scenario,
num_envs=num_envs,
seed=0,
multidiscrete_actions=True,
continuous_actions=False,
)
for _ in range(n_steps):
...
However, I get an error on the first line of the function, saying that scenarios is a list and does not have the load method. I suspect this has something to do with the python import machinery? I also tried using vmas.scenarios directly instead of importing it at the top, with the same result. From what I can see, vmas.scenarios is both a subpackage in the vmas directory and a list defined in vmas/__init__.py, so maybe this is the problem? Is there any other way to accomplish this?
Fantastic work!
It seems to be exactly what we want
I am now working on adding tests that check the above logic and the discrete-multidiscrete conversion, and I was wondering if it is possible to build a
Scenarioobject directly instead of passing a string tomake_env? This would make it possible to set thenvecs to values other than 3 before constructing theEnvironmentobject, so that various values can be tested. Currently I am trying to do something like this intests/test_vmas.py(where theload(scenario)line is taken frommake_env):from vmas import scenarios ... @pytest.mark.parametrize("scenario", scenario_names()) def test_discrete_action_nvec_multidiscrete(scenario, num_envs=10, n_steps=10): scenario = scenarios.load(scenario).Scenario() random.seed(0) for agent in scenario.agents: agent.discrete_action_nvec = [random.randint(2, 6) for _ in range(agent.action_size)] env = make_env( scenario=scenario, num_envs=num_envs, seed=0, multidiscrete_actions=True, continuous_actions=False, ) for _ in range(n_steps): ...However, I get an error on the first line of the function, saying that
scenariosis alistand does not have theloadmethod. I suspect this has something to do with the python import machinery? I also tried usingvmas.scenariosdirectly instead of importing it at the top, with the same result. From what I can see,vmas.scenariosis both a subpackage in thevmasdirectory and a list defined invmas/__init__.py, so maybe this is the problem? Is there any other way to accomplish this?
if you do from vmas import scenarios it will load the scenario list.
What you want is the module:
@pytest.mark.parametrize("scenario", scenario_names())
def test_prova(scenario, num_envs=10, n_steps=10):
from vmas.scenarios import load
scenario_class = load(f"{scenario}.py").Scenario()
env = make_env(
scenario=scenario_class,
num_envs=num_envs,
seed=0,
multidiscrete_actions=True,
continuous_actions=False,
)
for _ in range(n_steps):
env.step(env.get_random_actions())
this will work
n=1 is not currently handled properly and results in NaNs
should this even be allowed? I would introduce a check that reads the nvec and checks all values are >= 2
Thanks for the comments! I've committed the suggested changes and fixed the bug in the discrete -> multi-discrete conversion
n=1is not currently handled properly and results in NaNsshould this even be allowed? I would introduce a check that reads the nvec and checks all values are >= 2
Yes, this should not happen, I added an input sanity check in Agent.__init__.
To make sure all of this works as intended, I've added some tests that check:
- if randomly sampled actions are in the computed action space
- if multi-discrete actions map to the expected continuous control after
Environment._set_action - if discrete actions that come from the product of multi-discrete spaces are converted properly to multi-discrete actions in
Environment._set_action
These are run for all scenarios that don't explicitly override process_action (since it breaks the action <-> control mapping). nvecs are randomly sampled (with a fixed seed) in the interval [2,6] to properly test mixed odd-even nvecs and odd n != 3. Let me know if there is anything else that needs testing.