ReplayBuffer storing actions size mismatch during env reset
Hi,
I am trying to write a simple gym wrapper for an existing env. During testing, I am now facing the following issue:
File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/sheeprl/algos/dreamer_v3/dreamer_v3.py", line 647, in main
rb.add(reset_data, dones_idxes, validate_args=cfg.buffer.validate_args)
File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/sheeprl/data/buffers.py", line 656, in add
self._buf[env_idx].add(env_data, validate_args=validate_args)
File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/sheeprl/data/buffers.py", line 220, in add
self.buffer[k][idxes] = data_to_store[k]
File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/sheeprl/utils/memmap.py", line 264, in __setitem__
self.array[idx] = value
ValueError: shape mismatch: value array of shape (1,1,5) could not be broadcast to indexing result of shape (1,1,4)
I think this originates from this line (line 643 in `dreamer_v3.py`): `reset_data["actions"] = np.zeros((1, reset_envs, np.sum(actions_dim)))`. My env's `action_space.shape` is `(1, 4)`, so here `np.sum` adds the dimensions up to 1 + 4 = 5.

Is this the desired behavior?
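For reference, here is a minimal sketch of how the mismatch seems to arise (assuming `actions_dim` is derived from the action space's shape; the space itself is illustrative):

```python
import numpy as np
import gymnasium as gym

# Illustrative only: an action space shaped like the one in my env.
action_space = gym.spaces.Box(low=-1.0, high=1.0, shape=(1, 4))

# If actions_dim comes from the space's shape, summing it mixes the
# leading dimension into the action count: 1 + 4 = 5.
actions_dim = action_space.shape
reset_envs = 1
reset_actions = np.zeros((1, reset_envs, np.sum(actions_dim)))
print(reset_actions.shape)  # (1, 1, 5) -- while the buffer slot holds 4 values
```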
Thanks
Hi @defrag-bambino, thank you for reporting this problem.
Which action space are you using? Are they continuous actions?
In this case, we assume that continuous actions have a shape with a single dimension, something like `(n,)`. This allows us to handle continuous, discrete, and multi-discrete actions in the same way.
I would suggest you try changing the action space to shape `(4,)`.
@belerico might it make sense to have a wrapper that flattens the continuous actions?
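Something along these lines, perhaps (a minimal sketch, not part of sheeprl's API; `FlattenActionWrapper` and its details are illustrative):

```python
import gymnasium as gym
import numpy as np

class FlattenActionWrapper(gym.ActionWrapper):
    """Expose a Box action space of any shape, e.g. (1, 4), as a 1D space (n,)."""

    def __init__(self, env: gym.Env):
        super().__init__(env)
        # Flatten the bounds so the advertised space is 1D, e.g. (4,).
        low = np.asarray(env.action_space.low).reshape(-1)
        high = np.asarray(env.action_space.high).reshape(-1)
        self.action_space = gym.spaces.Box(low=low, high=high, dtype=env.action_space.dtype)

    def action(self, action):
        # Restore the wrapped env's original shape, e.g. (4,) -> (1, 4).
        return np.asarray(action).reshape(self.env.action_space.shape)
```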
Yes, it is a continuous "Box" space. The problem is that this particular action_space is (N_AGENTS, 4), so there are different versions of the gym env with different action_space shapes.
Yep, we can add such a flattening wrapper and leave it to the user to decide whether to use it.
I've tried to work around it using np.squeeze() and np.expand_dims() in relevant places of my env wrapper. This seems to work for now. However, after a few seconds it crashes with this error
Stacktrace:

```
Traceback (most recent call last):
  File "/home/drt/miniconda3/envs/sheeprl/bin/sheeprl", line 8, in <module>
    sys.exit(run())
  File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/hydra/main.py", line 90, in decorated_main
    _run_hydra(
  File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/hydra/_internal/utils.py", line 394, in _run_hydra
    _run_app(
  File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/hydra/_internal/utils.py", line 457, in _run_app
    run_and_report(
  File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/hydra/_internal/utils.py", line 222, in run_and_report
    raise ex
  File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/hydra/_internal/utils.py", line 219, in run_and_report
    return func()
  File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/hydra/_internal/utils.py", line 458, in <lambda>
    lambda: hydra.run(
  File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/hydra/_internal/hydra.py", line 132, in run
    _ = ret.return_value
  File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/hydra/core/utils.py", line 260, in return_value
    raise self._return_value
  File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/hydra/core/utils.py", line 186, in run_job
    ret.return_value = task_function(task_cfg)
  File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/sheeprl/cli.py", line 352, in run
    run_algorithm(cfg)
  File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/sheeprl/cli.py", line 190, in run_algorithm
    fabric.launch(reproducible(command), cfg, **kwargs)
  File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/lightning/fabric/fabric.py", line 839, in launch
    return self._wrap_and_launch(function, self, *args, **kwargs)
  File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/lightning/fabric/fabric.py", line 924, in _wrap_and_launch
    return launcher.launch(to_run, *args, **kwargs)
  File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/lightning/fabric/strategies/launchers/subprocess_script.py", line 104, in launch
    return function(*args, **kwargs)
  File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/lightning/fabric/fabric.py", line 930, in _wrap_with_setup
    return to_run(*args, **kwargs)
  File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/sheeprl/cli.py", line 186, in wrapper
    return func(fabric, cfg, *args, **kwargs)
  File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/sheeprl/algos/dreamer_v3/dreamer_v3.py", line 677, in main
    train(
  File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/sheeprl/algos/dreamer_v3/dreamer_v3.py", line 113, in train
    embedded_obs = world_model.encoder(batch_obs)
  File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/lightning/fabric/wrappers.py", line 119, in forward
    output = self._forward_module(*args, **kwargs)
  File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1523, in forward
    else self._run_ddp_forward(*inputs, **kwargs)
  File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1359, in _run_ddp_forward
    return self.module(*inputs, **kwargs)  # type: ignore[index]
  File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/sheeprl/models/models.py", line 469, in forward
    mlp_out = self.mlp_encoder(obs, *args, **kwargs)
  File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/sheeprl/algos/dreamer_v3/agent.py", line 151, in forward
    return self.model(x)
  File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/sheeprl/models/models.py", line 119, in forward
    return self.model(obs)
  File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/torch/nn/modules/container.py", line 217, in forward
    input = module(input)
  File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 116, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (1024x72 and 1x512)
```
Seems like the same holds for the observation shape `(1, 72)`.
If your observation space is a 1D vector, then you should also remove the leading 1 from its shape, I suppose. Can you try it?
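Something like this might work (a rough sketch, not a sheeprl API; gymnasium's built-in `gym.wrappers.FlattenObservation` may also do the job for plain Box observations):

```python
import gymnasium as gym
import numpy as np

class SqueezeObservationWrapper(gym.ObservationWrapper):
    """Sketch: expose a (1, 72) Box observation space as (72,)."""

    def __init__(self, env: gym.Env):
        super().__init__(env)
        space = env.observation_space
        # Flatten the bounds so the advertised observation space is 1D.
        self.observation_space = gym.spaces.Box(
            low=np.asarray(space.low).reshape(-1),
            high=np.asarray(space.high).reshape(-1),
            dtype=space.dtype,
        )

    def observation(self, observation):
        # Flatten each observation to match the advertised space.
        return np.asarray(observation).reshape(-1)
```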
Hi @defrag-bambino, we're sorry, but right now Multi-Agent RL (MARL) is not supported, so your action and observation spaces must be unrelated to the number of agents, which are considered independent from one another. This means that (see the sketch after this list):

- Observations must be 1D vectors or 2D/3D images: everything that is not a 1D vector will be processed by a CNN by the agent. A 2D image, or a 3D image of shape `[H,W,1]` or `[1,H,W]`, will be considered a grayscale image, a multi-channel image otherwise.
- An action of type `gymnasium.spaces.Box` must be of shape `(n,)`, where `n` is the number of (possibly continuous) actions the environment supports.
- Every agent runs in its own environment.
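As an illustration (hypothetical spaces, not a prescribed API), a single-agent env that satisfies these constraints could declare:

```python
import gymnasium as gym
import numpy as np

# 1D vector observation -> processed by the MLP encoder
observation_space = gym.spaces.Box(low=-np.inf, high=np.inf, shape=(72,), dtype=np.float32)

# 64x64 RGB image observation -> processed by the CNN encoder
image_space = gym.spaces.Box(low=0, high=255, shape=(3, 64, 64), dtype=np.uint8)

# n = 4 continuous actions, declared as (n,) rather than (1, 4)
action_space = gym.spaces.Box(low=-1.0, high=1.0, shape=(4,), dtype=np.float32)
```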
Maybe there could be a solution as explained in #241.