ReplayBuffer storing actions size mismatch during env reset
Hi,
I am trying to write a simple gym wrapper for an existing env. During testing, I am now facing the following issue:
File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/sheeprl/algos/dreamer_v3/dreamer_v3.py", line 647, in main
rb.add(reset_data, dones_idxes, validate_args=cfg.buffer.validate_args)
File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/sheeprl/data/buffers.py", line 656, in add
self._buf[env_idx].add(env_data, validate_args=validate_args)
File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/sheeprl/data/buffers.py", line 220, in add
self.buffer[k][idxes] = data_to_store[k]
File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/sheeprl/utils/memmap.py", line 264, in __setitem__
self.array[idx] = value
ValueError: shape mismatch: value array of shape (1,1,5) could not be broadcast to indexing result of shape (1,1,4)
I think this originates from this line (line 643 in `dreamer_v3.py`): `reset_data["actions"] = np.zeros((1, reset_envs, np.sum(actions_dim)))`. My env's `action_space.shape` is `(1, 4)`, so here `np.sum` adds the dimensions up to 1 + 4 = 5.

Is this the desired behavior?
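For reference, here is a minimal sketch of how the mismatch seems to arise (assuming `actions_dim` is derived from the action space's shape; the space itself is illustrative):

```python
import numpy as np
import gymnasium as gym

# Illustrative only: an action space shaped like the one in my env.
action_space = gym.spaces.Box(low=-1.0, high=1.0, shape=(1, 4))

# If actions_dim comes from the space's shape, summing it mixes the
# leading dimension into the action count: 1 + 4 = 5.
actions_dim = action_space.shape
reset_envs = 1
reset_actions = np.zeros((1, reset_envs, np.sum(actions_dim)))
print(reset_actions.shape)  # (1, 1, 5) -- while the buffer slot holds 4 values
```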
Thanks
Hi @defrag-bambino, thank you for reporting this problem.
Which action space are you using? Are they continuous actions?
In this case, we assume that continuous actions have a shape with a single dimension, something like `(n,)`. This allows us to handle continuous, discrete, and multi-discrete actions in the same way.
I would suggest you try changing the action space to shape `(4,)`.
@belerico might it make sense to have a wrapper that flattens the continuous actions?
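Something along these lines, perhaps (a minimal sketch, not part of sheeprl's API; `FlattenActionWrapper` and its details are illustrative):

```python
import gymnasium as gym
import numpy as np

class FlattenActionWrapper(gym.ActionWrapper):
    """Expose a Box action space of any shape, e.g. (1, 4), as a 1D space (n,)."""

    def __init__(self, env: gym.Env):
        super().__init__(env)
        # Flatten the bounds so the advertised space is 1D, e.g. (4,).
        low = np.asarray(env.action_space.low).reshape(-1)
        high = np.asarray(env.action_space.high).reshape(-1)
        self.action_space = gym.spaces.Box(low=low, high=high, dtype=env.action_space.dtype)

    def action(self, action):
        # Restore the wrapped env's original shape, e.g. (4,) -> (1, 4).
        return np.asarray(action).reshape(self.env.action_space.shape)
```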
Yes, it is a continuous "Box" space. The problem is that this particular action_space is (N_AGENTS, 4), so there are different versions of the gym env with different action_space shapes.
Yep, we can add such a flattening wrapper and leave it to the user to decide whether to use it.
I've tried to work around it using np.squeeze() and np.expand_dims() in relevant places of my env wrapper. This seems to work for now. However, after a few seconds it crashes with this error
Stacktrace:

```
Traceback (most recent call last):
  File "/home/drt/miniconda3/envs/sheeprl/bin/sheeprl", line 8, in <module>
    sys.exit(run())
  File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/hydra/main.py", line 90, in decorated_main
    _run_hydra(
  File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/hydra/_internal/utils.py", line 394, in _run_hydra
    _run_app(
  File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/hydra/_internal/utils.py", line 457, in _run_app
    run_and_report(
  File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/hydra/_internal/utils.py", line 222, in run_and_report
    raise ex
  File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/hydra/_internal/utils.py", line 219, in run_and_report
    return func()
  File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/hydra/_internal/utils.py", line 458, in <lambda>
    lambda: hydra.run(
  File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/hydra/_internal/hydra.py", line 132, in run
    _ = ret.return_value
  File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/hydra/core/utils.py", line 260, in return_value
    raise self._return_value
  File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/hydra/core/utils.py", line 186, in run_job
    ret.return_value = task_function(task_cfg)
  File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/sheeprl/cli.py", line 352, in run
    run_algorithm(cfg)
  File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/sheeprl/cli.py", line 190, in run_algorithm
    fabric.launch(reproducible(command), cfg, **kwargs)
  File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/lightning/fabric/fabric.py", line 839, in launch
    return self._wrap_and_launch(function, self, *args, **kwargs)
  File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/lightning/fabric/fabric.py", line 924, in _wrap_and_launch
    return launcher.launch(to_run, *args, **kwargs)
  File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/lightning/fabric/strategies/launchers/subprocess_script.py", line 104, in launch
    return function(*args, **kwargs)
  File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/lightning/fabric/fabric.py", line 930, in _wrap_with_setup
    return to_run(*args, **kwargs)
  File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/sheeprl/cli.py", line 186, in wrapper
    return func(fabric, cfg, *args, **kwargs)
  File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/sheeprl/algos/dreamer_v3/dreamer_v3.py", line 677, in main
    train(
  File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/sheeprl/algos/dreamer_v3/dreamer_v3.py", line 113, in train
    embedded_obs = world_model.encoder(batch_obs)
  File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/lightning/fabric/wrappers.py", line 119, in forward
    output = self._forward_module(*args, **kwargs)
  File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1523, in forward
    else self._run_ddp_forward(*inputs, **kwargs)
  File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1359, in _run_ddp_forward
    return self.module(*inputs, **kwargs)  # type: ignore[index]
  File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/sheeprl/models/models.py", line 469, in forward
    mlp_out = self.mlp_encoder(obs, *args, **kwargs)
  File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/sheeprl/algos/dreamer_v3/agent.py", line 151, in forward
    return self.model(x)
  File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/sheeprl/models/models.py", line 119, in forward
    return self.model(obs)
  File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/torch/nn/modules/container.py", line 217, in forward
    input = module(input)
  File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 116, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (1024x72 and 1x512)
```
Seems like the same holds for the observation shape `(1, 72)`.
If your observation space is a 1D vector, then you should also remove the leading 1 from its shape, I suppose. Can you try it?
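Something like this might work (a rough sketch, not a sheeprl API; gymnasium's built-in `gym.wrappers.FlattenObservation` may also do the job for plain Box observations):

```python
import gymnasium as gym
import numpy as np

class SqueezeObservationWrapper(gym.ObservationWrapper):
    """Sketch: expose a (1, 72) Box observation space as (72,)."""

    def __init__(self, env: gym.Env):
        super().__init__(env)
        space = env.observation_space
        # Flatten the bounds so the advertised observation space is 1D.
        self.observation_space = gym.spaces.Box(
            low=np.asarray(space.low).reshape(-1),
            high=np.asarray(space.high).reshape(-1),
            dtype=space.dtype,
        )

    def observation(self, observation):
        # Flatten each observation to match the advertised space.
        return np.asarray(observation).reshape(-1)
```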
Hi @defrag-bambino, we're sorry, but right now Multi-Agent RL (MARL) is not supported, so your action and observation spaces must be unrelated to the number of agents, which are considered independent from one another. This means that (see the sketch after this list):

- Observations must be 1D vectors or 2D/3D images: everything that is not a 1D vector will be processed by a CNN by the agent. A 2D image, or a 3D image of shape `[H,W,1]` or `[1,H,W]`, will be considered a grayscale image, a multi-channel image otherwise.
- An action of type `gymnasium.spaces.Box` must be of shape `(n,)`, where `n` is the number of (possibly continuous) actions the environment supports.
- Every agent runs in its own environment.
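As an illustration (hypothetical spaces, not a prescribed API), a single-agent env that satisfies these constraints could declare:

```python
import gymnasium as gym
import numpy as np

# 1D vector observation -> processed by the MLP encoder
observation_space = gym.spaces.Box(low=-np.inf, high=np.inf, shape=(72,), dtype=np.float32)

# 64x64 RGB image observation -> processed by the CNN encoder
image_space = gym.spaces.Box(low=0, high=255, shape=(3, 64, 64), dtype=np.uint8)

# n = 4 continuous actions, declared as (n,) rather than (1, 4)
action_space = gym.spaces.Box(low=-1.0, high=1.0, shape=(4,), dtype=np.float32)
```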
Maybe there could be a solution as explained in #241.