Last `N` actions as `mlp_keys` encoder input for `dreamer_v3`
Hi,
I'm working on an Atari environment wrapper with an action input buffer of length N that I want to feed to the encoder through mlp_keys.
Algo config:
algo:
  mlp_keys:
    encoder: [actions]
However, I'm unable to get it working; I get the error TypeError: object of type 'NoneType' has no len() at
File "/home/sam/dev/ml/sheeprl/sheeprl/utils/env.py", line 171, in <listcomp>
  [k for k in env.observation_space.spaces.keys() if len(env.observation_space[k].shape) in {2, 3}]
because gym.spaces.Tuple has no meaningful shape (its shape is None).
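A quick check illustrates the difference (a minimal sketch, not taken from the sheeprl codebase):

import gymnasium as gym
import numpy as np

# A Tuple space has shape None, so len(space.shape) raises the TypeError above.
tup = gym.spaces.Tuple([gym.spaces.Discrete(9)] * 4)
print(tup.shape)  # None

# A Box space has a concrete shape, which is what the check in utils/env.py expects.
box = gym.spaces.Box(low=0, high=8, shape=(4,), dtype=np.int64)
print(box.shape)  # (4,)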
I'm wondering what should change in this wrapper so that it correctly interfaces with what sheeprl expects. Is there a way to augment the Tuple space to have a shape, or should it be changed to a Box? If it needs to be a Box, how should it be configured?
from collections import deque

import gymnasium as gym


class InputBufferWithActionsAsInput_Atari(gym.Wrapper):
    def __init__(self, env: gym.Env, input_buffer_amount: int = 0):
        super().__init__(env)
        if input_buffer_amount <= 0:
            raise ValueError("`input_buffer_amount` should be a positive integer")
        self._input_buffer_amount = input_buffer_amount
        self._input_buf = deque(maxlen=input_buffer_amount)
        # Expose the buffered actions alongside the frame; the Tuple space is what trips sheeprl's shape check.
        self.observation_space = gym.spaces.Dict({
            "rgb": self.env.observation_space,
            "actions": gym.spaces.Tuple([self.env.action_space] * input_buffer_amount)
        })

    def get_obs(self, observation):
        return {
            "rgb": observation,
            "actions": self._input_buf
        }

    def reset(self, **kwargs):
        obs, infos = super().reset(**kwargs)
        # Pre-fill the buffer with random actions so observations always contain N entries.
        while len(self._input_buf) < self._input_buf.maxlen:
            self._input_buf.append(self.env.action_space.sample())
        return self.get_obs(obs), infos

    def step(self, action):
        # Execute the oldest buffered action (simulating a reaction-time delay) and enqueue the new one.
        this_frame_action = self._input_buf[0]
        self._input_buf.append(action)
        obs, reward, done, truncated, infos = self.env.step(this_frame_action)
        return self.get_obs(obs), reward, done, truncated, infos
Edit: I have a working setup using a hard-coded wrapper that is aware of implementation details, using something like the code below. I'm still wondering how to achieve a generic solution, though.
self.observation_space = gym.spaces.Dict({
    "rgb": self.env.observation_space,
    # "last_action": self.env.action_space
    # "actions": gym.spaces.Box(shape=(self.env.action_space.shape, input_buffer_amount), dtype=np.int64)
    # "actions": gym.spaces.Box([self.env.action_space] * input_buffer_amount)
    "actions_0": gym.spaces.Box(low=0, high=8, shape=(1,), dtype=np.int64),
    "actions_1": gym.spaces.Box(low=0, high=8, shape=(1,), dtype=np.int64),
    "actions_2": gym.spaces.Box(low=0, high=8, shape=(1,), dtype=np.int64),
    "actions_3": gym.spaces.Box(low=0, high=8, shape=(1,), dtype=np.int64),
})

def get_obs(self, observation: Any) -> Any:
    # observation['past_actions'] = spaces.Space(list(self._input_buf))
    return {
        "rgb": observation,
        # "last_action": self._input_buf[0]
        # "actions": np.array(self._input_buf, dtype=np.int64)
        "actions_0": self._input_buf[0],
        "actions_1": self._input_buf[1],
        "actions_2": self._input_buf[2],
        "actions_3": self._input_buf[3],
    }
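For reference, a more generic version of the same idea might look like the sketch below: a single "actions" key backed by a Box of shape (N,), so the space has a proper shape. The class name and details are illustrative, assume a Discrete Atari action space, and are untested against sheeprl's encoder.

from collections import deque

import gymnasium as gym
import numpy as np


class LastNActionsAsBox(gym.Wrapper):
    """Sketch: expose the last N discrete actions under a single Box-typed "actions" key."""

    def __init__(self, env: gym.Env, input_buffer_amount: int = 4):
        super().__init__(env)
        self._input_buf = deque(maxlen=input_buffer_amount)
        self.observation_space = gym.spaces.Dict({
            "rgb": env.observation_space,
            "actions": gym.spaces.Box(
                low=0,
                high=env.action_space.n - 1,
                shape=(input_buffer_amount,),
                dtype=np.int64,
            ),
        })

    def get_obs(self, observation):
        return {"rgb": observation, "actions": np.array(self._input_buf, dtype=np.int64)}

    def reset(self, **kwargs):
        obs, infos = super().reset(**kwargs)
        # Pre-fill with random actions so the "actions" entry always has N elements.
        while len(self._input_buf) < self._input_buf.maxlen:
            self._input_buf.append(self.env.action_space.sample())
        return self.get_obs(obs), infos

    def step(self, action):
        this_frame_action = self._input_buf[0]  # execute the oldest buffered action (delay)
        self._input_buf.append(action)
        obs, reward, done, truncated, infos = self.env.step(this_frame_action)
        return self.get_obs(obs), reward, done, truncated, infos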
Hi @geranim0,
yes, the observation space must have the shape attribute. I suggest using the gymnasium.spaces.Box space to augment the observations of the environment.
I prepared a branch with the ActionsAsObservationWrapper that allows you to add the last n actions: https://github.com/Eclectic-Sheep/sheeprl/tree/feature/actions-as-obs.
You can specify the number of actions with the env.action_stack parameter. You can also add a dilation between actions (as in the FrameStack wrapper); you can set it with the env.action_stack_dilation parameter in the configs.
The key is "action_stack", otherwise it creates conflicts during training (add it to the mlp_keys).
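For example (a sketch based on the parameter names above; the exact config layout may differ):

env:
  action_stack: 3
  action_stack_dilation: 1
algo:
  mlp_keys:
    encoder: [action_stack]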
Let me know if it works
Note: Discrete actions are converted into one-hot actions (as the agent works with one-hot actions in the discrete case). We can discuss which is the best option.
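For intuition, one-hot encoding the last N discrete actions could look something like this (illustrative only; the layout actually used by the wrapper may differ):

import numpy as np

# Assume N = 3 buffered actions from a Discrete(9) action space.
last_actions = np.array([2, 0, 5])
one_hot = np.eye(9, dtype=np.float32)[last_actions]  # shape (3, 9), one row per action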
cc @belerico
Hi @Michele,
Thanks for the branch! Taking a look and doing some tests with it.
So, I did some testing; here are the results.
The gray line represents the agent trained with the last N (in this case, 12) actions added to the observations, and the blue line represents the agent trained with the same input buffer (12) but without the buffered actions added to the observations. Only one run was made for each, but it looks like, in the presence of a large input buffer, adding the buffered actions to the observations is helpful.
It also suggests that the wrapper works 👍
The only modification I made to your branch was adding an input buffer to the wrapper.
Great, I'm glad it works. I do not understand why you added the input buffer and how you used it. Can you show me which modification you made? Thanks
Sure, it is actually in my first message, in the step function: instead of executing the current frame's action, I execute the one that is ready in the buffer, with this_frame_action = self._input_buf[0].
The purpose of this is to simulate human reaction time. That's why I wanted to test adding the input buffer to the observations: to see if it would improve performance (and it looks like it does).
Understood, thanks
Hi @geranim0, if this is done, we can add this feature in a new PR and include it in the next release.
Hi @belerico, sure!
One side note, though: in tests using a Discrete action space things worked fine, but I encountered some problems with the action shape not being handled for MultiDiscrete envs, both in the actions-as-obs wrapper and in dreamer_v3.py::main(), in this portion:
real_actions = (
    torch.cat([real_act.argmax(dim=-1) for real_act in real_actions], dim=-1).cpu().numpy()
)
step_data["actions"] = actions.reshape((1, cfg.env.num_envs, -1))
For now I got around it by reshaping my action space to Discrete. I'm kind of on an old branch, though; I will re-test when I update.
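For reference, one way such a reshape can be done is sketched below; the wrapper name and the decoding scheme are illustrative assumptions, not necessarily what was used here.

import gymnasium as gym
import numpy as np


class FlattenMultiDiscreteActions(gym.ActionWrapper):
    """Sketch: expose a MultiDiscrete action space as a single Discrete space."""

    def __init__(self, env: gym.Env):
        super().__init__(env)
        self._nvec = env.action_space.nvec
        self.action_space = gym.spaces.Discrete(int(np.prod(self._nvec)))

    def action(self, action):
        # Decode the flat Discrete index back into one sub-action per MultiDiscrete dimension.
        return np.array(np.unravel_index(int(action), self._nvec), dtype=np.int64)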
Hi @geranim0, can you share the error you encountered and which environment you are using? Thanks
I should have fixed the problem; could you check with the MultiDiscrete action space? Thanks