Support for Dictionary Observation Space

Open damaruga opened this issue 2 years ago • 5 comments

Problem

Robotic environments such as SurRoL and Fetch use a dictionary observation space with the following keys:

  1. observation
  2. desired_goal
  3. achieved_goal

Query

Is there a quick fix to train an agent using BC in the current release for these kinds of environments?

damaruga, Feb 14 '23 05:02

It is not supported right now: https://github.com/HumanCompatibleAI/imitation/issues/651#issuecomment-1420219620

We are currently reworking the way data is handled in general. @AdamGleave do you think there is a quick fix? Should we aim to support this in the long term?

Maybe you can use the space utils to flatten/unflatten the space? https://gymnasium.farama.org/api/spaces/utils/

ernestum, Feb 15 '23 18:02

Hello! I have a similar issue, as I need to use dictionary observations as input. If one were to implement this, where would be a good starting point?

tudorjnu, Mar 20 '23 06:03

As long as you don't plan on treating different parts of the observation space differently (e.g. applying a CNN to image data but taking in accelerator data directly), you should be good with a gym.wrappers.FlattenObservation wrapper around your environment.

ernestum, Mar 20 '23 09:03

Personally that is roughly what I need as I have a custom environment generating different types of observations (i.e. image, joint pos, etc.).

On a tangential note, is it possible to generate transitions using an environment (i.e. a dict-based env), store only the image, for example, and then use imitation to learn from those transitions (image -> action)?

I have been thinking of creating a custom rollout wrapper or something along those lines, so that the trajectories can be filtered when the expert generates them. What do you think?

I asked here as it seems to be related to the issue. Thank you!

tudorjnu, Mar 20 '23 09:03

PR #785 partially addresses this. Once it's merged, dictionary observations should be supported in the checked items below:

Core functionality

  • [x] Collecting rollouts
  • [ ] Saving / writing trajectories to disk
  • [ ] Buffers

Algorithms:

  • [x] Behavioral Cloning
  • [ ] DAgger
  • [x] Density-based reward modelling
  • [ ] MCEIRL
  • [ ] Adversarial AIRL / GAIL
  • [ ] SQIL
  • [ ] Preference Comparisons

NixGD, Sep 16 '23 00:09