Initial Take Cover Environment attempt + discussion

Open shyamsn97 opened this issue 3 years ago • 2 comments

Take cover environment example

Note: This probably shouldn't be merged because it's a bit hacky, but I wanted to publish it as a way to discuss a few things.

First off, thanks so much for creating this library! Personally, I've been wanting something like this for so long! The ability to create Unity RL environments from plain Python objects is super convenient and will definitely be useful in creating more open-ended environments. I wanted to take a dive into the library, so I decided to try to port one of my favorite environments: ViZDoom's take-cover environment. I also decided to jot down some of my notes and experience as a first-time user. I understand that this library is still under development and has many important core design principles that I am not aware of, so take my feedback with a grain of salt :sweat_smile:

Environment Details

  • The take-cover environment is a simplified version of https://github.com/shakenes/vizdoomgym/blob/master/vizdoomgym/envs/scenarios/README.md#take-cover. A video of the environment: https://www.youtube.com/watch?v=9SATUL6irFw. Basically, the agent can move left and right and has to dodge homing fireballs hurled at it.
  • The one I implemented here is super simple, as the projectiles just go in a straight line starting from random locations.

Discussion

Hopefully this perspective is useful: I only got my hands on the library this week, and I've had experience with some (relatively) similar frameworks like unity-ml-agents.

Positives

  • RewardFunction objects are super convenient and, I think, well designed. They're simple, combining them is pretty trivial, and I found them super easy to learn.
  • The Camera object for actors is amazingly convenient. Being able to easily attach a first-person view to an agent is incredibly useful for building RL agents that can transfer from simulation -> real world.
  • It's really easy to design a simple environment (with one agent and stationary objectives / collectibles) with the built-in structures. Adding walls, floors, objectives + a single agent was super easy to pick up.
  • The seamless interaction with SB3 is really nice, as it's trivial to train something like a PPO agent on a simple 3D environment I created.

Some Feedback

It's hard to make "dynamic" environments

The take-cover environment requires some more complicated, dynamic interactions:

  • Projectiles that home in on the agent's location
  • Monsters / projectiles that spawn in random starting locations

I found this kinda hard to implement for a couple reasons:

  1. Projectiles can't actually be created on the spot: after the scene is initialized, I couldn't find a way to dynamically create another object. I'm sure this is planned for the future, as https://github.com/huggingface/simulate/blob/main/src/simulate/engine/unity_engine.py#L177 hasn't been implemented just yet.

  2. Actor objects seemed to be the only objects that can really be modified (for instance, moved in a particular direction: https://github.com/huggingface/simulate/blob/main/src/simulate/assets/action_mapping.py#L25). This makes it tricky to add dynamic components, like projectiles, because we don't really want the policy to take in things like observations from them. I do still like the idea of having "agents" be the things that can be "acted" upon, but I think another level of abstraction could help, maybe something like "Actor" -> "PolicyActor", to specify which objects should be controlled by an actual policy. This might be a bit confusing and may not be the best solution, but I'll probably make a separate PR to show what I have in mind, if that seems useful.

  3. Some ActionMapping objects are limited for some cases. The ActionMapping objects are really convenient for most cases, but are kind of limited in other scenarios, like when we want conditional actions. For instance, in the take-cover environment we need to reset projectiles when they reach a certain threshold. This is kind of hard because set_position needs a hardcoded position to reset to, instead of accepting one dynamically, which leads to some messy interactions (here I had to hardcode possible positions for the projectiles to reset to: https://github.com/shyamsn97/simulate/blob/take-cover/examples/rl/sb3_take_cover.py#L152-L158). Not sure if I missed something, but is there a way to dynamically input a new position, like an action? From my understanding this is nontrivial because the actions from the actuators, along with the scenes, are converted to glTF and passed to Unity, which I guess is also why it's hard to dynamically add stuff to the scene.

  4. Hard to customize / understand ActionMapping objects. I found it hard to actually understand / customize action mappings. Since they're all in one class, it was a bit confusing to figure out what each parameter did for each action type. As a possible improvement, I think having separate ActionMapping dataclasses for each action could make it a lot easier to understand and customize each one. For instance, a set_position action could be something like:

@dataclass
class SetPositionActionMapping(ActionMapping):
    position: Optional[List[float]]

I'll probably put out a PR to also show what I have in mind for this!
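A minimal, self-contained sketch of the per-action dataclass idea, written under stated assumptions: `ActionMappingBase`, `ChangePositionActionMapping`, and all field names other than `position` are hypothetical stand-ins for illustration and are not simulate's actual `ActionMapping` API. One detail worth noting is that if the base class declares fields with defaults, Python dataclasses require subclass fields to have defaults too, so `position` defaults to `None` here.

```python
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class ActionMappingBase:
    """Hypothetical shared base; NOT simulate's ActionMapping."""

    # Options that would apply to every action type live on the base.
    use_local_coordinates: bool = True


@dataclass
class SetPositionActionMapping(ActionMappingBase):
    # Target position; None means "not specified yet".
    # A default is required because the base already has a defaulted field.
    position: Optional[List[float]] = None


@dataclass
class ChangePositionActionMapping(ActionMappingBase):
    # Hypothetical second action type: move along an axis by a step size.
    axis: Optional[List[float]] = None
    amplitude: float = 1.0


# Each action type now exposes only the parameters that are meaningful for it.
reset_projectile = SetPositionActionMapping(
    use_local_coordinates=False, position=[0.0, 0.5, 10.0]
)
print(asdict(reset_projectile))
```

With this layout, the parameters that matter for a given action are visible from its class definition alone, which is the readability gain the proposal is after.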

  5. Customizing RLEnv is a bit hacky. Not sure if the intention is to actually allow customization, but my implementation needed customized reset and step functions, and the only way I found was to copy the methods from the base class. Maybe it could help to add empty hook functions for reset + step, something like custom_reset or custom_step, that run before / after the original methods?
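The hook idea for reset and step can be sketched as follows. This is only an illustration of the pattern, assuming a toy base class: `BaseEnv` and `TakeCoverEnv` are hypothetical and do not reflect how simulate's `RLEnv` is implemented. The hook names `custom_reset` and `custom_step` are taken from the suggestion above.

```python
from typing import Any, Dict, List, Tuple


class BaseEnv:
    """Hypothetical stand-in for an environment base class (not simulate's RLEnv)."""

    def __init__(self) -> None:
        self.t = 0

    def custom_reset(self) -> None:
        """Hook called at the end of reset(); does nothing by default."""

    def custom_step(self, action: int) -> None:
        """Hook called at the end of step(); does nothing by default."""

    def reset(self) -> Dict[str, Any]:
        self.t = 0
        self.custom_reset()  # subclass logic runs without copying reset()
        return {"t": self.t}

    def step(self, action: int) -> Tuple[Dict[str, Any], float, bool]:
        self.t += 1
        self.custom_step(action)  # subclass logic runs without copying step()
        done = self.t >= 3  # toy episode length, for illustration only
        return {"t": self.t}, 0.0, done


class TakeCoverEnv(BaseEnv):
    """Overrides only the hooks; reset() and step() are inherited unchanged."""

    def __init__(self) -> None:
        super().__init__()
        self.events: List[str] = []

    def custom_reset(self) -> None:
        self.events.append("respawn projectiles")

    def custom_step(self, action: int) -> None:
        self.events.append(f"advance projectiles after action {action}")


env = TakeCoverEnv()
env.reset()
obs, reward, done = env.step(1)
print(env.events)
```

The subclass never touches the base `reset` or `step` bodies, so changes to the base class would not need to be re-copied into each custom environment.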

Some bugs / weird unity stuff

I noticed that with the set_position action, the object moves to a different position than the one specified in the actual asset creation, even when the same coordinates are used. For example:

sm.Box(
            name=f"projectile_{i}_{index}",
            position=target_position,
           ...

would move to a different position even if sm.ActionMapping("set_position", target_position, use_local_coordinates=False) is used.


Sorry if this is a bit lengthy, but I'm super excited about the future of simulate! Let me know if any of the stuff above doesn't make sense (I'm sure a lot is just my misunderstanding :smile:). I would also love to collaborate if there are open problems that I can work on! Thanks!

shyamsn97, Oct 05 '22 23:10

Hey @shyamsn97 -- thanks for the awesome feedback. We'll comment at more length soon!

natolambert, Oct 06 '22 01:10

This is super cool feedback, thanks a lot for sharing @shyamsn97 😍

Overall I think your proposals make a lot of sense.

thomwolf, Oct 06 '22 09:10