Projection Network for more than 1 action with differing action spaces

Open sidney-tio opened this issue 3 years ago • 5 comments

Following the discussion from #37, I developed a MultiCategoricalProjectionNetwork that splits the logits and creates the respective Categorical distributions. I tried to adhere to the same pattern as far as I could; it can be found here: https://gist.github.com/sidney-tio/66abada949f1b629dd9ee28777d402d5
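To make the logit-splitting idea concrete, here is a minimal framework-free sketch. It is not the code from the gist: the helper names `split_logits` and `softmax` and the example sizes are assumptions chosen for illustration, and plain Python lists stand in for tensors.

```python
import math


def split_logits(logits, num_choices):
    """Split a flat logit vector into one chunk per sub-action."""
    if sum(num_choices) != len(logits):
        raise ValueError("len(logits) must equal sum(num_choices)")
    chunks, start = [], 0
    for n in num_choices:
        chunks.append(logits[start:start + n])
        start += n
    return chunks


def softmax(chunk):
    """Numerically stable softmax over one chunk of logits."""
    peak = max(chunk)
    exps = [math.exp(x - peak) for x in chunk]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical example: two sub-actions with 4 and 3 choices respectively.
logits = [0.1, 0.2, 0.3, 0.4, 1.0, 2.0, 3.0]
probs = [softmax(chunk) for chunk in split_logits(logits, [4, 3])]
```

Each entry of `probs` is the probability vector of one independent categorical distribution, so sub-actions with different numbers of choices can share a single output layer.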

If the team would like, I could raise a PR based on the gist I developed. From what I see, these are the to-dos to make it PR-worthy:

  • [x] add tests
  • [x] add more detailed docstrings
  • [x] add masks

sidney-tio • Jan 03 '22 10:01

Have you tried having a nested action space instead, in which each action can have a different number of actions?

sguada • Jan 04 '22 22:01

No, I don't think I have tried that. I assume you are referring to something like a gym.spaces.Dict type of nested structure where we could specify {'action1': 4, 'action2': 3}? Could you elaborate further?

sidney-tio • Jan 11 '22 18:01

Yeah, you can use gym.spaces.Dict or directly nested ArraySpecs to define the actions. Then each one can have its own Categorical distribution, and sampling will sample all of them.
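As a rough illustration of the nested idea, the sketch below uses a plain dict in place of a gym.spaces.Dict or nested spec. The names `action_sizes` and `sample_nested` are hypothetical, and uniform probabilities stand in for whatever a policy network would output.

```python
import random

# Hypothetical nested action space: name -> number of discrete choices.
action_sizes = {"action1": 4, "action2": 3}


def sample_nested(probs_by_action, rng):
    """Draw one index per sub-action from its own categorical distribution."""
    return {
        name: rng.choices(range(len(probs)), weights=probs, k=1)[0]
        for name, probs in probs_by_action.items()
    }


# Uniform probabilities stand in for a policy network's output.
probs_by_action = {name: [1.0 / n] * n for name, n in action_sizes.items()}

# A seeded generator keeps the draw reproducible.
action = sample_nested(probs_by_action, random.Random(0))
```

The sampled `action` keeps the same nested structure as the action space, with one integer per named sub-action.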

sguada • Jan 12 '22 01:01

My current workflow was to generate a spec from a gym.spaces.MultiDiscrete instance before creating the network.

I can see why something like a nested action space would be useful. I also just tried starting from a gym.spaces.Dict; I would need to loop through the iterable before generating the respective Categorical distributions.

I'll add a function to check for an iterable and extract the relevant information. Let me make the changes and, if it's okay, I will raise a draft PR.
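One possible shape for such a helper, sketched with plain Python data standing in for the gym spaces: a flat sequence of sizes plays the role of a MultiDiscrete-style space, and a name-to-size mapping plays the role of a Dict-style space. The function name `extract_num_choices` and the generated `action{i}` names are assumptions for illustration only.

```python
from collections.abc import Mapping


def extract_num_choices(space):
    """Return (names, sizes) from a flat sequence or a name -> size mapping."""
    if isinstance(space, Mapping):
        # Dict-style input: keep the caller's names and their order.
        names = list(space.keys())
        sizes = [int(space[name]) for name in names]
    else:
        # Flat input: generate placeholder names for each position.
        sizes = [int(n) for n in space]
        names = [f"action{i}" for i in range(len(sizes))]
    if any(n < 1 for n in sizes):
        raise ValueError("every sub-action needs at least one choice")
    return names, sizes


# Both hypothetical inputs describe sub-actions with 4 and 3 choices.
flat_names, flat_sizes = extract_num_choices([4, 3])
dict_names, dict_sizes = extract_num_choices({"move": 4, "turn": 3})
```

Normalizing both inputs to the same `(names, sizes)` pair means the code that builds the Categorical distributions only has to handle one representation.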

sidney-tio • Jan 13 '22 11:01

Hello, not sure if it was missed, but the PR for this issue is up. Could I request a review, please?

sidney-tio • Mar 02 '22 07:03