agents
Projection Network for more than 1 action with differing action spaces
Following the discussion from #37, I developed a `MultiCategoricalProjectionNetwork` that splits the logits and creates the respective `Categorical` distribution for each action. I tried to adhere to the existing pattern as far as I could. It can be found here: https://gist.github.com/sidney-tio/66abada949f1b629dd9ee28777d402d5
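To illustrate the idea outside of any framework, here is a minimal pure-Python sketch of splitting a flat logit vector into one chunk per action and sampling each chunk independently. It is not the code from the gist; the names `split_logits`, `softmax`, `sample_actions`, and `nvec` are hypothetical and chosen only for this example.

```python
import math
import random
from typing import List, Sequence


def split_logits(logits: Sequence[float], nvec: Sequence[int]) -> List[List[float]]:
    """Split a flat logit vector into one chunk per action.

    `nvec` lists the number of choices for each action, e.g. [4, 3].
    """
    if len(logits) != sum(nvec):
        raise ValueError("len(logits) must equal sum(nvec)")
    chunks, start = [], 0
    for n in nvec:
        chunks.append(list(logits[start:start + n]))
        start += n
    return chunks


def softmax(chunk: Sequence[float]) -> List[float]:
    """Turn one chunk of logits into categorical probabilities."""
    peak = max(chunk)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in chunk]
    total = sum(exps)
    return [e / total for e in exps]


def sample_actions(logits: Sequence[float], nvec: Sequence[int],
                   rng: random.Random) -> List[int]:
    """Sample one index per action, each from its own categorical."""
    return [
        rng.choices(range(len(chunk)), weights=softmax(chunk))[0]
        for chunk in split_logits(logits, nvec)
    ]
```

For example, `sample_actions([0.0] * 7, [4, 3], random.Random(0))` returns two indices, the first in `range(4)` and the second in `range(3)`.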
If the team would like, I could raise a PR based on the gist I developed. From what I see, these are the to-dos to make it PR-worthy:
- [x] add tests
- [x] add more detailed docstrings
- [x] add masks
Have you tried using a nested action space instead, in which each action can have a different number of choices?
No, I don't think I have tried that. I assume you are referring to something like a `gym.spaces.Dict` type of nested structure, where we could specify `{'action1': 4, 'action2': 3}`? Could you elaborate further?
Yeah, you can use `gym.spaces.Dict` or directly nested `ArraySpec`s to define the actions. Then each one can have its own `Categorical` distribution, and sampling will sample all of them.
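As a rough, framework-free sketch of what that means, the snippet below uses a plain nested `dict` of logit lists as a stand-in for a nested action spec and draws one sample per leaf. The function `sample_nested` is hypothetical, not a TF-Agents or Gym API.

```python
import math
import random
from collections.abc import Mapping


def sample_nested(logits, rng: random.Random):
    """Sample every leaf of a nested dict of logit lists.

    Each leaf (a list of logits) is treated as its own categorical
    distribution; the result mirrors the structure of the input.
    """
    if isinstance(logits, Mapping):
        return {name: sample_nested(sub, rng) for name, sub in logits.items()}
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in logits]
    return rng.choices(range(len(logits)), weights=weights)[0]


# Two actions with 4 and 3 choices respectively, all equally likely.
nested_logits = {"action1": [0.0] * 4, "action2": [0.0] * 3}
sample = sample_nested(nested_logits, random.Random(0))
```

Here `sample` is a dict with the keys `action1` and `action2`, so the structure of the sampled action matches the structure of the spec.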
My current workflow is to generate a spec from a `gym.spaces.MultiDiscrete` instance before creating the network.
I can see why something like a nested action space would be useful. I also just tried starting from a `gym.spaces.Dict`; I would need to loop through the iterable before generating the respective `Categorical` distributions.
I'll add a function to check for an iterable and extract the relevant information. Let me make the changes and, if it's okay, I will raise a draft PR.
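A helper along those lines might look roughly like the sketch below. It assumes the action description arrives as a plain int, a sequence of ints (similar to the sizes behind a `MultiDiscrete`), or a flat mapping of names to sizes (similar to the `{'action1': 4, 'action2': 3}` example above). The name `extract_action_sizes` and the generated labels are hypothetical, not part of the PR.

```python
from collections.abc import Iterable, Mapping
from typing import List, Tuple


def extract_action_sizes(spec) -> List[Tuple[str, int]]:
    """Normalize an action description into (name, num_choices) pairs."""
    if isinstance(spec, int):
        # A single discrete action.
        return [("action", spec)]
    if isinstance(spec, Mapping):
        # Named actions, e.g. {'action1': 4, 'action2': 3}.
        return [(str(name), int(n)) for name, n in spec.items()]
    if isinstance(spec, Iterable):
        # Unnamed actions, e.g. [4, 3]; names are generated by position.
        return [(f"action{i}", int(n)) for i, n in enumerate(spec)]
    raise TypeError(f"unsupported action spec: {type(spec).__name__}")
```

Each resulting pair then tells the network how many logits to reserve for that action.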
Hello, I'm not sure if it was missed, but the PR for this issue is up. Could I request a review, please?