Would you like to see a MultiCategorical projection network?
I see that the existing CategoricalProjectionNetwork only supports action specs with the same number of actions along all dimensions.
So, for example, discrete actions [3, 3, 3, 3] -- good. Discrete actions [3, 3, 2, 3] -- not supported.
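For concreteness, the two action specs above could be declared roughly like this (a sketch only; I am assuming tf_agents.specs.BoundedArraySpec with a per-dimension maximum, which is how a ragged action count would usually be expressed):

```python
# Sketch: how the two action specs above might be declared.
# Assumes BoundedArraySpec accepts a per-dimension maximum.
import numpy as np
from tf_agents.specs import array_spec

# [3, 3, 3, 3]: four dimensions with 3 actions each -- supported.
uniform_spec = array_spec.BoundedArraySpec(
    shape=(4,), dtype=np.int32, minimum=0, maximum=2)

# [3, 3, 2, 3]: the third dimension has only 2 actions -- not supported
# by the current CategoricalProjectionNetwork.
ragged_spec = array_spec.BoundedArraySpec(
    shape=(4,), dtype=np.int32, minimum=0, maximum=[2, 2, 1, 2])
```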
I see in the code of CategoricalProjectionNetwork that you advise implementing a more flexible distribution myself. This is actually not that hard to support:
- We can have a single dense layer that outputs the sum of all action counts as logits. For example, the action counts [3, 3, 2, 3] would be converted into 3 + 3 + 2 + 3 = 11 logits.
- When we need the output distribution, we apply the dense layer and split the logits (in the example: [n_batch, 3], [n_batch, 3], [n_batch, 2], [n_batch, 3]).
- Return a custom distribution that takes the split logits and creates multiple tfd.Categorical distributions internally. Implement mode() and sample() to call all the internal categoricals (a sketch follows below).
I would like to know why it was decided to restrict this to action spaces with the same number of unique actions along all dimensions?
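Here is a minimal sketch of the idea, not the TF-Agents API; the class name MultiCategorical, the action_dims argument, and the usage at the bottom are all hypothetical:

```python
# Minimal sketch of the independent-logits approach described above.
import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions


class MultiCategorical:
    """Wraps one tfd.Categorical per action dimension, e.g. action_dims=[3, 3, 2, 3]."""

    def __init__(self, flat_logits, action_dims):
        # Split the [n_batch, sum(action_dims)] logits into one block per dimension:
        # [n_batch, 3], [n_batch, 3], [n_batch, 2], [n_batch, 3] for [3, 3, 2, 3].
        split_logits = tf.split(flat_logits, action_dims, axis=-1)
        self._dists = [tfd.Categorical(logits=logits) for logits in split_logits]

    def sample(self):
        # Sample each dimension independently; result shape is [n_batch, num_dims].
        return tf.stack([d.sample() for d in self._dists], axis=-1)

    def mode(self):
        return tf.stack([d.mode() for d in self._dists], axis=-1)

    def log_prob(self, actions):
        # Dimensions are independent, so log-probs sum across dimensions.
        return tf.add_n(
            [d.log_prob(actions[..., i]) for i, d in enumerate(self._dists)])


# Usage: a single dense layer outputs sum(action_dims) = 11 logits.
action_dims = [3, 3, 2, 3]
projection = tf.keras.layers.Dense(sum(action_dims))
flat_logits = projection(tf.random.normal([8, 16]))  # [n_batch=8, 11]
dist = MultiCategorical(flat_logits, action_dims)
actions = dist.sample()  # shape [8, 4], one action index per dimension
```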
That generalization should work. I think we just didn't have any envs that required it.
One question that is not clear to me is whether we want to implement [3, 3, 2, 3] as 3 * 3 * 2 * 3 joint logits or as independent 3 + 3 + 2 + 3 logits.
The independent logits option is easier to implement, and probably a good addition.
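To make the contrast concrete for the [3, 3, 2, 3] example (just the arithmetic, nothing library-specific):

```python
# Logit counts for the two parameterizations discussed above.
import math

action_dims = [3, 3, 2, 3]
joint_logits = math.prod(action_dims)  # 54: one Categorical over every action combination
independent_logits = sum(action_dims)  # 11: one Categorical per dimension, sampled independently
```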
I have a solution for independent 3 + 3 + 2 + 3 logits. I will prepare a pull request when I have time.
For anyone looking for a quick implementation: I made a workaround based on the suggestion by @PeterZhizhin: https://gist.github.com/sidney-tio/66abada949f1b629dd9ee28777d402d5
If the team likes it, I could make a proper PR for this as well.