Would you like to see a MultiCategorical projection network?
I see that the existing CategoricalProjectionNetwork only supports action specs with the same number of actions along all dimensions.
So, for example, discrete actions [3, 3, 3, 3] -- good. Discrete actions [3, 3, 2, 3] -- not supported.
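For concreteness, the two action specs above could be declared roughly like this (a sketch only; I am assuming tf_agents.specs.BoundedArraySpec with a per-dimension maximum, which is how a ragged action count would usually be expressed):

```python
# Sketch: how the two action specs above might be declared.
# Assumes BoundedArraySpec accepts a per-dimension maximum.
import numpy as np
from tf_agents.specs import array_spec

# [3, 3, 3, 3]: four dimensions with 3 actions each -- supported.
uniform_spec = array_spec.BoundedArraySpec(
    shape=(4,), dtype=np.int32, minimum=0, maximum=2)

# [3, 3, 2, 3]: the third dimension has only 2 actions -- not supported
# by the current CategoricalProjectionNetwork.
ragged_spec = array_spec.BoundedArraySpec(
    shape=(4,), dtype=np.int32, minimum=0, maximum=[2, 2, 1, 2])
```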
I see in the code of CategoricalProjectionNetwork that you advise implementing a more flexible distribution myself. This is actually not that hard to support:
- We can have a single dense layer that outputs the sum of all action counts as logits. For example, the action counts [3, 3, 2, 3] would be converted into 3 + 3 + 2 + 3 = 11 logits.
- When we need the output distribution, we apply the dense layer and split the logits (in the example: [n_batch, 3], [n_batch, 3], [n_batch, 2], [n_batch, 3]).
- Return a custom distribution that takes the split logits and creates multiple tfd.Categorical distributions internally. Implement mode() and sample() to call all the internal categoricals (a sketch follows below).
I would like to know why it was decided to restrict this to action spaces with the same number of unique actions along all dimensions?
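Here is a minimal sketch of the idea, not the TF-Agents API; the class name MultiCategorical, the action_dims argument, and the usage at the bottom are all hypothetical:

```python
# Minimal sketch of the independent-logits approach described above.
import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions


class MultiCategorical:
    """Wraps one tfd.Categorical per action dimension, e.g. action_dims=[3, 3, 2, 3]."""

    def __init__(self, flat_logits, action_dims):
        # Split the [n_batch, sum(action_dims)] logits into one block per dimension:
        # [n_batch, 3], [n_batch, 3], [n_batch, 2], [n_batch, 3] for [3, 3, 2, 3].
        split_logits = tf.split(flat_logits, action_dims, axis=-1)
        self._dists = [tfd.Categorical(logits=logits) for logits in split_logits]

    def sample(self):
        # Sample each dimension independently; result shape is [n_batch, num_dims].
        return tf.stack([d.sample() for d in self._dists], axis=-1)

    def mode(self):
        return tf.stack([d.mode() for d in self._dists], axis=-1)

    def log_prob(self, actions):
        # Dimensions are independent, so log-probs sum across dimensions.
        return tf.add_n(
            [d.log_prob(actions[..., i]) for i, d in enumerate(self._dists)])


# Usage: a single dense layer outputs sum(action_dims) = 11 logits.
action_dims = [3, 3, 2, 3]
projection = tf.keras.layers.Dense(sum(action_dims))
flat_logits = projection(tf.random.normal([8, 16]))  # [n_batch=8, 11]
dist = MultiCategorical(flat_logits, action_dims)
actions = dist.sample()  # shape [8, 4], one action index per dimension
```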
That generalization should work. I think we just didn't have any envs that required it.
One question that is not clear to me is whether we want to implement [3, 3, 2, 3] as 3 * 3 * 2 * 3 joint logits or as independent 3 + 3 + 2 + 3 logits.
The independent logits option is easier to implement, and probably a good addition.
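To make the contrast concrete for the [3, 3, 2, 3] example (just the arithmetic, nothing library-specific):

```python
# Logit counts for the two parameterizations discussed above.
import math

action_dims = [3, 3, 2, 3]
joint_logits = math.prod(action_dims)  # 54: one Categorical over every action combination
independent_logits = sum(action_dims)  # 11: one Categorical per dimension, sampled independently
```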
I have a solution for independent 3 + 3 + 2 + 3 logits. I will prepare a pull request when I have time.
For anyone looking for a quick implementation: I made a workaround based on the suggestion by @PeterZhizhin: https://gist.github.com/sidney-tio/66abada949f1b629dd9ee28777d402d5
If the team likes it, I could make a proper PR for this as well.