Sergio Guadarrama

68 comments by Sergio Guadarrama

One question that is not clear to me is if we want to implement `[3, 3, 2, 3]` as `3 * 3 * 2 * 3` joint logits or as...
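To make the first reading concrete, here is a minimal sketch of the joint interpretation only, using NumPy. The helper names `joint_index_to_action` and `action_to_joint_index` are hypothetical and not part of any library; the only thing taken from the comment is the shape `[3, 3, 2, 3]`.

```python
import numpy as np

# Number of choices per action dimension, taken from the comment above.
nvec = (3, 3, 2, 3)

# Joint interpretation: one logit per combination of sub-actions.
num_joint_logits = int(np.prod(nvec))  # 3 * 3 * 2 * 3 = 54


def joint_index_to_action(index):
    """Map a flat joint-logit index to one choice per dimension (hypothetical helper)."""
    return tuple(int(i) for i in np.unravel_index(index, nvec))


def action_to_joint_index(action):
    """Inverse mapping: per-dimension choices back to the flat joint index."""
    return int(np.ravel_multi_index(action, nvec))
```

With this layout a single categorical distribution over 54 logits covers every combination, and the two helpers convert between the flat index and the per-dimension action.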

Please add 2 new tests to make sure the new arguments work as expected.

I would recommend keeping all the mask logic in the Python part of the Environment, instead of trying to use `tf.function`. So in the `PyEnv` `step(action)` function generate the...
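As a rough illustration of keeping the mask logic in plain Python, the sketch below builds the mask inside the environment with NumPy. `ToyMaskedEnv` is a hypothetical stand-in and does not subclass the real TF-Agents `PyEnvironment`; the action meanings, the dynamics, and the `'price'` / `'legal_actions'` keys are assumptions for the example.

```python
import numpy as np


class ToyMaskedEnv:
    """Hypothetical stand-in for a PyEnv; not a TF-Agents class."""

    NUM_ACTIONS = 3  # assumed meaning: 0 = hold, 1 = buy, 2 = sell

    def __init__(self):
        self._holding = False
        self._price = 1.0

    def _legal_actions(self):
        # Plain Python/NumPy logic, no tf.function involved.
        mask = np.ones(self.NUM_ACTIONS, dtype=bool)
        mask[1] = not self._holding  # can only buy when not holding
        mask[2] = self._holding      # can only sell when holding
        return mask

    def _observation(self):
        # The mask travels next to the true observation in a dict.
        return {
            'price': np.array([self._price], dtype=np.float32),
            'legal_actions': self._legal_actions(),
        }

    def reset(self):
        self._holding = False
        self._price = 1.0
        return self._observation()

    def step(self, action):
        if not self._legal_actions()[action]:
            raise ValueError(f'Illegal action: {action}')
        if action == 1:
            self._holding = True
        elif action == 2:
            self._holding = False
        self._price += 0.5  # dummy deterministic dynamics for the sketch
        return self._observation()
```

Each call to `reset()` or `step(action)` returns a fresh mask computed from the environment's Python state, so no graph-mode code needs to know the masking rules.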

You can create a dict or a namedtuple for the observations; those would be passed to the Network. If you only want to pass the true observation to the Network...

The `legal_actions` entry shouldn't be part of the `observation_spec`, since it's going to be separated out by the `observation_action_splitter`.

So make sure that the `observation_action_splitter` splits the observation and the mask correctly.

```
def observation_action_splitter(obs):
    return obs['price'], obs['legal_actions']
```
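One quick way to check the splitter is to run it on a hand-built observation and apply the resulting mask to some scores. This is only a sketch: `masked_greedy_action` is a hypothetical helper, and the logits are made-up numbers rather than the output of a real network.

```python
import numpy as np


def observation_action_splitter(obs):
    # Returns (true observation, action mask), in that order.
    return obs['price'], obs['legal_actions']


def masked_greedy_action(logits, mask):
    """Pick the highest-scoring action among the legal ones (hypothetical helper)."""
    masked = np.where(mask, logits, -np.inf)
    return int(np.argmax(masked))


obs = {
    'price': np.array([1.0], dtype=np.float32),
    'legal_actions': np.array([True, False, True]),
}
price, mask = observation_action_splitter(obs)

logits = np.array([0.1, 2.0, 0.3])  # made-up scores, not from a real network
action = masked_greedy_action(logits, mask)
```

Action 1 has the highest raw score but is masked out, so the greedy choice falls on a legal action instead.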

Have you tried having a nested action space instead, in which each action can have a different number of choices?

Yeah, you can use `gym.spaces.Dict` or directly nested `ArraySpec`s to define the actions. Then each one can have its own `Categorical` distribution, and sampling will sample all of them.
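The idea of one categorical distribution per nested action can be sketched without any RL library. The example below uses plain NumPy; `sample_nested_action` and the `'direction'` / `'size'` keys are hypothetical names chosen for illustration, and the code does not use the `gym` or TF-Agents APIs.

```python
import numpy as np


def sample_nested_action(nested_logits, rng):
    """Sample one index per entry from independent categorical distributions."""
    action = {}
    for name, logits in nested_logits.items():
        logits = np.asarray(logits, dtype=np.float64)
        # Softmax, shifted by the max for numerical stability.
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        action[name] = int(rng.choice(len(probs), p=probs))
    return action


# Each sub-action has its own number of choices (assumed: 3 and 2).
nested_logits = {
    'direction': np.zeros(3),
    'size': np.zeros(2),
}
rng = np.random.default_rng(0)
action = sample_nested_action(nested_logits, rng)
```

The result is a dict with the same keys as the logits, so the sampled action keeps the nested structure of the action space.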

It seems that the collecting policy doesn't have `scale_diag` as part of the `dist_params`, so I suppose there is a mismatch in the policy used to collect the data the...