Sergio Guadarrama comments

Results: 68 comments of Sergio Guadarrama

Would you like to see a MultiCategorical projection network?

One question that is not clear to me is if we want to implement `[3, 3, 2, 3]` as `3 * 3 * 2 * 3` joint logits or as...
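To make the joint option concrete, here is a minimal sketch in plain Python of what `3 * 3 * 2 * 3` joint logits implies: one logit per combination of choices, with a flat index decoded back into one choice per dimension. The helper name `unravel` is hypothetical and not part of any library, and the alternative the comment goes on to mention is cut off, so it is not shown.

```python
import math

# Number of choices per action dimension, taken from the comment above.
nvec = [3, 3, 2, 3]

# A joint parameterization needs one logit per combination of choices.
num_joint_logits = math.prod(nvec)


def unravel(flat_index, nvec):
    """Decode a flat joint-action index into one choice per dimension.

    Hypothetical helper for illustration (mixed-radix decoding).
    """
    choices = []
    for n in reversed(nvec):
        flat_index, choice = divmod(flat_index, n)
        choices.append(choice)
    return list(reversed(choices))
```

The joint count grows multiplicatively with every added dimension, which is presumably why the choice of parameterization matters.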

Error in loading policy from saved model.

Does it work on TF 2.4?

Bias Layer: Add support for regularizer and constraint

Please add two new tests to make sure the new arguments work as expected.

Example observation_and_action_constraint_splitter

I would recommend keeping all the mask logic in the Python part of the Environment instead of trying to use `tf.function`. So in the `PyEnv` `step(action)` function generate the...
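As a rough illustration of keeping the mask logic in Python, the sketch below computes a legal-action mask with ordinary Python inside the environment and returns it alongside the observation. `ToyTradingEnv`, its three actions, and the holding rule are all assumptions made up for this example; it is a plain class, not the TF-Agents `PyEnvironment` API, and the keys `price` and `legal_actions` are borrowed from a later comment in this thread.

```python
class ToyTradingEnv:
    """Hypothetical plain-Python environment (not a TF-Agents class)."""

    NUM_ACTIONS = 3  # assumed meaning: 0 = hold, 1 = buy, 2 = sell

    def __init__(self, prices):
        self._prices = list(prices)
        self._t = 0
        self._holding = False

    def _legal_actions(self):
        # Mask computed with ordinary Python: buying is illegal while
        # holding, selling is illegal while not holding.
        return [1, 0 if self._holding else 1, 1 if self._holding else 0]

    def _observation(self):
        return {
            "price": self._prices[self._t],
            "legal_actions": self._legal_actions(),
        }

    def reset(self):
        self._t = 0
        self._holding = False
        return self._observation()

    def step(self, action):
        if not self._legal_actions()[action]:
            raise ValueError(f"illegal action {action}")
        if action == 1:
            self._holding = True
        elif action == 2:
            self._holding = False
        # Advance time, staying on the last price once the series ends.
        self._t = min(self._t + 1, len(self._prices) - 1)
        return self._observation()
```

Because the mask is rebuilt on every call, it always reflects the environment state after the action has been applied.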

Example observation_and_action_constraint_splitter

You can create a dict or a namedtuple for the observations; those would be passed to the Network. If you only want to pass the true observation to the Network...

Example observation_and_action_constraint_splitter

The `legal_actions` entry shouldn't be part of the `observation_spec`, since it's going to be separated out by the `observation_action_splitter`.

Example observation_and_action_constraint_splitter

So make sure that the `observation_action_splitter` splits the observation and the mask correctly.

```
def observation_action_splitter(obs):
  return obs['price'], obs['legal_actions']
```
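One way to check that the splitter does what is intended is to run it on a sample observation and use the returned mask to restrict action selection. The sketch below does this in plain Python; `masked_greedy_action` and the sample Q-values are hypothetical and only stand in for whatever the agent does internally with the mask.

```python
def observation_action_splitter(obs):
    # Same splitter as in the comment: (network input, action mask).
    return obs["price"], obs["legal_actions"]


def masked_greedy_action(q_values, mask):
    """Pick the highest-valued action among those the mask allows.

    Hypothetical helper for illustration only.
    """
    legal = [i for i, allowed in enumerate(mask) if allowed]
    if not legal:
        raise ValueError("mask allows no actions")
    return max(legal, key=lambda i: q_values[i])


# Sample observation with made-up values.
obs = {"price": 101.5, "legal_actions": [1, 0, 1]}
network_input, mask = observation_action_splitter(obs)

# Action 1 has the highest value but is masked out, so a legal one is chosen.
action = masked_greedy_action([0.2, 0.9, 0.4], mask)
```

If the two return values were swapped, the network would receive the mask as its input, which is the kind of mistake a quick check like this catches.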

Projection Network for more than 1 action with differing action spaces

Have you tried using a nested action space instead, in which each action can have a different number of choices?

Projection Network for more than 1 action with differing action spaces

Yeah, you can use `gym.spaces.Dict` or directly nested `ArraySpec`s to define the actions. Then each one can have its own `Categorical` distribution, and sampling will sample all of them.
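The idea of one categorical distribution per nested action can be sketched without any RL library. In the example below the action names, their sizes, and `sample_nested_action` are all assumptions chosen for illustration; a real setup would define the structure with `gym.spaces.Dict` or nested specs as described above.

```python
import random

# Hypothetical nested action space: each named action has its own
# number of choices.
action_sizes = {"direction": 4, "speed": 3, "jump": 2}


def sample_nested_action(weights_by_action, rng):
    """Sample every component from its own categorical distribution."""
    return {
        name: rng.choices(range(len(weights)), weights=weights, k=1)[0]
        for name, weights in weights_by_action.items()
    }


# Uniform weights per component, just for illustration.
weights_by_action = {name: [1.0] * n for name, n in action_sizes.items()}
action = sample_nested_action(weights_by_action, random.Random(0))
```

The sampled action has the same nested structure as the action space, with one integer per component drawn independently of the others.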

action output and policy_step_spec structures do not match:

It seems that the collecting policy doesn't have `scale_diag` as part of the `dist_params` so I suppose there is a mismatch in the policy used to collect the data the...