Yasuhiro Fujita comments

Results: 90 comments of Yasuhiro Fujita

The score of `train_dqn_gym.py` with `--actor-learner` is lower than the baseline score.

Possible hypotheses:
- Just variance. At least both seem successful, reaching a median score of 200.
- Difference in `n_updates`. Actor-learner can collect data faster, but model updates are not always...

SAC-Discrete Implementation

There is no plan to implement SAC-Discrete (https://arxiv.org/abs/1910.07207) in the near future. Contributions are welcome.

ACER - Examples on continuous action space

You are right, we don't have an example script for ACER with a continuous action space for now. It would be nice to add one. For now we only have a...

ACER - Examples on continuous action space

It seems like a bug in ACER. Can you try commenting out `assert param.requires_grad` and see if it works?

ACER - Examples on continuous action space

Thank you for confirming that. It seems like `ACER` does not work with `GaussianHeadWithFixedCovariance` for now. I think this needs to be fixed. A possible workaround for it would be...

ACER - Examples on continuous action space

I meant the learning rate you set when you make your optimizer, since PyTorch's optimizer allows setting parameter-specific learning rates. I guess you are right about `var_func`. Any `var_func` that...

ACER - Examples on continuous action space

Here is PyTorch's official doc: https://pytorch.org/docs/stable/optim.html#per-parameter-options
`pfrl.optimizers.SharedRMSpropEpsInsideSqrt` is a subclass of `torch.optim.Optimizer`, so it is no different.
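For reference, a minimal sketch of per-parameter learning rates. It uses plain `torch.optim.RMSprop` as a stand-in rather than the PFRL optimizer, and `mean_head`, `log_var`, and both learning-rate values are hypothetical names and numbers chosen only for illustration.

```python
import torch
from torch import nn

# Hypothetical model pieces: a mean head and a separate variance parameter.
mean_head = nn.Linear(4, 2)
log_var = nn.Parameter(torch.zeros(2))

# Each dict is one parameter group. Options set inside a group override
# the defaults passed as keyword arguments to the optimizer.
optimizer = torch.optim.RMSprop(
    [
        {"params": mean_head.parameters()},   # falls back to the default lr
        {"params": [log_var], "lr": 1e-5},    # group-specific lr
    ],
    lr=1e-3,
)

# Inspect the learning rate that each group ended up with.
learning_rates = [group["lr"] for group in optimizer.param_groups]
```

The same list-of-dicts argument should be accepted by any `torch.optim.Optimizer` subclass, since the parameter groups are handled in the base class.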

ACER - Examples on continuous action space

I hope #145 will resolve the issue.

ACER - Examples on continuous action space

It is tested on a toy env with continuous actions, but it is not verified that it can reproduce the performance on the continuous tasks in the paper. Here is...

MultiDiscrete action spaces

I guess you can wrap the output of `SoftmaxCategoricalHead` with `torch.distributions.Independent` so that your resulting distribution's batch_shape is (N,) and event_shape is (4,).
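A rough sketch of what that wrapping looks like, using a plain `torch.distributions.Categorical` as a stand-in for the head's output. The sizes (`N = 8`, 4 sub-actions, 3 choices each) are made up, and it assumes every sub-action has the same number of choices.

```python
import torch
from torch.distributions import Categorical, Independent

# Hypothetical sizes: batch of 8, 4 sub-actions, 3 choices per sub-action.
N, n_dims, n_choices = 8, 4, 3
logits = torch.randn(N, n_dims, n_choices)

# One categorical per sub-action: batch_shape (N, 4), event_shape ().
base = Categorical(logits=logits)

# Reinterpret the last batch dimension as an event dimension:
# batch_shape (N,), event_shape (4,).
dist = Independent(base, 1)

action = dist.sample()            # shape (N, 4), one index per sub-action
log_prob = dist.log_prob(action)  # shape (N,), summed over the 4 sub-actions
```

With this, `log_prob` gives one value per batch element, which is what an agent expecting a single joint action per step would need.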