Mario Ynocente Castro issues

Results: 6 issues of Mario Ynocente Castro

Discrepancy in SAC on entropy coefficient update

Noticed that [here](https://github.com/pfnet/pfrl/blob/master/pfrl/agents/soft_actor_critic.py#L281) the `log_prob` variable is computed before the update of the actor, while in SAC's [repo](https://github.com/rail-berkeley/softlearning/blob/master/softlearning/algorithms/sac.py#L246) it is recomputed after the actor update (the [paper](https://arxiv.org/abs/1812.05905) also...

Batch and async training do not work with macOS/Windows and Python >= 3.8

In Python 3.8, the default start method of `multiprocessing` on macOS was changed (from `fork` to `spawn`). For reference: https://github.com/chainer/chainerrl/issues/572
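One way to avoid depending on the platform default is to request a start method explicitly. The sketch below uses only the standard `multiprocessing` module; the helper names `pick_start_method` and `make_context` are hypothetical and are not part of PFRL or ChainerRL. Whether forcing a particular method actually resolves this issue is an assumption, and `fork` is not available on Windows.

```python
import multiprocessing as mp


def pick_start_method(preferred="fork"):
    """Return `preferred` if this platform supports it, else the platform default.

    Hypothetical helper for illustration only.
    """
    available = mp.get_all_start_methods()
    if preferred in available:
        return preferred
    # The first entry of get_all_start_methods() is the platform default.
    return available[0]


def make_context(preferred="fork"):
    """Build a multiprocessing context with an explicit start method."""
    return mp.get_context(pick_start_method(preferred))


if __name__ == "__main__":
    ctx = make_context("fork")
    print("supported methods:", mp.get_all_start_methods())
    print("using start method:", ctx.get_start_method())
```

Objects created from the returned context (for example `ctx.Process` or `ctx.Pool`) use the chosen start method without changing the global default for the rest of the program.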

[WIP] MDQN

Adds MDQN according to: https://arxiv.org/abs/2007.14430
Reference implementation: https://github.com/google-research/google-research/tree/master/munchausen_rl

ACER test may be flaky

After merging #112, `test_acer` fails, but the failure shouldn't be related to this PR.

addcmul_ used in RMSpropEpsInsideSqrt has been deprecated

```
UserWarning: This overload of addcmul_ is deprecated:
    addcmul_(Number value, Tensor tensor1, Tensor tensor2)
Consider using one of the following signatures instead:
    addcmul_(Tensor tensor1, Tensor tensor2, *, Number value) (Triggered...
```

Wrong value in call to F.softmax

Should `F.softmax(Q_targets_next, dim=1)` be `F.softmax(Q_targets_next / entropy_tau, dim=1)` instead?
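To illustrate why the division matters, here is a minimal plain-Python sketch (not PyTorch, and not the code from the repository in question) of a softmax with a temperature. The function name `softmax`, the sample Q-values, and the temperature value `0.03` are assumptions chosen for illustration only.

```python
import math


def softmax(values, tau=1.0):
    """Softmax of `values` after dividing each entry by the temperature `tau`."""
    scaled = [v / tau for v in values]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    q_targets_next = [1.0, 2.0, 3.0]  # illustrative Q-values for one state
    entropy_tau = 0.03                # illustrative temperature

    print("without temperature:", softmax(q_targets_next))
    print("with temperature:   ", softmax(q_targets_next, tau=entropy_tau))
```

With a temperature below 1, the scaled version concentrates far more probability on the largest Q-value, so omitting the division yields a different policy distribution than the one the temperature parameter is meant to define.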