Mario Ynocente Castro

Results 6 issues of Mario Ynocente Castro

Noticed that [here](https://github.com/pfnet/pfrl/blob/master/pfrl/agents/soft_actor_critic.py#L281) the `log_prob` variable is computed before the update of the actor, while in SAC's [repo](https://github.com/rail-berkeley/softlearning/blob/master/softlearning/algorithms/sac.py#L246) it is recomputed after the actor update (the [paper](https://arxiv.org/abs/1812.05905) also...

In Python 3.8, the default start method of multiprocessing on macOS was changed. For reference: https://github.com/chainer/chainerrl/issues/572
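For context, the default in question is the multiprocessing start method, which on macOS went from `fork` to `spawn` in Python 3.8. A minimal sketch of selecting the start method explicitly rather than relying on the platform default is below; `pick_context` is a hypothetical helper written for illustration and is not taken from the linked issue or from any library.

```python
import multiprocessing as mp


def pick_context(preferred="fork"):
    """Return a multiprocessing context for `preferred`, else fall back to "spawn".

    Hypothetical helper for illustration only. "spawn" is available on every
    platform, whereas "fork" is not (for example, it is missing on Windows).
    """
    available = mp.get_all_start_methods()
    method = preferred if preferred in available else "spawn"
    return mp.get_context(method)


# Explicitly request "spawn", the macOS default since Python 3.8.
spawn_ctx = mp.get_context("spawn")
print(spawn_ctx.get_start_method())
```

Using a context object keeps the choice local to the code that creates the processes, instead of changing the global default with `mp.set_start_method`.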

Adds MDQN according to: https://arxiv.org/abs/2007.14430
Reference implementation: https://github.com/google-research/google-research/tree/master/munchausen_rl

After merging #112, `test_acer` fails, but the failure shouldn't be related to this PR.

```
UserWarning: This overload of addcmul_ is deprecated:
	addcmul_(Number value, Tensor tensor1, Tensor tensor2)
Consider using one of the following signatures instead:
	addcmul_(Tensor tensor1, Tensor tensor2, *, Number value) (Triggered...
```

Should `F.softmax(Q_targets_next, dim=1)` be `F.softmax(Q_targets_next / entropy_tau, dim=1)` instead?
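To make the difference between the two expressions concrete, here is a small, dependency-free sketch of a softmax with a temperature. It assumes `entropy_tau` plays the role of the temperature τ in the Munchausen DQN paper, where the policy is defined as the softmax of q/τ. The Q-values and the value `0.03` are made up for illustration, and the `softmax` function below is a stand-in for `F.softmax`, not the code under discussion.

```python
import math


def softmax(values, tau=1.0):
    """Numerically stable softmax of values / tau over a flat list."""
    scaled = [v / tau for v in values]
    peak = max(scaled)  # subtract the max so exp() cannot overflow
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


q_next = [1.0, 2.0, 3.0]  # hypothetical Q-values for one state
entropy_tau = 0.03        # hypothetical temperature

plain = softmax(q_next)                      # analogous to F.softmax(Q, dim=1)
tempered = softmax(q_next, tau=entropy_tau)  # analogous to F.softmax(Q / tau, dim=1)

print(plain)
print(tempered)
```

With a small temperature the tempered distribution puts nearly all of its mass on the largest Q-value, while the plain softmax stays much flatter, so the two forms give noticeably different targets.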