Hector Kohler
There is a mistake in the computation of the advantages and returns. The `_compute_returns_avantages()` function assumes that the list `rewards` has length `self.horizon`. However it is not always...
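To illustrate the kind of fix this suggests, here is a minimal sketch that sizes the computation from `len(rewards)` instead of a fixed horizon. Everything in it is an assumption made for illustration: the function name `compute_returns_advantages`, its parameters, and the choice of plain discounted returns with a value baseline. The actual rlberry function may compute something different (for example GAE).

```python
def compute_returns_advantages(rewards, values, gamma=0.99, bootstrap_value=0.0):
    """Hypothetical helper, not the rlberry implementation.

    Computes discounted returns and baseline-subtracted advantages for a
    rollout of arbitrary length (it may be shorter than the horizon).
    """
    # Size everything from the data actually collected, not from a fixed horizon.
    n_steps = len(rewards)
    if len(values) != n_steps:
        raise ValueError("rewards and values must have the same length")

    returns = [0.0] * n_steps
    running = bootstrap_value  # value estimate after the last step (0 if terminal)
    for t in reversed(range(n_steps)):
        running = rewards[t] + gamma * running
        returns[t] = running

    # Advantage = return minus the value baseline at the same step.
    advantages = [g - v for g, v in zip(returns, values)]
    return returns, advantages


# A rollout of 3 steps works the same way whatever the nominal horizon is.
returns, advantages = compute_returns_advantages(
    rewards=[1.0, 1.0, 1.0], values=[0.5, 0.5, 0.5], gamma=0.5
)
print(returns, advantages)
```

Because the loop bounds come from the rollout itself, an episode that terminates early does not index past the end of `rewards`.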
@mmcenta, it seems that some changes to the model since #290 have made A2C and PPO worse on some benchmarks, in particular on @YannBerthelot's probing environment tests. Let's discuss.
## Description
PR for issues #325 and #353. I removed some placeholders in the user guide for pages that we would probably never write, and I added a user guide page for...
Rather than using `rlberry.envs.gym_make`, let us use `gymnasium.make`. Rather than using `rlberry.agents.torch`, let us use [rlberry.agents.stable_baselines](https://rlberry-py.github.io/rlberry/generated/rlberry.agents.stable_baselines.StableBaselinesAgent.html#rlberry.agents.stable_baselines.StableBaselinesAgent). The same goes for `atari_make` and `spaces`.... basically if an rlberry class is just import a...