rlberry
Actor-critic agents appear less sample efficient in general since #290
@mmcenta, it seems that some changes to the model since #290 are making A2C and PPO perform worse on some benchmarks, in particular @YannBerthelot's probing environment tests. Let's discuss.