Antonin RAFFIN
> Unfortunately, the score doesn't rise above 300.

Are you talking about the training reward (average over many episodes) or about the final performance using the (quasi)-deterministic policy? How many...
Hello, there is currently no plan for it. Would you volunteer to implement such a feature?
Before you do too much work, please open a PR as soon as you have a proof of concept, so we can agree on the details.
> When training using a vectorized environment, what's the recommended approach for plotting the training progress?

Which quantity do you want to plot? Training reward? Or test reward? To monitor...
Hello, if you add `gc.collect()` after `num_episodes[idx] += 1`, this should force garbage collection and solve your problem (tested locally).
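For illustration, here is a minimal sketch of where that call would go. The loop below is hypothetical (the original code from the issue is not shown here): `count_episodes`, `dones_per_step`, and `n_envs` are made-up names, and only `gc.collect()` and the `num_episodes[idx] += 1` line come from the comment above.

```python
import gc


def count_episodes(dones_per_step, n_envs):
    """Count finished episodes per environment (hypothetical stand-in loop)."""
    num_episodes = [0] * n_envs
    for dones in dones_per_step:
        for idx, done in enumerate(dones):
            if done:
                num_episodes[idx] += 1
                # Explicitly trigger a full garbage collection pass
                # right after the episode counter is incremented.
                gc.collect()
    return num_episodes


# Two environments, three steps: env 0 finishes twice, env 1 once.
episode_counts = count_episodes(
    [[True, False], [False, True], [True, False]], n_envs=2
)
print(episode_counts)
```

Calling `gc.collect()` once per finished episode adds some overhead, so this is a trade-off between memory use and speed.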
@Mayankm96 the CI failure seems to be due to a timeout, probably unrelated to this PR?
@Mayankm96 when do you think you or someone from your team can have a look at this PR? Do you need anything else from me?
@Mayankm96 could you maybe assign someone else? Antonio seems to have been pretty busy these last months.
Hello, I can report similar issues when using many processes at once with the journal storage.
> So i'm guessing that the environments are still on the cpu and their steps also take place on the CPU.

yes

> doesn't address the issue of CPU-GPU Data...