Thorsten Kurth
I think I understand where some of the confusion is coming from: the prediction and exploration routines expect a batch dimension; this is why they are called 2D_2D. This goes...
I agree that in practice, it would be nice to be able to simply feed a 1D array (so a single sample state) and have it implicitly produce a single...
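Until then, the workaround is just to add and strip a singleton batch dimension yourself. A minimal NumPy sketch of what I mean (the `predict_2d` routine and the shapes here are stand-ins, not the actual library API):

```python
import numpy as np

state_dim, action_dim = 8, 2
W = np.random.randn(state_dim, action_dim).astype(np.float32)  # toy linear "policy"

def predict_2d(states: np.ndarray) -> np.ndarray:
    """Stand-in for a 2D_2D predict routine: expects (batch, state_dim),
    returns (batch, action_dim)."""
    assert states.ndim == 2, "predict routines expect a batch dimension"
    return states @ W

state = np.random.randn(state_dim).astype(np.float32)  # a single 1D sample state

batched = state[None, :]          # add a singleton batch dimension -> (1, 8)
action = predict_2d(batched)[0]   # strip it from the result again  -> (2,)
```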
Hello Sachin, multi does not refer to multi-agent but to multi-environment. It allows you to run multiple environments in a single task, but those apply to the same...
If you want to share model information, then you should use the distributed system variant. In this case, you can set the number of environments equal to the number of agents and pass...
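As a rough illustration of the multi-environment layout (all names here are made up; one shared model serves all environments):

```python
import numpy as np

n_env = 4                                    # 4 environments (== 4 "agents") in one task
state_dim, action_dim = 8, 2
W = np.random.randn(state_dim, action_dim)   # one model shared by all of them

def predict_multi(states: np.ndarray) -> np.ndarray:
    """Stand-in for a multi-environment predict:
    (n_env, state_dim) -> (n_env, action_dim), same policy for every row."""
    return states @ W

# one state per environment, stacked along the leading dimension
states = np.stack([np.random.randn(state_dim) for _ in range(n_env)])
actions = predict_multi(states)    # one action per environment
```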
In this case, you can just use MPI to re-distribute the data beforehand, or you can split up the volume evenly so that each rank/GPU has the same number...
Hello Sachin, if you cannot influence the partitioning, then you can use all-to-all or gather/scatter to re-distribute the data between the nodes. However, I would probably not invest too much...
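A pickle-based mpi4py sketch of the gather/scatter variant, just to illustrate the idea (a real implementation for large volumes would use the buffer-based collectives such as Alltoallv instead):

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

state_dim = 8
# pretend every rank produced a different number of local samples
local = np.random.randn(3 + rank, state_dim)

# gather all samples on rank 0, then scatter even chunks back out
gathered = comm.gather(local, root=0)
if rank == 0:
    flat = np.concatenate(gathered, axis=0)
    n_even = (len(flat) // size) * size     # drop the remainder for simplicity
    chunks = np.split(flat[:n_even], size)
else:
    chunks = None
balanced = comm.scatter(chunks, root=0)     # same sample count on every rank now
```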
It is also easier to debug in case something does not work as expected, since you then only have to print replay/rollout buffer entries from that rank and can dump...
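For example (toy buffer contents, mpi4py just for the rank query):

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD
debug_rank = 0                                    # the one rank we inspect

replay_buffer = [("s0", "a0", 0.1, "s1"),         # toy transitions, just for
                 ("s1", "a1", 0.3, "s2")]         # the sake of the sketch

if comm.Get_rank() == debug_rank:
    for i, entry in enumerate(replay_buffer):
        print(f"rank {debug_rank}, buffer[{i}] = {entry}")
```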
Which algorithm is that, an on-policy or an off-policy one? In either case, a call to train step only samples a single batch from the buffer (replay or rollout), with...
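To make the "one batch per call" point concrete, here is a toy sketch with made-up names; if you want to consume more data per environment step, you simply call the train step routine several times:

```python
import numpy as np

rng = np.random.default_rng(0)
replay_buffer = [rng.standard_normal(8) for _ in range(1000)]  # toy stored states
batch_size = 32

def train_step():
    """Stand-in for a library train step: draws ONE batch from the
    buffer and performs a single update on it."""
    idx = rng.integers(0, len(replay_buffer), size=batch_size)
    batch = np.stack([replay_buffer[i] for i in idx])
    # ... one gradient update on `batch` would happen here ...

# to consume more buffer data per environment step, call it repeatedly
for _ in range(4):
    train_step()
```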
Hello Sachin, this is part of the research, I guess. However, I would refrain from changing normalizations, because especially with off-policy algorithms, you want to be able to...
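What I mean by a fixed normalization, as a small sketch (the constants are illustrative; in practice you would derive them up front, e.g. from the known physical range of your states):

```python
import numpy as np

# normalization fixed up front and applied identically to every sample
# entering the buffer
STATE_MEAN = np.zeros(8, dtype=np.float32)   # illustrative constants
STATE_STD = np.ones(8, dtype=np.float32)

def normalize(state: np.ndarray) -> np.ndarray:
    return (state - STATE_MEAN) / STATE_STD

replay_buffer = []
state = np.random.randn(8).astype(np.float32)
replay_buffer.append(normalize(state))
# if STATE_MEAN/STATE_STD drifted over time, old buffer entries would no
# longer be on the same scale as new ones, which off-policy replay relies on
```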
So, I never got SAC to work really well. The issue I am having with it is that the log prob diverges easily and then you see NaNs. I am...
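For reference, the usual mitigation in SAC implementations (not something specific to this library) is to clamp the log standard deviation and compute the tanh correction via a softplus identity, e.g.:

```python
import numpy as np

LOG_STD_MIN, LOG_STD_MAX = -5.0, 2.0   # common clamp range to keep log prob finite

def squashed_gaussian_logprob(mu, log_std, rng=np.random.default_rng()):
    """Sample a tanh-squashed Gaussian action and return a numerically
    stable log probability (the usual SAC formulation)."""
    log_std = np.clip(log_std, LOG_STD_MIN, LOG_STD_MAX)
    std = np.exp(log_std)
    u = mu + std * rng.standard_normal(mu.shape)   # pre-squash sample
    a = np.tanh(u)                                 # squashed action in (-1, 1)
    # Gaussian log prob of u, per dimension
    logp = -0.5 * (((u - mu) / std) ** 2 + 2.0 * log_std + np.log(2.0 * np.pi))
    # tanh correction, written via softplus for stability:
    # log(1 - tanh(u)^2) = 2 * (log 2 - u - softplus(-2u))
    logp -= 2.0 * (np.log(2.0) - u - np.logaddexp(0.0, -2.0 * u))
    return a, logp.sum(axis=-1)
```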