Thorsten Kurth
I think I understand where some of the confusion is coming from: the prediction and exploration routines expect a batch dimension; this is why they are called 2D_2D. This goes...
I agree that in practice, it would be nice to be able to simply feed a 1D array (so a single sample state) and have it implicitly produce a single...
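Until then, the workaround is just to add and strip a singleton batch dimension yourself. A minimal NumPy sketch of what I mean (the `predict_2d` routine and the shapes here are stand-ins, not the actual library API):

```python
import numpy as np

state_dim, action_dim = 8, 2
W = np.random.randn(state_dim, action_dim).astype(np.float32)  # toy linear "policy"

def predict_2d(states: np.ndarray) -> np.ndarray:
    """Stand-in for a 2D_2D predict routine: expects (batch, state_dim),
    returns (batch, action_dim)."""
    assert states.ndim == 2, "predict routines expect a batch dimension"
    return states @ W

state = np.random.randn(state_dim).astype(np.float32)  # a single 1D sample state

batched = state[None, :]          # add a singleton batch dimension -> (1, 8)
action = predict_2d(batched)[0]   # strip it from the result again  -> (2,)
```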
Hello Sachin, multi does not refer to multi-agent but to multi-environment. It allows you to run multiple environments in a single task, but those apply to the same...
If you want to share model information, then you should use the distributed system variant. In this case, you can set the number of environments equal to the number of agents and pass...
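As a rough illustration of the multi-environment layout (all names here are made up; one shared model serves all environments):

```python
import numpy as np

n_env = 4                                    # 4 environments (== 4 "agents") in one task
state_dim, action_dim = 8, 2
W = np.random.randn(state_dim, action_dim)   # one model shared by all of them

def predict_multi(states: np.ndarray) -> np.ndarray:
    """Stand-in for a multi-environment predict:
    (n_env, state_dim) -> (n_env, action_dim), same policy for every row."""
    return states @ W

# one state per environment, stacked along the leading dimension
states = np.stack([np.random.randn(state_dim) for _ in range(n_env)])
actions = predict_multi(states)    # one action per environment
```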
In this case, you can just use MPI to re-distribute the data beforehand, or you can split up the volume evenly so that each rank/GPU has the same number...
Hello Sachin, if you cannot influence the partitioning, then you can use all-to-all or gather/scatter to re-distribute the data between the nodes. However, I would probably not invest too much...
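A pickle-based mpi4py sketch of the gather/scatter variant, just to illustrate the idea (a real implementation for large volumes would use the buffer-based collectives such as Alltoallv instead):

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

state_dim = 8
# pretend every rank produced a different number of local samples
local = np.random.randn(3 + rank, state_dim)

# gather all samples on rank 0, then scatter even chunks back out
gathered = comm.gather(local, root=0)
if rank == 0:
    flat = np.concatenate(gathered, axis=0)
    n_even = (len(flat) // size) * size     # drop the remainder for simplicity
    chunks = np.split(flat[:n_even], size)
else:
    chunks = None
balanced = comm.scatter(chunks, root=0)     # same sample count on every rank now
```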
It is also easier to debug in case something does not work as expected, since you then only have to print replay/rollout buffer entries from that rank and can dump...
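For example (toy buffer contents, mpi4py just for the rank query):

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD
debug_rank = 0                                    # the one rank we inspect

replay_buffer = [("s0", "a0", 0.1, "s1"),         # toy transitions, just for
                 ("s1", "a1", 0.3, "s2")]         # the sake of the sketch

if comm.Get_rank() == debug_rank:
    for i, entry in enumerate(replay_buffer):
        print(f"rank {debug_rank}, buffer[{i}] = {entry}")
```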
Which algorithm is that, an on-policy or an off-policy one? In either case, a call to train step only samples a single batch from the buffer (replay or rollout), with...
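To make the "one batch per call" point concrete, here is a toy sketch with made-up names; if you want to consume more data per environment step, you simply call the train step routine several times:

```python
import numpy as np

rng = np.random.default_rng(0)
replay_buffer = [rng.standard_normal(8) for _ in range(1000)]  # toy stored states
batch_size = 32

def train_step():
    """Stand-in for a library train step: draws ONE batch from the
    buffer and performs a single update on it."""
    idx = rng.integers(0, len(replay_buffer), size=batch_size)
    batch = np.stack([replay_buffer[i] for i in idx])
    # ... one gradient update on `batch` would happen here ...

# to consume more buffer data per environment step, call it repeatedly
for _ in range(4):
    train_step()
```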
Hello Sachin, this is part of the research, I guess. However, I would refrain from changing normalizations, because especially with off-policy algorithms, you want to be able to...
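What I mean by a fixed normalization, as a small sketch (the constants are illustrative; in practice you would derive them up front, e.g. from the known physical range of your states):

```python
import numpy as np

# normalization fixed up front and applied identically to every sample
# entering the buffer
STATE_MEAN = np.zeros(8, dtype=np.float32)   # illustrative constants
STATE_STD = np.ones(8, dtype=np.float32)

def normalize(state: np.ndarray) -> np.ndarray:
    return (state - STATE_MEAN) / STATE_STD

replay_buffer = []
state = np.random.randn(8).astype(np.float32)
replay_buffer.append(normalize(state))
# if STATE_MEAN/STATE_STD drifted over time, old buffer entries would no
# longer be on the same scale as new ones, which off-policy replay relies on
```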
So, I never got SAC to work really well. The issue I am having with it is that the log prob diverges easily and then you see NaNs. I am...
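For reference, the usual mitigation in SAC implementations (not something specific to this library) is to clamp the log standard deviation and compute the tanh correction via a softplus identity, e.g.:

```python
import numpy as np

LOG_STD_MIN, LOG_STD_MAX = -5.0, 2.0   # common clamp range to keep log prob finite

def squashed_gaussian_logprob(mu, log_std, rng=np.random.default_rng()):
    """Sample a tanh-squashed Gaussian action and return a numerically
    stable log probability (the usual SAC formulation)."""
    log_std = np.clip(log_std, LOG_STD_MIN, LOG_STD_MAX)
    std = np.exp(log_std)
    u = mu + std * rng.standard_normal(mu.shape)   # pre-squash sample
    a = np.tanh(u)                                 # squashed action in (-1, 1)
    # Gaussian log prob of u, per dimension
    logp = -0.5 * (((u - mu) / std) ** 2 + 2.0 * log_std + np.log(2.0 * np.pi))
    # tanh correction, written via softplus for stability:
    # log(1 - tanh(u)^2) = 2 * (log 2 - u - softplus(-2u))
    logp -= 2.0 * (np.log(2.0) - u - np.logaddexp(0.0, -2.0 * u))
    return a, logp.sum(axis=-1)
```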