Thorsten Kurth


Yes, for the moment you need to use a reward averaged over all the agents. Regarding the error: the interface is only implemented for a handful of dimension combinations, because...
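Something along these lines (the helper name and the use of numpy are just illustrative; the point is to collapse the per-agent rewards into the single scalar the update currently expects):

```
import numpy as np

def aggregate_reward(per_agent_rewards):
    # collapse per-agent rewards into the one scalar reward per step
    # that the training update currently expects (hypothetical helper)
    return float(np.mean(per_agent_rewards))

# example: 4 agents, one shared scalar reward for the update
reward = aggregate_reward([0.5, -0.2, 1.0, 0.1])
```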

Hello Sachin, I missed your last updates. Does everything work for you now? Also, we should support MARL via the multi-environment mechanism. In that case, you can specify n_env >...

Hello Sachin, good to hear it works for you, and sorry for my silence, but we had a deadline which is now over. It is generally possible to override...

In order to be able to specify a non-scalar reward, you need to have n_envs > 1. So in your case, you would have something like ``` state: (features,...
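Roughly, the shapes would then look like this (a sketch with purely illustrative sizes; the exact config layout depends on your setup):

```
import torch

n_envs, n_features, n_actions = 4, 16, 2  # illustrative sizes only

# with n_envs > 1 the reward becomes a vector with one entry per
# environment instead of a single scalar
state  = torch.zeros(n_envs, n_features)
action = torch.zeros(n_envs, n_actions)
reward = torch.zeros(n_envs)
```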

Hello Sachin, the action should not be inf, but if it becomes inf at an intermediate step, the clipping will no longer work. With SAC, you need to make sure your action...
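As a sketch of what I mean (tanh squashing plus a finiteness check; the function name and bounds are just illustrative):

```
import torch

def squash_action(raw_action, low=-1.0, high=1.0):
    # keep the action finite and map it into [low, high]
    assert torch.isfinite(raw_action).all(), "action became inf/NaN upstream"
    squashed = torch.tanh(raw_action)                   # now in (-1, 1)
    return low + 0.5 * (squashed + 1.0) * (high - low)  # rescale to [low, high]

a = squash_action(torch.tensor([0.3, 5.0, -12.0]))
```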

Concerning the loop over agents: you do not need that. Just do a 1D convolution with kernel size = 1 and it will implicitly loop over the agents, if...
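Something like this (shapes are illustrative; the agents go along the length dimension of the Conv1d):

```
import torch
import torch.nn as nn

n_batch, n_features, n_agents = 8, 16, 12  # illustrative sizes

# kernel_size=1 applies the same linear map independently at every agent
# position, which replaces an explicit Python loop over the agents
layer = nn.Conv1d(in_channels=n_features, out_channels=32, kernel_size=1)

x = torch.randn(n_batch, n_features, n_agents)  # agents along the last dim
y = layer(x)                                    # shape: (8, 32, 12)
```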

Also, if you change the squashing from tanh to softsign, you need to implement the log probs for that in the backend as well. The process is called...

For example, stable-baselines3 has this here: https://github.com/DLR-RM/stable-baselines3/blob/6af0601dc3c91d5e5477f10ca6976e77a24cccc8/stable_baselines3/sac/policies.py#L172 and that routine needs to be implemented for each squashing function. Torch has analytic functions for some but not all squashings.
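As a rough sketch of that change-of-variables correction (not the actual backend code): for a squashed action a = f(u), the log prob picks up a -log|f'(u)| term, and that term differs between tanh and softsign:

```
import torch

def log_det_tanh(u):
    # |d tanh(u)/du| = 1 - tanh(u)^2; eps avoids log(0)
    return torch.log(1.0 - torch.tanh(u) ** 2 + 1e-6)

def log_det_softsign(u):
    # softsign(u) = u / (1 + |u|), so the derivative is 1 / (1 + |u|)^2
    return -2.0 * torch.log1p(torch.abs(u))

u = torch.randn(4, 2)  # pre-squash samples from the policy Gaussian
dist = torch.distributions.Normal(torch.zeros_like(u), torch.ones_like(u))

# log pi(a) = log pi_u(u) - log|f'(u)|, summed over action dimensions
logp_tanh     = (dist.log_prob(u) - log_det_tanh(u)).sum(dim=-1)
logp_softsign = (dist.log_prob(u) - log_det_softsign(u)).sum(dim=-1)
```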

Yes, so your action space is 11664-dimensional? That means the log prob will likely explode. For methods which use the log prob (like PPO and SAC), you need to...
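To illustrate why (a small sketch with a standard normal policy): the joint log prob is a sum over all action dimensions, so its magnitude grows linearly with the dimensionality and quickly dwarfs the rewards:

```
import torch

dim = 11664  # the dimensionality from above
dist = torch.distributions.Normal(torch.zeros(dim), torch.ones(dim))
sample = dist.sample()

# sum over dimensions: on the order of -1.6e4 for 11664 dims
print(dist.log_prob(sample).sum())
```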

All the shapes mentioned are the PyTorch ones, but since Fortran is column-major, you need to transpose what you are passing, so basically [B, C] would...
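A small numpy illustration of the flip (sizes are illustrative): the raw memory of a row-major [B, C] array, read with Fortran's column-major ordering, comes out as the [C, B] transpose:

```
import numpy as np

B, C = 4, 3
x = np.arange(B * C, dtype=np.float32).reshape(B, C)  # PyTorch-style [B, C], row-major

# reinterpret the same flat buffer the way column-major Fortran would:
# the dimensions swap, i.e. you get the transpose
x_fortran = x.ravel().reshape(C, B, order="F")
assert np.array_equal(x_fortran, x.T)
```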