Thorsten Kurth


Yes, for the moment you need to use a reward averaged over all the agents. Regarding the error: the interface is only implemented for a handful of dimension combinations, because...
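Something along these lines (the helper name and the use of numpy are just illustrative; the point is to collapse the per-agent rewards into the single scalar the update currently expects):

```
import numpy as np

def aggregate_reward(per_agent_rewards):
    # collapse per-agent rewards into the one scalar reward per step
    # that the training update currently expects (hypothetical helper)
    return float(np.mean(per_agent_rewards))

# example: 4 agents, one shared scalar reward for the update
reward = aggregate_reward([0.5, -0.2, 1.0, 0.1])
```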

Hello Sachin, I missed your last updates. Does everything work for you now? Also, we should support MARL via the multi-environment mechanism. In that case, you can specify n_env >...

Hello Sachin, good to hear it works for you, and sorry for my silence, but we had a deadline which is now over. It is generally possible to override...

In order to be able to specify a non-scalar reward, you need to have n_envs > 1. So in your case, you would have something like ``` state: (features,...
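Roughly, the shapes would then look like this (a sketch with purely illustrative sizes; the exact config layout depends on your setup):

```
import torch

n_envs, n_features, n_actions = 4, 16, 2  # illustrative sizes only

# with n_envs > 1 the reward becomes a vector with one entry per
# environment instead of a single scalar
state  = torch.zeros(n_envs, n_features)
action = torch.zeros(n_envs, n_actions)
reward = torch.zeros(n_envs)
```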

Hello Sachin, the action should not be inf, but if it becomes inf at an intermediate step, the clipping will no longer work. With SAC, you need to make sure your action...
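As a sketch of what I mean (tanh squashing plus a finiteness check; the function name and bounds are just illustrative):

```
import torch

def squash_action(raw_action, low=-1.0, high=1.0):
    # keep the action finite and map it into [low, high]
    assert torch.isfinite(raw_action).all(), "action became inf/NaN upstream"
    squashed = torch.tanh(raw_action)                   # now in (-1, 1)
    return low + 0.5 * (squashed + 1.0) * (high - low)  # rescale to [low, high]

a = squash_action(torch.tensor([0.3, 5.0, -12.0]))
```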

Concerning the loop over agents: you do not need that. Just do a 1D convolution with kernel size = 1 and it will implicitly loop over the agents, if...
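Something like this (shapes are illustrative; the agents go along the length dimension of the Conv1d):

```
import torch
import torch.nn as nn

n_batch, n_features, n_agents = 8, 16, 12  # illustrative sizes

# kernel_size=1 applies the same linear map independently at every agent
# position, which replaces an explicit Python loop over the agents
layer = nn.Conv1d(in_channels=n_features, out_channels=32, kernel_size=1)

x = torch.randn(n_batch, n_features, n_agents)  # agents along the last dim
y = layer(x)                                    # shape: (8, 32, 12)
```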

Also, if you change the squashing from tanh to softsign, you need to implement the log probs for that in the backend as well. The process is called...

For example, stable-baselines3 has this here: https://github.com/DLR-RM/stable-baselines3/blob/6af0601dc3c91d5e5477f10ca6976e77a24cccc8/stable_baselines3/sac/policies.py#L172 and that routine needs to be implemented for each squashing function. Torch has analytic functions for some but not all squashings.
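As a rough sketch of that change-of-variables correction (not the actual backend code): for a squashed action a = f(u), the log prob picks up a -log|f'(u)| term, and that term differs between tanh and softsign:

```
import torch

def log_det_tanh(u):
    # |d tanh(u)/du| = 1 - tanh(u)^2; eps avoids log(0)
    return torch.log(1.0 - torch.tanh(u) ** 2 + 1e-6)

def log_det_softsign(u):
    # softsign(u) = u / (1 + |u|), so the derivative is 1 / (1 + |u|)^2
    return -2.0 * torch.log1p(torch.abs(u))

u = torch.randn(4, 2)  # pre-squash samples from the policy Gaussian
dist = torch.distributions.Normal(torch.zeros_like(u), torch.ones_like(u))

# log pi(a) = log pi_u(u) - log|f'(u)|, summed over action dimensions
logp_tanh     = (dist.log_prob(u) - log_det_tanh(u)).sum(dim=-1)
logp_softsign = (dist.log_prob(u) - log_det_softsign(u)).sum(dim=-1)
```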

Yes, so your action space is 11664-dimensional? That means the log prob will likely explode. For methods which use the log prob (like PPO and SAC), you need to...
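To illustrate why (a small sketch with a standard normal policy): the joint log prob is a sum over all action dimensions, so its magnitude grows linearly with the dimensionality and quickly dwarfs the rewards:

```
import torch

dim = 11664  # the dimensionality from above
dist = torch.distributions.Normal(torch.zeros(dim), torch.ones(dim))
sample = dist.sample()

# sum over dimensions: on the order of -1.6e4 for 11664 dims
print(dist.log_prob(sample).sum())
```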

All the shapes mentioned are the PyTorch ones, but since Fortran is column-major, you need to transpose what you are passing, so basically [B, C] would...
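A small numpy illustration of the flip (sizes are illustrative): the raw memory of a row-major [B, C] array, read with Fortran's column-major ordering, comes out as the [C, B] transpose:

```
import numpy as np

B, C = 4, 3
x = np.arange(B * C, dtype=np.float32).reshape(B, C)  # PyTorch-style [B, C], row-major

# reinterpret the same flat buffer the way column-major Fortran would:
# the dimensions swap, i.e. you get the transpose
x_fortran = x.ravel().reshape(C, B, order="F")
assert np.array_equal(x_fortran, x.T)
```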