Alexander Nikulin
@araffin `1e-6` is used in most popular SAC PyTorch implementations, and I also use it in my research for some reason (and in CORL). I think it's more a matter of reproducibility.
Usually in SAC we use a Normal distribution coupled with tanh to bound the action space. However, after such a transformation the actual distribution is no longer just a standard Normal, and we can...
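For context, a minimal sketch of where that `1e-6` typically lives, assuming the common change-of-variables formulation of the tanh-squashed Gaussian log-prob (names here are illustrative, not from any particular implementation):

```python
import torch
from torch.distributions import Normal

EPS = 1e-6  # the small constant discussed above

def squashed_gaussian_log_prob(mean, log_std, raw_action):
    """Log-prob of action = tanh(raw_action) under a tanh-squashed Normal."""
    dist = Normal(mean, log_std.exp())
    # log-prob of the pre-tanh sample under the base Normal
    log_prob = dist.log_prob(raw_action).sum(dim=-1)
    # change-of-variables correction: log |d tanh(u)/du| = log(1 - tanh(u)^2);
    # the epsilon guards against log(0) when tanh saturates near +/-1
    log_prob -= torch.log(1.0 - torch.tanh(raw_action) ** 2 + EPS).sum(dim=-1)
    return log_prob
```

Without the epsilon, `1 - tanh(u)^2` rounds to zero for large `|u|` and the log returns `-inf`, which then turns into NaN gradients.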
@AlexPasqua even though I think it's very important, I'm unfortunately busy integrating [Minari](https://github.com/Farama-Foundation/Minari) into [CORL](https://github.com/tinkoff-ai/CORL) at the moment, so I'm unlikely to find the time to do it. But I'm...
@JustinS6626 Actually, you can just mask the irrelevant actions during training, as is usually done in PPO. For example, there is an implementation of Maskable PPO in SB3: https://sb3-contrib.readthedocs.io/en/master/modules/ppo_mask.html
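A minimal usage sketch following the sb3-contrib docs (the all-ones mask here is a toy stand-in; in a real task `mask_fn` would return `False` for the irrelevant actions):

```python
import gymnasium as gym
import numpy as np
from sb3_contrib import MaskablePPO
from sb3_contrib.common.wrappers import ActionMasker

def mask_fn(env) -> np.ndarray:
    # Toy mask: every action is valid here; in a real environment you would
    # compute which actions are currently relevant and mask out the rest.
    return np.ones(env.action_space.n, dtype=bool)

# wrap the env so the policy receives the mask at every step
env = ActionMasker(gym.make("CartPole-v1"), mask_fn)
model = MaskablePPO("MlpPolicy", env, verbose=0)
model.learn(total_timesteps=10_000)
```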
@devgonvarun hi! No, I haven't looked into it further...
Thanks! Should all arguments in BackflipCheetahEnv be left at their defaults? `forward_reward_weight=1.0`, for example.
We adapted the PureJaxRL PPO+RNN implementation to multi-GPU with pmap in [XLand-MiniGrid](https://github.com/corl-team/xland-minigrid), and it scales well (almost linearly from 1 up to 8 A100 GPUs)!
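The core pattern looks roughly like this (a hypothetical minimal sketch, not the actual XLand-MiniGrid code): replicate the parameters across devices, shard the batch along the leading axis, and average gradients with `lax.pmean` inside the pmapped update.

```python
import functools
import jax
import jax.numpy as jnp

def loss_fn(params, batch):
    preds = batch["x"] @ params["w"]
    return jnp.mean((preds - batch["y"]) ** 2)

@functools.partial(jax.pmap, axis_name="devices")
def update_step(params, batch):
    loss, grads = jax.value_and_grad(loss_fn)(params, batch)
    # synchronize: average gradients over all devices
    grads = jax.lax.pmean(grads, axis_name="devices")
    params = jax.tree_util.tree_map(lambda p, g: p - 1e-3 * g, params, grads)
    return params, loss

n_dev = jax.local_device_count()
# replicate parameters on every device; pmap splits the leading axis
params = jax.tree_util.tree_map(
    lambda x: jnp.broadcast_to(x, (n_dev, *x.shape)), {"w": jnp.zeros((8, 1))}
)
# each device gets its own shard of the batch
batch = {"x": jnp.ones((n_dev, 32, 8)), "y": jnp.zeros((n_dev, 32, 1))}
params, loss = update_step(params, batch)
```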
@luchris429 It just takes a bit longer to compile in general (if I correctly understood "time" as the number of total timesteps). I didn't notice any other performance dips for the...
@jheagerty actually I think you can easily save checkpoints under jit with callbacks, such as [jax.experimental.io_callback()](https://jax.readthedocs.io/en/latest/_autosummary/jax.experimental.io_callback.html#jax.experimental.io_callback) (for example inside `_update_step` to save after each update)
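A minimal sketch of the idea (the update and save logic are stand-ins; I'm assuming `result_shape_dtypes=None` is acceptable when the host callback returns nothing):

```python
import jax
import jax.numpy as jnp
import numpy as np
from jax.experimental import io_callback

def save_checkpoint(params, step):
    # runs on the host, so any Python I/O works here (pickle, orbax, ...)
    np.save(f"checkpoint_{int(step)}.npy", params)

@jax.jit
def _update_step(params, step):
    params = params - 0.1 * params  # stand-in for the real update
    # result_shape_dtypes=None since save_checkpoint returns nothing;
    # ordered=True keeps the saves in step order
    io_callback(save_checkpoint, None, params, step, ordered=True)
    return params, step + 1

params, step = _update_step(jnp.ones((4,)), jnp.array(0))
```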
Hi @carlosluis! This is actually a very important suggestion and we plan to add procedural generation in some form sooner or later anyway. However, in our experience (and this is...