Alexander Nikulin
@araffin `1e-6` is used in most popular SAC PyTorch implementations, and I also use it in my research for some reason (and in CORL). I think it's more a matter of reproducibility.
Usually in SAC we use a Normal distribution coupled with tanh to bound the action space. However, after such a transformation the actual distribution is no longer just a standard Normal, and we can...
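For context, a minimal sketch of where that `1e-6` typically lives, assuming the common change-of-variables formulation of the tanh-squashed Gaussian log-prob (names here are illustrative, not from any particular implementation):

```python
import torch
from torch.distributions import Normal

EPS = 1e-6  # the small constant discussed above

def squashed_gaussian_log_prob(mean, log_std, raw_action):
    """Log-prob of action = tanh(raw_action) under a tanh-squashed Normal."""
    dist = Normal(mean, log_std.exp())
    # log-prob of the pre-tanh sample under the base Normal
    log_prob = dist.log_prob(raw_action).sum(dim=-1)
    # change-of-variables correction: log |d tanh(u)/du| = log(1 - tanh(u)^2);
    # the epsilon guards against log(0) when tanh saturates near +/-1
    log_prob -= torch.log(1.0 - torch.tanh(raw_action) ** 2 + EPS).sum(dim=-1)
    return log_prob
```

Without the epsilon, `1 - tanh(u)^2` rounds to zero for large `|u|` and the log returns `-inf`, which then turns into NaN gradients.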
@AlexPasqua even though I think it's very important, I'm unfortunately busy integrating [Minari](https://github.com/Farama-Foundation/Minari) into [CORL](https://github.com/tinkoff-ai/CORL) at the moment, so I'm unlikely to find the time to do it. But I'm...
@JustinS6626 Actually, you can just mask the irrelevant actions during training, as is usually done in PPO. For example, there is an implementation of Maskable PPO in SB3: https://sb3-contrib.readthedocs.io/en/master/modules/ppo_mask.html
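A minimal usage sketch following the sb3-contrib docs (the all-ones mask here is a toy stand-in; in a real task `mask_fn` would return `False` for the irrelevant actions):

```python
import gymnasium as gym
import numpy as np
from sb3_contrib import MaskablePPO
from sb3_contrib.common.wrappers import ActionMasker

def mask_fn(env) -> np.ndarray:
    # Toy mask: every action is valid here; in a real environment you would
    # compute which actions are currently relevant and mask out the rest.
    return np.ones(env.action_space.n, dtype=bool)

# wrap the env so the policy receives the mask at every step
env = ActionMasker(gym.make("CartPole-v1"), mask_fn)
model = MaskablePPO("MlpPolicy", env, verbose=0)
model.learn(total_timesteps=10_000)
```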
@devgonvarun hi! No, I haven't looked into it further...
Thanks! Should all arguments in BackflipCheetahEnv be left at their defaults? `forward_reward_weight=1.0`, for example.
We adapted the PureJaxRL PPO+RNN implementation to multi-GPU with pmap in [XLand-MiniGrid](https://github.com/corl-team/xland-minigrid), and it scales well (almost linearly from 1 up to 8 A100 GPUs)!
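The core pattern looks roughly like this (a hypothetical minimal sketch, not the actual XLand-MiniGrid code): replicate the parameters across devices, shard the batch along the leading axis, and average gradients with `lax.pmean` inside the pmapped update.

```python
import functools
import jax
import jax.numpy as jnp

def loss_fn(params, batch):
    preds = batch["x"] @ params["w"]
    return jnp.mean((preds - batch["y"]) ** 2)

@functools.partial(jax.pmap, axis_name="devices")
def update_step(params, batch):
    loss, grads = jax.value_and_grad(loss_fn)(params, batch)
    # synchronize: average gradients over all devices
    grads = jax.lax.pmean(grads, axis_name="devices")
    params = jax.tree_util.tree_map(lambda p, g: p - 1e-3 * g, params, grads)
    return params, loss

n_dev = jax.local_device_count()
# replicate parameters on every device; pmap splits the leading axis
params = jax.tree_util.tree_map(
    lambda x: jnp.broadcast_to(x, (n_dev, *x.shape)), {"w": jnp.zeros((8, 1))}
)
# each device gets its own shard of the batch
batch = {"x": jnp.ones((n_dev, 32, 8)), "y": jnp.zeros((n_dev, 32, 1))}
params, loss = update_step(params, batch)
```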
@luchris429 It just takes a bit longer to compile in general (if I correctly understood "time" as the number of total timesteps). I didn't notice any other performance dips for the...
@jheagerty actually I think you can easily save checkpoints under jit with callbacks, such as [jax.experimental.io_callback()](https://jax.readthedocs.io/en/latest/_autosummary/jax.experimental.io_callback.html#jax.experimental.io_callback) (for example inside `_update_step` to save after each update)
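A minimal sketch of the idea (the update and save logic are stand-ins; I'm assuming `result_shape_dtypes=None` is acceptable when the host callback returns nothing):

```python
import jax
import jax.numpy as jnp
import numpy as np
from jax.experimental import io_callback

def save_checkpoint(params, step):
    # runs on the host, so any Python I/O works here (pickle, orbax, ...)
    np.save(f"checkpoint_{int(step)}.npy", params)

@jax.jit
def _update_step(params, step):
    params = params - 0.1 * params  # stand-in for the real update
    # result_shape_dtypes=None since save_checkpoint returns nothing;
    # ordered=True keeps the saves in step order
    io_callback(save_checkpoint, None, params, step, ordered=True)
    return params, step + 1

params, step = _update_step(jnp.ones((4,)), jnp.array(0))
```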
Hi @carlosluis! This is actually a very important suggestion and we plan to add procedural generation in some form sooner or later anyway. However, in our experience (and this is...