Pure Python training, evaluation, and rollout documentation request

Open redzhepdx opened this issue 1 year ago • 5 comments

Hi everyone,

As a professional who has worked with a few RL frameworks in the past, I can confidently say that this is one of the cleanest, most user-friendly, and most advanced RL libraries I've encountered. In fact, I'm planning to introduce it to my team as our future RL framework, and we're excited to contribute to its development. I especially appreciate the Dreamer implementations and the informative blog posts. Amazing work!

Based on my experience with RL framework development, I have a few recommendations that could make this library even more appealing to a wider range of engineers:

Pure Python Examples:

While I understand the value of Hydra as a tool for configuration management and rapid experimentation, it can be intimidating for newcomers. To lower this barrier and encourage broader adoption, I recommend creating 3-4 pure Python documentation/tutorial examples demonstrating training, evaluation, and rollout using the library's existing functionality. This approach has been successful in attracting large-scale users to other RL libraries.

Here are some examples that might be helpful:

Similar to: https://github.com/araffin/rl-tutorial-jnrr19/blob/sb3/1_getting_started.ipynb
A bit more complex (but valuable): https://github.com/araffin/rl-tutorial-jnrr19/blob/sb3/1_getting_started.ipynb
Integrating with Isaac Gym: https://docs.omniverse.nvidia.com/isaacsim/latest/isaac_gym_tutorials/tutorial_advanced_rl_stable_baselines.html
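To make the request concrete, here is a minimal sketch of the shape such a pure Python example could take. It does not use SheepRL's API: `ChainEnv`, `rollout`, `train`, and `evaluate` are hypothetical names, and tabular Q-learning on a toy chain environment stands in for a real algorithm and environment.

```python
import random


class ChainEnv:
    """Hypothetical toy environment: walk right along a chain to reach the goal."""

    def __init__(self, length=5, max_steps=20):
        self.length = length
        self.max_steps = max_steps
        self.pos = 0
        self.steps = 0

    def reset(self):
        self.pos = 0
        self.steps = 0
        return self.pos

    def step(self, action):
        # Action 0 moves left (clamped at 0), action 1 moves right.
        self.pos = max(0, self.pos - 1) if action == 0 else self.pos + 1
        self.steps += 1
        terminated = self.pos == self.length - 1  # goal reached
        truncated = not terminated and self.steps >= self.max_steps  # time limit
        reward = 1.0 if terminated else -0.01
        return self.pos, reward, terminated, truncated


def rollout(env, q, epsilon, rng):
    """Run one epsilon-greedy episode and return its transitions."""
    state = env.reset()
    transitions = []
    finished = False
    while not finished:
        if rng.random() < epsilon:
            action = rng.randrange(2)
        else:
            action = max((0, 1), key=lambda a: q[state][a])
        next_state, reward, terminated, truncated = env.step(action)
        transitions.append((state, action, reward, next_state, terminated))
        state = next_state
        finished = terminated or truncated
    return transitions


def train(env, episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning: collect an episode, then update on its transitions."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(env.length)]
    for _ in range(episodes):
        for s, a, r, s2, terminated in rollout(env, q, epsilon, rng):
            # Bootstrap from the next state unless the goal ended the episode.
            target = r if terminated else r + gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
    return q


def evaluate(env, q, episodes=10, seed=1):
    """Average undiscounted return of the greedy policy (epsilon = 0)."""
    rng = random.Random(seed)
    returns = [sum(t[2] for t in rollout(env, q, 0.0, rng)) for _ in range(episodes)]
    return sum(returns) / len(returns)
```

Usage would then be three lines: `env = ChainEnv()`, `q = train(env)`, and `evaluate(env, q)`. The point is the separation into train, evaluate, and rollout entry points that can be called from a notebook without any configuration files.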

Tips and Tricks:

As we all know, RL algorithms are sensitive to hyperparameters and often require specific techniques like action masking, observation normalization, and reward scaling to be successful on new environments. Given the library's advanced capabilities with World Models, sharing insights and best practices on these topics would be incredibly valuable to the community (including myself!). Here are some examples from other libraries:

https://stable-baselines.readthedocs.io/en/master/guide/rl_tips.html
https://maze-rl.readthedocs.io/en/latest/best_practices_and_tutorials/tricks_of_the_trade.html
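For readers unfamiliar with the three techniques named above, the following is a rough, library-agnostic sketch in plain Python. All names (`RunningNorm`, `scale_reward`, `masked_softmax`) are hypothetical and not part of SheepRL; the reward scaling shown is a simplified variant that divides by the running standard deviation of raw rewards, whereas some libraries scale by statistics of the discounted return instead.

```python
import math


class RunningNorm:
    """Running mean and variance of a scalar stream (Welford's algorithm)."""

    def __init__(self, eps=1e-8):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean
        self.eps = eps

    def update(self, x):
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def std(self):
        # Population variance; fall back to 1.0 before any data is seen.
        var = self.m2 / self.count if self.count > 0 else 1.0
        return math.sqrt(var + self.eps)

    def normalize(self, x, clip=5.0):
        """Observation normalization: standardize, then clip outliers."""
        z = (x - self.mean) / self.std
        return max(-clip, min(clip, z))


def scale_reward(reward, reward_stats):
    """Reward scaling (simplified): divide by the running reward std."""
    return reward / reward_stats.std


def masked_softmax(logits, mask):
    """Action masking: invalid actions get probability exactly zero."""
    if not any(mask):
        raise ValueError("at least one action must be valid")
    masked = [l if ok else -math.inf for l, ok in zip(logits, mask)]
    top = max(masked)  # subtract the max for numerical stability
    exps = [math.exp(l - top) for l in masked]
    total = sum(exps)
    return [e / total for e in exps]
```

For example, `masked_softmax([1.0, 2.0, 3.0], [True, False, True])` assigns zero probability to the second action and renormalizes over the remaining two. In practice these statistics are usually tracked per observation dimension and frozen at evaluation time.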

Transitioning to Hydra:

Once users become comfortable with the library's fundamentals, they'll naturally progress towards exploring scalability and advanced experimentation, which is where Hydra shines. Consider creating a separate tutorial or example notebook showing how to use Hydra together with SheepRL's train and evaluate functionality to make this transition smooth.

I hope you find these recommendations helpful. Best of luck to the developers!

redzhepdx avatar Feb 17 '24 13:02 redzhepdx

Hi @redzhepdx! Thank you for the suggestions, I really appreciate them! We can definitely have something similar to this and this: what do you think, @michele-milesi? As for contributing, we still have to introduce a how to contribute.md, but if you want somewhere to start, there is an old issue regarding the implementation of the DQN methods and their variants. Thank you

belerico avatar Feb 20 '24 07:02 belerico

Hi there, @belerico, yes, we can start with something similar to the two examples you mentioned. For the environment part, I think we can try to recycle this. Or are you thinking of using a more complex environment (like this)?

michele-milesi avatar Feb 20 '24 10:02 michele-milesi

Hi @michele-milesi, I believe the complexity of the environment matters little. You can use any environment, but I would recommend something like Crawler, or any of the MuJoCo or classic Gym environments, to show the capabilities of the framework on decently challenging cases that anyone can still test locally.

Thanks a lot for your prompt reaction to this topic.

redzhepdx avatar Feb 20 '24 12:02 redzhepdx

Is there any update on this? Would really appreciate a pure Python example to use for research, to better integrate my existing stable-baselines code with!

verityw avatar Mar 26 '24 03:03 verityw

Hi @verityw, we are fixing a few problems we found with half-precision training. After this, we will move on to pure python examples. Thank you for your patience.

michele-milesi avatar Mar 26 '24 08:03 michele-milesi