skrl
Modular Reinforcement Learning (RL) library (implemented in PyTorch, JAX, and NVIDIA Warp) with support for Gymnasium/Gym, NVIDIA Isaac Lab, Brax and other environments
Previously, sampling failed if the action space was infinite. While an infinite action space might be bad practice, in some cases it is not easy to get rid of...
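A minimal sketch of the idea being fixed, assuming a NumPy Box-style space; the helper name and the finite cap of 1e6 are illustrative choices, not skrl's actual values:

```python
import numpy as np

def sample_bounded(low, high, size, cap=1e6):
    """Uniform sampling that tolerates infinite bounds.

    Infinite entries in `low`/`high` are replaced by an arbitrary finite cap
    (illustrative value, not taken from skrl) so np.random.uniform stays valid.
    """
    low = np.where(np.isinf(low), -cap, low)
    high = np.where(np.isinf(high), cap, high)
    return np.random.uniform(low=low, high=high, size=size)
```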
Hi @Toni-SM , are there examples of using multiple input observations? For example, one input is an image and another input is a vector, similar to Stable-Baselines3's multiple inputs and dictionary observations here: https://stable-baselines3.readthedocs.io/en/master/guide/custom_policy.html#multiple-inputs-and-dictionary-observations
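A generic PyTorch sketch of the fusion idea being asked about; this is a plain `nn.Module`, not skrl's model API, and the observation keys (`"image"`, `"vector"`) and layer sizes are assumptions:

```python
import torch
import torch.nn as nn

class MultiInputNet(nn.Module):
    """Fuses an image observation and a vector observation into one output."""

    def __init__(self, vector_dim=10, num_actions=2):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        self.mlp = nn.Sequential(nn.Linear(vector_dim, 64), nn.ReLU())
        self.head = nn.LazyLinear(num_actions)  # infers the fused feature size

    def forward(self, obs: dict) -> torch.Tensor:
        image_features = self.cnn(obs["image"])    # (N, 3, H, W) -> (N, F)
        vector_features = self.mlp(obs["vector"])  # (N, vector_dim) -> (N, 64)
        return self.head(torch.cat([image_features, vector_features], dim=-1))
```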
# Add Model-Based Meta-Policy-Optimization (MBMPO) ## Introduction and description Coming soon ## Improvements in this PR Coming soon ## Proof of Work Coming soon Cheers, Johann
### Description I am using the latest [orbit](https://github.com/NVIDIA-Omniverse/orbit/tree/83e14f096ed3b20223cdca3065975bcc7dfa22f1) with skrl 1.1.0, and I am trying to run the example code provided in your docs (like [torch_ant_ppo.py](https://skrl.readthedocs.io/en/latest/_downloads/3faa6f6c7e33a77373e38111c8999c22/torch_ant_ppo.py)), but I got No module named...
This pull request addresses a discrepancy between the algorithm in the original TD3 and DDPG papers and the current implementation in the repository. Specifically, the original implementation performs the sampling step outside...
### Discussed in https://github.com/Toni-SM/skrl/discussions/70 Originally posted by **403forbiddennn** April 19, 2023 In the **Isaac Gym wrapper** class, the `render` method is inappropriately overridden by your wrapper and thus can not...
### Description Hi there, I have been using `skrl` with OIGE, but when I try the "Getting Started" code for `dm_control`: ``` # import the environment wrapper and the...
### Description Random actions are generated by taking the low and high values of the first dimension of the action space and then uniformly sampling from [low, high] for each...
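A minimal sketch contrasting per-dimension sampling with sampling from only the first dimension's bounds, assuming a Gymnasium `Box` space; the space and batch size are illustrative:

```python
import numpy as np
from gymnasium.spaces import Box

space = Box(low=np.array([-1.0, 0.0]), high=np.array([1.0, 10.0]))
batch_size = 4

# Per-dimension sampling: each column respects its own [low, high]
actions = np.random.uniform(low=space.low, high=space.high,
                            size=(batch_size,) + space.shape)

# Using only the first dimension's bounds samples every column from [-1, 1],
# which is wrong for the second dimension whose valid range is [0, 10]
wrong_actions = np.random.uniform(low=space.low[0], high=space.high[0],
                                  size=(batch_size,) + space.shape)
```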
### Description The mean rewards are computed by appending the mean of all stored cumulative rewards to the self.tracking_data dictionary: `self.tracking_data["Reward / Total reward (mean)"].append(np.mean(track_rewards))`. Then, every time the data...
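A minimal sketch of the tracking pattern described above; the deque length and every name other than the ones quoted in the report are assumptions:

```python
import collections
import numpy as np

tracking_data = collections.defaultdict(list)
track_rewards = collections.deque(maxlen=100)  # assumed window of episode returns

# when an episode finishes, its cumulative reward is stored ...
track_rewards.append(123.4)

# ... and the mean over all stored cumulative rewards is appended for logging
tracking_data["Reward / Total reward (mean)"].append(np.mean(track_rewards))
```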
# Mixed precision **Motivation**: Inspired by RLGames, we implemented automatic mixed precision to boost the performance of PPO. **Sources:** **Speed eval:** - Big neural network (units: [2048, 1024, 1024, 512])...
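A minimal sketch of the standard PyTorch automatic-mixed-precision training step this refers to, assuming a CUDA device; the model, optimizer, and loss are placeholders, not skrl's PPO internals:

```python
import torch

model = torch.nn.Linear(64, 8).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)
scaler = torch.cuda.amp.GradScaler()  # scales the loss to avoid fp16 underflow

def training_step(states, targets):
    optimizer.zero_grad()
    # the forward pass runs in mixed precision (fp16/fp32 chosen per op)
    with torch.cuda.amp.autocast():
        loss = torch.nn.functional.mse_loss(model(states), targets)
    scaler.scale(loss).backward()   # backward pass on the scaled loss
    scaler.step(optimizer)          # unscale gradients, then optimizer step
    scaler.update()                 # adjust the scale factor for the next step
    return loss.item()
```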