Mixed precision for PPO algorithm
Motivation:
Inspired by RLGames, we implemented automatic mixed precision to boost the performance of PPO.
Sources:
https://pytorch.org/docs/stable/amp.html
https://pytorch.org/docs/stable/notes/amp_examples.html
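For context, here is a minimal sketch of the standard PyTorch AMP pattern from the sources above, applied to a generic PPO-style update step. All names (`ppo_update`, `compute_ppo_loss`, `minibatches`) are illustrative assumptions, not skrl's actual API.

```python
# Minimal sketch of the torch.cuda.amp pattern for a PPO-style update.
# `compute_ppo_loss` and the other names are illustrative, not skrl's API.
import torch
from torch import nn
from torch.cuda.amp import GradScaler, autocast

def ppo_update(model: nn.Module,
               optimizer: torch.optim.Optimizer,
               minibatches,                  # iterable of sampled minibatches
               compute_ppo_loss,             # callable: (model, batch) -> scalar loss
               mixed_precision: bool = True,
               max_grad_norm: float = 1.0) -> None:
    # the scaler adapts the loss scale to avoid float16 gradient underflow
    scaler = GradScaler(enabled=mixed_precision)
    for batch in minibatches:
        optimizer.zero_grad()
        # forward pass and loss computation run in reduced precision where safe
        with autocast(enabled=mixed_precision):
            loss = compute_ppo_loss(model, batch)
        # backward on the scaled loss, then unscale before clipping so the
        # clip threshold applies to the true gradient magnitudes
        scaler.scale(loss).backward()
        scaler.unscale_(optimizer)
        nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
        scaler.step(optimizer)   # step is skipped if gradients contain inf/NaN
        scaler.update()
```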
Speed eval:
- Big neural network (units: [2048, 1024, 1024, 512])
- 10000 steps
- Running on top of an OIGE env simulation (constant for each run)
- skrl uses a single-forward-pass implementation
| Library | Mixed precision | Time (s) | Slowdown factor (base: RLGames, mixed precision = Yes) |
|---------|-----------------|----------|---------------------------------------------------------|
| RLGames | No              | 448      | 1.322x                                                  |
| RLGames | Yes             | 339      | 1.000x (base)                                           |
| skrl    | No              | 475      | 1.401x                                                  |
| skrl    | Yes             | 373      | 1.100x                                                  |
| skrl    | Yes *           | 358      | 1.056x                                                  |
* In this run, mixed precision was also used for inference during the data-collection phase (see the sketch below).
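One possible reading of the starred configuration, sketched below: the policy's forward pass during data collection is also wrapped in `autocast`. No `GradScaler` is needed here, since inference involves no backward pass. `policy` and `obs` are placeholder names, not skrl's API.

```python
# Sketch of reduced-precision inference during the rollout phase.
# `policy` and `obs` are placeholders, not skrl's API.
import torch
from torch.cuda.amp import autocast

@torch.no_grad()
def act(policy: torch.nn.Module, obs: torch.Tensor, mixed_precision: bool = True):
    # autocast alone is sufficient for inference; GradScaler is only
    # needed when gradients are computed
    with autocast(enabled=mixed_precision):
        return policy(obs)
```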
Quality eval:
- We trained a policy for our task multiple times with each configuration. We did not observe any statistically significant difference in the quality of the final results.