Mixed double precision for PPO algorithm

Open lopatovsky opened this issue 1 year ago • 0 comments

Mixed precision

Motivation:

Inspired by RLGames, we implemented automatic mixed precision (AMP) to speed up PPO training.

Sources:

https://pytorch.org/docs/stable/amp.html

https://pytorch.org/docs/stable/notes/amp_examples.html
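For readers unfamiliar with the pattern described in the sources above, here is a minimal sketch of an AMP update step in plain PyTorch. It is not the skrl implementation: the network, the `train_step` helper, the MSE loss standing in for the PPO loss, and the clipping value are all hypothetical. It assumes float16 autocast with loss scaling on CUDA and falls back to full precision on CPU. `torch.cuda.amp.GradScaler` is the older spelling; recent PyTorch releases also expose it as `torch.amp.GradScaler`.

```python
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
use_amp = device == "cuda"  # assumption: only enable AMP when a GPU is present
amp_dtype = torch.float16 if device == "cuda" else torch.bfloat16

# Hypothetical stand-in for a PPO policy/value network (not a skrl model class).
model = nn.Sequential(nn.Linear(8, 64), nn.ELU(), nn.Linear(64, 1)).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# The scaler multiplies the loss so small float16 gradients do not underflow.
# With enabled=False it becomes a pass-through.
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)


def train_step(states, targets):
    """One optimizer step; the MSE loss is a placeholder for the PPO loss."""
    optimizer.zero_grad(set_to_none=True)

    # Forward pass and loss run under autocast: eligible ops use the low-precision dtype.
    with torch.autocast(device_type=device, dtype=amp_dtype, enabled=use_amp):
        predictions = model(states)
        loss = nn.functional.mse_loss(predictions, targets)

    # Backward on the scaled loss, then unscale before clipping so the
    # clipping threshold applies to the true gradient magnitudes.
    scaler.scale(loss).backward()
    scaler.unscale_(optimizer)
    nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)

    # step() skips the update if inf/NaN gradients were found; update() adapts the scale.
    scaler.step(optimizer)
    scaler.update()
    return loss.item()
```

Calling `train_step(torch.randn(16, 8, device=device), torch.randn(16, 1, device=device))` runs one update and returns the loss as a Python float.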

Speed eval:

  • Large neural network (units: [2048, 1024, 1024, 512])

  • 10,000 steps

  • Running on top of the OIGE environment simulation (constant for each run)

  • skrl uses a single-forward-pass implementation

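| Library | Mixed precision | Time (s) | Slowdown factor (base: RLGames with mixed precision) |
|---------|-----------------|----------|------------------------------------------------------|
| RLGames | No              | 448      | 1.322x                                               |
| RLGames | Yes             | 339      | 1 (base)                                             |
| SKRL    | No              | 475      | 1.401x                                               |
| SKRL    | Yes             | 373      | 1.1x                                                 |
| SKRL    | Yes *           | 358      | 1.056x                                               |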

* In this run, mixed precision was also used for inference during the data collection phase.
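As an illustration of what using mixed precision for inference during data collection can look like, here is a minimal sketch that wraps the policy forward pass in `torch.autocast`. The `policy` network and the `act` helper are hypothetical and do not reflect skrl's agent API. The sketch assumes float16 on CUDA and bfloat16 on CPU.

```python
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
amp_dtype = torch.float16 if device == "cuda" else torch.bfloat16

# Hypothetical policy network: 8 observation features -> 2 action values.
policy = nn.Sequential(nn.Linear(8, 64), nn.ELU(), nn.Linear(64, 2)).to(device)


@torch.no_grad()  # no gradients are needed while collecting rollouts
def act(observations, mixed_precision=True):
    """Compute actions for a batch of observations, optionally under autocast."""
    with torch.autocast(device_type=device, dtype=amp_dtype, enabled=mixed_precision):
        actions = policy(observations)
    # Cast back to float32 so the environment and rollout buffer see one dtype.
    return actions.float()
```

Casting the output back to float32 keeps the stored rollout data in a single dtype regardless of whether autocast was enabled for the forward pass.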

Quality eval:

  • We trained a policy for our task multiple times with each configuration and did not observe any statistically significant difference in the quality of the final results.

lopatovsky, Jun 10 '24 16:06