Mixed precision for PPO algorithm
Motivation:
Inspired by RLGames, we implemented automatic mixed precision to boost the performance of PPO.
Sources:
https://pytorch.org/docs/stable/amp.html
https://pytorch.org/docs/stable/notes/amp_examples.html
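For context, here is a minimal sketch of the standard PyTorch AMP pattern from the sources above, applied to a generic PPO-style update step. All names (`ppo_update`, `compute_ppo_loss`, `minibatches`) are illustrative assumptions, not skrl's actual API.

```python
# Minimal sketch of the torch.cuda.amp pattern for a PPO-style update.
# `compute_ppo_loss` and the other names are illustrative, not skrl's API.
import torch
from torch import nn
from torch.cuda.amp import GradScaler, autocast

def ppo_update(model: nn.Module,
               optimizer: torch.optim.Optimizer,
               minibatches,                  # iterable of sampled minibatches
               compute_ppo_loss,             # callable: (model, batch) -> scalar loss
               mixed_precision: bool = True,
               max_grad_norm: float = 1.0) -> None:
    # the scaler adapts the loss scale to avoid float16 gradient underflow
    scaler = GradScaler(enabled=mixed_precision)
    for batch in minibatches:
        optimizer.zero_grad()
        # forward pass and loss computation run in reduced precision where safe
        with autocast(enabled=mixed_precision):
            loss = compute_ppo_loss(model, batch)
        # backward on the scaled loss, then unscale before clipping so the
        # clip threshold applies to the true gradient magnitudes
        scaler.scale(loss).backward()
        scaler.unscale_(optimizer)
        nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
        scaler.step(optimizer)   # step is skipped if gradients contain inf/NaN
        scaler.update()
```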
Speed eval:
- Big neural network (units: [2048, 1024, 1024, 512])
- 10000 steps
- Running on top of an OIGE env simulation (constant for each run)
- skrl uses a single-forward-pass implementation
| Library | Mixed precision | Time (s) | Slowdown factor (base: RLGames, mixed precision = Yes) |
|---------|-----------------|----------|---------------------------------------------------------|
| RLGames | No              | 448      | 1.322x                                                  |
| RLGames | Yes             | 339      | 1.000x (base)                                           |
| skrl    | No              | 475      | 1.401x                                                  |
| skrl    | Yes             | 373      | 1.100x                                                  |
| skrl    | Yes *           | 358      | 1.056x                                                  |
* In this run, mixed precision was also used for inference during the data-collection phase (see the sketch below).
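One possible reading of the starred configuration, sketched below: the policy's forward pass during data collection is also wrapped in `autocast`. No `GradScaler` is needed here, since inference involves no backward pass. `policy` and `obs` are placeholder names, not skrl's API.

```python
# Sketch of reduced-precision inference during the rollout phase.
# `policy` and `obs` are placeholders, not skrl's API.
import torch
from torch.cuda.amp import autocast

@torch.no_grad()
def act(policy: torch.nn.Module, obs: torch.Tensor, mixed_precision: bool = True):
    # autocast alone is sufficient for inference; GradScaler is only
    # needed when gradients are computed
    with autocast(enabled=mixed_precision):
        return policy(obs)
```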
Quality eval:
- We trained a policy for our task multiple times with each configuration. We did not observe any statistically significant difference in the quality of the final results.