
DreamerV3: Hardware Resources Underutilized?


Hi,

when I run (DreamerV3) experiments, especially ones with a replay_ratio > 1.0, training takes quite a long time. During these runs, my hardware resources are not being used much (e.g. only 1-2 CPU cores at around 50% each), so clearly there is more computational power available. I was wondering if there is anything I can do to make SheepRL use more of the available hardware resources. I am already running multiple environments in parallel. I also tried increasing num_threads, but this seems to have no effect.

Here is a simple example training command:

sheeprl fabric.accelerator=cuda fabric.strategy=ddp fabric.devices=1 fabric.precision=16-mixed exp=dreamer_v3 algo=dreamer_v3_S env=gym env.id=CartPole-v1 algo.total_steps=10000 algo.cnn_keys.encoder=\[\] algo.mlp_keys.encoder=\["vector"\] algo.cnn_keys.decoder=\[\] algo.mlp_keys.decoder=\["vector"\] env.num_envs=12 num_threads=16 checkpoint.every=1000 metric.log_every=100 algo.replay_ratio=10.0

Training this for around 8000 steps, where it reached the ~500 reward threshold, took around 3 hours. The log data lists a Time/sps_train of ~0.046 (which I assume is environment steps per second).

Thanks in advance for this great library!

defrag-bambino avatar May 15 '24 14:05 defrag-bambino

Hi @defrag-bambino, the slowdown when raising the replay ratio is expected: the higher the replay ratio, the more gradient steps the agent computes per policy step. Since the training steps happen mainly on the GPU, I would look at the GPU stats rather than the CPU stats (in this case the CPU is mainly used for saving experiences in the buffer and running a fairly simple environment).
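
To put rough numbers on that, here is a back-of-the-envelope sketch, assuming replay_ratio roughly counts gradient steps per policy step as described above (the exact accounting inside sheeprl may differ slightly):

```python
# Back-of-the-envelope: how the replay ratio scales the training work.
# Assumes replay_ratio ~= gradient steps per policy step (see comment above);
# the exact bookkeeping inside sheeprl may differ slightly.
replay_ratio = 10.0
policy_steps = 8_000  # roughly the env steps reported in the run above

grad_steps = replay_ratio * policy_steps
print(f"~{grad_steps:,.0f} gradient steps")  # ~80,000 gradient updates for this run
```

So a replay ratio of 10 means roughly an order of magnitude more GPU work than a replay ratio of 1 for the same number of environment steps.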

Furthermore, I suggest not using fabric.strategy=ddp when running on a single device.
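
For example, the same command as above with the DDP strategy flag dropped, so that Fabric falls back to its default single-device strategy (assuming the rest of your configuration stays unchanged):

sheeprl fabric.accelerator=cuda fabric.devices=1 fabric.precision=16-mixed exp=dreamer_v3 algo=dreamer_v3_S env=gym env.id=CartPole-v1 algo.total_steps=10000 algo.cnn_keys.encoder=\[\] algo.mlp_keys.encoder=\["vector"\] algo.cnn_keys.decoder=\[\] algo.mlp_keys.decoder=\["vector"\] env.num_envs=12 num_threads=16 checkpoint.every=1000 metric.log_every=100 algo.replay_ratio=10.0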

Another suggestion to speed up training is to use this branch, where we have introduced compilation through torch.compile, which should speed up your training on the right GPU.
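
For reference, here is a minimal, standalone sketch of what torch.compile does in general (a toy model for illustration only, not the actual integration in that branch):

```python
# Minimal illustration of torch.compile (not sheeprl's integration).
import torch
import torch.nn as nn

# Toy MLP standing in for a model; sizes are arbitrary.
model = nn.Sequential(nn.Linear(4, 256), nn.SiLU(), nn.Linear(256, 2))

# torch.compile returns a wrapped module; the first forward pass triggers
# graph capture and compilation, subsequent calls reuse the compiled code.
compiled_model = torch.compile(model)

x = torch.randn(12, 4)
y = compiled_model(x)  # slower on the first call, faster afterwards
```

The speedup depends on the GPU generation and on how much of the model can be captured in a single graph, which is why it mostly pays off on recent hardware.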

If you try out that branch, could you kindly report your findings in this issue?

Thank you

belerico avatar May 15 '24 21:05 belerico

Hi @defrag-bambino, has this fixed your issue? Are there any other considerations you want to share?

belerico avatar Jun 25 '24 08:06 belerico

Yes, this is OK for now! Thanks

defrag-bambino avatar Jun 27 '24 08:06 defrag-bambino