[Question] fps drops significantly over time
Question
I'm training a simple PPO agent in a MuJoCo Gymnasium env, but I've noticed that the FPS drops significantly over the course of training. It keeps going down even after the updates have started.
I'm wondering what could be the cause of this?
Additional context
- I'm training on GPU, although running on CPU yields similar results
- I'm training with the following parameters (see the minimal sketch after this list):
  - n_envs: 1
  - n_epochs: 10
  - n_steps: 1024
- I've read DLR-RM/stable-baselines3#1597, but I don't feel like any of the comments there apply to my case.
- Dependency versions:
  - Python 3.11
  - gymnasium 1.0 (master branch)
  - sbx 0.15
  - sb3 2.3.2
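For reference, the setup amounts to something like this minimal sketch (the env id and total timesteps are illustrative, and it assumes sbx's PPO exposes the same constructor arguments as SB3's):

```python
import gymnasium as gym
from sbx import PPO  # sbx: SB3-style agents in JAX, same interface as sb3

# Illustrative repro with the parameters listed above (n_envs=1)
env = gym.make("HalfCheetah-v4")
model = PPO("MlpPolicy", env, n_steps=1024, n_epochs=10, verbose=1)
# verbose=1 prints the `time/fps` metric, which is where the slowdown shows up
model.learn(total_timesteps=1_000_000)
```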
The following charts show CPU usage and FPS over time; this particular run took 17h on a machine with:
- cpu: 13th Gen Intel i9-13900KF (32) @ 6.000GHz
- gpu: NVIDIA GeForce RTX 4090
Checklist
- [x] I have read the documentation (required)
- [x] I have checked that there is no similar issue in the repo (required)
What MuJoCo version do you use? SB3 is only fully compatible with gymnasium 0.29.1 for now.
Do you have the same issue with a built-in MuJoCo env (for instance HalfCheetah-v4)?
I haven't experienced that so far; I might do some runs that I will log in Open RL Benchmark later.
Yes, it happens for the built-in MuJoCo envs as well. I think I have narrowed it down to my use of SubprocVecEnv vs. DummyVecEnv: SubprocVecEnv seems to leak memory or something, which causes the memory on my machine to gradually fill up.
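For anyone wanting to compare the two backends directly, here is a minimal sketch using the standard SB3 helpers (not from the original report):

```python
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env import DummyVecEnv, SubprocVecEnv

if __name__ == "__main__":  # required for SubprocVecEnv with the spawn start method
    # DummyVecEnv steps all envs sequentially in the main process;
    # SubprocVecEnv runs each env in its own worker process.
    dummy_env = make_vec_env("HalfCheetah-v4", n_envs=4, vec_env_cls=DummyVecEnv)
    subproc_env = make_vec_env("HalfCheetah-v4", n_envs=4, vec_env_cls=SubprocVecEnv)
```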
I'm doing many runs on MuJoCo envs and I cannot see the effect you describe so far: https://wandb.ai/openrlbenchmark/sbx/runs/99wrpkc7?nw=nwuseraraffin (other runs are available in https://wandb.ai/openrlbenchmark/sbx/).
To reproduce, use the train script from the readme (using RL Zoo):

```
python train.py --algo ppo --env HalfCheetah-v4 --seed 3831217417 --eval-freq 25000 --verbose 0 --n-eval-envs 5 --eval-episodes 20 --log-interval 100 -c hyperparams/ppo.py -P --track
```
Note: I'm using `JAX_PLATFORMS=cpu CUDA_VISIBLE_DEVICES=` (forcing JAX onto the CPU and hiding the GPU).
The hyperparameter file is simply:

```python
# Default hyperparameters for SB3 are tuned for MuJoCo
default_hyperparams = dict(
    n_envs=1,
    n_timesteps=int(1e6),
    policy="MlpPolicy",
    policy_kwargs={},
    normalize=True,
)

hyperparams = {}
for env_id in [
    "HalfCheetah-v4",
    "Humanoid-v4",
    "HalfCheetahBulletEnv-v0",
    "Ant-v4",
    "Hopper-v4",
    "Walker2d-v4",
    "Swimmer-v4",
]:
    # Copy so the env entries don't all alias one mutable dict
    hyperparams[env_id] = dict(default_hyperparams)
```
EDIT: I'll try with multiple envs later
Note: when using multiple envs, you should probably adjust n_steps to keep the batch size constant.
For instance:

```
JAX_PLATFORMS=cpu CUDA_VISIBLE_DEVICES= python train.py --algo ppo \
  --env HalfCheetah-v4 -P -params n_envs:4 n_steps:512 --vec-env subproc \
  --verbose 0 --eval-freq -1 -c hyperparams/ppo.py
```

This uses 4 envs instead of 1; since the default n_steps is 2048, it becomes 2048 / 4 = 512 so that n_envs * n_steps stays constant.
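The adjustment is just keeping n_envs * n_steps constant; as a hypothetical helper (not part of RL Zoo):

```python
def adjusted_n_steps(n_envs: int, rollout_size: int = 2048) -> int:
    """Pick n_steps so that n_envs * n_steps == rollout_size."""
    assert rollout_size % n_envs == 0, "rollout size must divide evenly"
    return rollout_size // n_envs

assert adjusted_n_steps(1) == 2048
assert adjusted_n_steps(4) == 512
```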
It seems there is some sort of (memory?) leak when using SubprocVecEnv on Linux; switching to DummyVecEnv fixed the issue.
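To verify a leak like this, process memory can be logged during training with a custom SB3 callback; here is a sketch (psutil-based, not part of the thread):

```python
import psutil
from stable_baselines3.common.callbacks import BaseCallback

class MemoryLoggerCallback(BaseCallback):
    """Log the resident memory of the training process (plus its
    SubprocVecEnv workers, which are child processes) every check_freq steps."""

    def __init__(self, check_freq: int = 10_000):
        super().__init__()
        self.check_freq = check_freq

    def _on_step(self) -> bool:
        if self.n_calls % self.check_freq == 0:
            proc = psutil.Process()
            rss = proc.memory_info().rss
            # SubprocVecEnv workers are children of the main process
            rss += sum(c.memory_info().rss for c in proc.children(recursive=True))
            self.logger.record("system/rss_mb", rss / 2**20)
        return True
```

Passing it via `model.learn(total_timesteps=..., callback=MemoryLoggerCallback())`, a steadily climbing `system/rss_mb` alongside a falling `time/fps` would point at the leak.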