[Question] fps drops significantly over time
Question
I'm training a simple PPO agent in a MuJoCo Gymnasium env, but I've noticed that the FPS drops significantly over the course of training. It keeps going down even after the updates have started.
I'm wondering what could be the cause of this?
Additional context
- I'm training on GPU, although running on CPU yields similar results
- I'm training with the following parameters (see the minimal sketch after this list):
  - n_envs: 1
  - n_epochs: 10
  - n_steps: 1024
- I've read DLR-RM/stable-baselines3#1597, but I don't feel like any of the comments there apply to my case.
- Dependency versions:
  - Python 3.11
  - gymnasium 1.0 (master branch)
  - sbx 0.15
  - sb3 2.3.2
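For reference, the setup amounts to something like this minimal sketch (the env id and total timesteps are illustrative, and it assumes sbx's PPO exposes the same constructor arguments as SB3's):

```python
import gymnasium as gym
from sbx import PPO  # sbx: SB3-style agents in JAX, same interface as sb3

# Illustrative repro with the parameters listed above (n_envs=1)
env = gym.make("HalfCheetah-v4")
model = PPO("MlpPolicy", env, n_steps=1024, n_epochs=10, verbose=1)
# verbose=1 prints the `time/fps` metric, which is where the slowdown shows up
model.learn(total_timesteps=1_000_000)
```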
The following charts show CPU usage and FPS over time; this particular run took 17h on a machine with:
- cpu: 13th Gen Intel i9-13900KF (32) @ 6.000GHz
- gpu: NVIDIA GeForce RTX 4090
Checklist
- [x] I have read the documentation (required)
- [x] I have checked that there is no similar issue in the repo (required)
What MuJoCo version do you use? SB3 is only fully compatible with gymnasium 0.29.1 for now.
Do you have the same issue with a built-in MuJoCo env (for instance HalfCheetah-v4)?
I haven't experienced that so far; I might do some runs that I will log in Open RL Benchmark later.
Yes, it happens for the built-in MuJoCo envs as well. I think I have narrowed it down to my use of SubprocVecEnv vs. DummyVecEnv: SubprocVecEnv seems to leak memory or something, which causes the memory on my machine to gradually fill up.
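For anyone wanting to compare the two backends directly, here is a minimal sketch using the standard SB3 helpers (not from the original report):

```python
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env import DummyVecEnv, SubprocVecEnv

if __name__ == "__main__":  # required for SubprocVecEnv with the spawn start method
    # DummyVecEnv steps all envs sequentially in the main process;
    # SubprocVecEnv runs each env in its own worker process.
    dummy_env = make_vec_env("HalfCheetah-v4", n_envs=4, vec_env_cls=DummyVecEnv)
    subproc_env = make_vec_env("HalfCheetah-v4", n_envs=4, vec_env_cls=SubprocVecEnv)
```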
I'm doing many runs on MuJoCo envs and I cannot see the effect you describe so far: https://wandb.ai/openrlbenchmark/sbx/runs/99wrpkc7?nw=nwuseraraffin (other runs are available in https://wandb.ai/openrlbenchmark/sbx/).
To reproduce, use the train script from the readme (using RL Zoo):

```
python train.py --algo ppo --env HalfCheetah-v4 --seed 3831217417 --eval-freq 25000 --verbose 0 --n-eval-envs 5 --eval-episodes 20 --log-interval 100 -c hyperparams/ppo.py -P --track
```
Note: I'm using `JAX_PLATFORMS=cpu CUDA_VISIBLE_DEVICES=` (forcing JAX onto the CPU and hiding the GPU).
The hyperparameter file is simply:

```python
# Default hyperparameters for SB3 are tuned for MuJoCo
default_hyperparams = dict(
    n_envs=1,
    n_timesteps=int(1e6),
    policy="MlpPolicy",
    policy_kwargs={},
    normalize=True,
)

hyperparams = {}
for env_id in [
    "HalfCheetah-v4",
    "Humanoid-v4",
    "HalfCheetahBulletEnv-v0",
    "Ant-v4",
    "Hopper-v4",
    "Walker2d-v4",
    "Swimmer-v4",
]:
    # Copy so the env entries don't all alias one mutable dict
    hyperparams[env_id] = dict(default_hyperparams)
```
EDIT: I'll try with multiple envs later
Note: when using multiple envs, you should probably adjust n_steps to keep the batch size constant.
For instance:

```
JAX_PLATFORMS=cpu CUDA_VISIBLE_DEVICES= python train.py --algo ppo \
  --env HalfCheetah-v4 -P -params n_envs:4 n_steps:512 --vec-env subproc \
  --verbose 0 --eval-freq -1 -c hyperparams/ppo.py
```

This uses 4 envs instead of 1; since the default n_steps is 2048, it becomes 2048 / 4 = 512 so that n_envs * n_steps stays constant.
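The adjustment is just keeping n_envs * n_steps constant; as a hypothetical helper (not part of RL Zoo):

```python
def adjusted_n_steps(n_envs: int, rollout_size: int = 2048) -> int:
    """Pick n_steps so that n_envs * n_steps == rollout_size."""
    assert rollout_size % n_envs == 0, "rollout size must divide evenly"
    return rollout_size // n_envs

assert adjusted_n_steps(1) == 2048
assert adjusted_n_steps(4) == 512
```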
It seems there is some sort of (memory?) leak when using SubprocVecEnv on Linux; switching to DummyVecEnv fixed the issue.
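To verify a leak like this, process memory can be logged during training with a custom SB3 callback; here is a sketch (psutil-based, not part of the thread):

```python
import psutil
from stable_baselines3.common.callbacks import BaseCallback

class MemoryLoggerCallback(BaseCallback):
    """Log the resident memory of the training process (plus its
    SubprocVecEnv workers, which are child processes) every check_freq steps."""

    def __init__(self, check_freq: int = 10_000):
        super().__init__()
        self.check_freq = check_freq

    def _on_step(self) -> bool:
        if self.n_calls % self.check_freq == 0:
            proc = psutil.Process()
            rss = proc.memory_info().rss
            # SubprocVecEnv workers are children of the main process
            rss += sum(c.memory_info().rss for c in proc.children(recursive=True))
            self.logger.record("system/rss_mb", rss / 2**20)
        return True
```

Passing it via `model.learn(total_timesteps=..., callback=MemoryLoggerCallback())`, a steadily climbing `system/rss_mb` alongside a falling `time/fps` would point at the leak.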