webots
Increasing GPU memory across simulation resets for reinforcement learning in Kubernetes containers
I’m running multiple Webots simulations in parallel inside Kubernetes pods, each with Fluxbox as the window manager on a headless Xorg server. All pods share the same NVIDIA GPU(s) via the Kubernetes NVIDIA device plugin, and I launch Webots with --batch --mode=fast. Despite this, GPU memory usage keeps climbing over time and is never released, even after the simulation resets, after calling gc.collect(), and after clearing the CUDA cache at the end of every learning iteration.
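For completeness, this is roughly the cleanup I run after each learning iteration (a minimal sketch assuming a PyTorch-based RL stack; the function name is mine). Note that torch.cuda.empty_cache() only returns blocks held by PyTorch's caching allocator in this process, so it cannot touch memory held by webots-bin:

```python
import gc


def cleanup_after_iteration():
    """Drop Python-side references and release cached CUDA blocks.

    Assumes PyTorch; empty_cache() frees this process's cached
    allocator memory only -- not memory owned by webots-bin.
    """
    gc.collect()
    try:
        import torch  # assumption: the RL controller uses PyTorch
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
    except ImportError:
        pass


cleanup_after_iteration()
```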
Environment:
Kubernetes pods
Fluxbox window manager on Xorg
Webots R2025a in headless mode (Start command: webots --stdout --stderr --batch --minimize --mode=fast worlds/rl_world.wbt)
Symptoms:
GPU memory usage increases steadily from about 250 MB to 5 GB for every webots-bin process, and would likely climb further if more GPU memory were available.
The controller (a Python reinforcement-learning script) uses only about 200 MB of GPU memory.
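This is how I attribute the memory to the individual processes, via nvidia-smi's per-process query (a sketch assuming the NVIDIA driver tools are on PATH inside the pod; the helper name is mine):

```python
import subprocess


def gpu_memory_by_process():
    """Return [pid, process_name, used_MiB] rows from nvidia-smi.

    Assumes nvidia-smi is available inside the pod, which the
    Kubernetes NVIDIA device plugin normally provides.
    """
    out = subprocess.run(
        ["nvidia-smi",
         "--query-compute-apps=pid,process_name,used_memory",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    # Each output line looks like: "<pid>, <name>, <MiB>"
    return [line.split(", ") for line in out.strip().splitlines() if line]
```

Polling this over time is how I see each webots-bin entry growing while the controller's entry stays flat.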
Graphics card: NVIDIA L4, 24 GB
Operating system: Ubuntu 22.04, kernel 5.15.0-136-generic