
Increasing GPU memory across simulation resets for reinforcement learning in Kubernetes containers

Open xinxing-max opened this issue 6 months ago • 0 comments

I’m running multiple Webots simulations in parallel inside Kubernetes pods, each with Fluxbox as the window manager on an Xorg display, in headless (“batch”) mode. All pods share the same NVIDIA GPU(s) via the Kubernetes NVIDIA device plugin, and I launch Webots with --batch --mode=fast. Despite this, GPU memory usage keeps climbing over time and is never released, even after simulation resets, calling gc.collect(), or clearing the CUDA cache after every learning iteration.
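For context, the per-reset cleanup mentioned above looks roughly like the following sketch (the function name is illustrative, and the PyTorch import is an assumption about the controller; this is not the full training code). Note that torch.cuda.empty_cache() only releases the caching allocator of the calling Python process, which is consistent with it having no effect on the webots-bin processes:

```python
import gc


def free_gpu_caches():
    """Per-reset cleanup: force Python garbage collection and, when
    PyTorch with CUDA is available, return cached allocator blocks to
    the driver. Returns the number of objects gc collected."""
    collected = gc.collect()
    try:
        import torch  # optional: only present in the RL controller image
        if torch.cuda.is_available():
            # Frees *cached* blocks held by this process's allocator;
            # memory held by live tensors, or by other processes such
            # as webots-bin, is untouched.
            torch.cuda.empty_cache()
    except ImportError:
        pass
    return collected
```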

Environment:

Kubernetes pods

Fluxbox on Xorg as X server

Webots R2025a in headless mode (Start command: webots --stdout --stderr --batch --minimize --mode=fast worlds/rl_world.wbt)

Symptoms:

GPU memory usage increases steadily from about 250 MB to 5 GB for every webots-bin process, and climbs even higher when more GPU memory is available.

The controller (a Python script for reinforcement learning) uses only about 200 MB of GPU memory.

Graphics Card: NVIDIA L4 24 GB

Operating System: Ubuntu 22.04, kernel 5.15.0-136-generic
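For reference, the per-process numbers above come from nvidia-smi's compute-apps query. A small helper along these lines can log them periodically (the function names are mine, and nvidia-smi must be on PATH inside the pod):

```python
import csv
import io
import subprocess

# nvidia-smi emits one CSV row per GPU process:
#   "<pid>, <process name>, <used memory in MiB>"
QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,process_name,used_memory",
    "--format=csv,noheader,nounits",
]


def parse_compute_apps(csv_text):
    """Parse nvidia-smi CSV output into (pid, name, used_mib) tuples."""
    rows = []
    for pid, name, mem in csv.reader(io.StringIO(csv_text)):
        rows.append((int(pid), name.strip(), int(mem)))
    return rows


def snapshot():
    """Return current per-process GPU memory usage (requires nvidia-smi)."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_compute_apps(out.stdout)
```

Sampling snapshot() once per training episode makes it easy to confirm that the growth is attributable to webots-bin rather than the Python controller.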

xinxing-max · May 22, 2025