
Will there be support for GPU execution?

Open im-Kitsch opened this issue 3 years ago • 7 comments

Hi,

It's amazing that MuJoCo is open source now. I want to ask: will there be GPU support in the future? It would be pretty nice and exciting.

Thanks in advance.

im-Kitsch avatar Dec 01 '21 23:12 im-Kitsch

Using the GPU for what?
When I open an XML in the simulate program, I see it using the GPU.
Are you on Nvidia? Start a MuJoCo program and run nvidia-smi in any command line. Do you see the MuJoCo process listed?

(Screenshot: nvidia-smi output, 2021-12-02)

MotorCityCobra avatar Dec 02 '21 22:12 MotorCityCobra

I think what the OP means is that the simulations should happen on the GPU. Not sure if that's a priority for the development team.

aseembits93 avatar Dec 14 '21 16:12 aseembits93

Yeah, it would be awesome if the devs developed something like Nvidia's Isaac Gym.

aditya-shirwatkar avatar Dec 17 '21 15:12 aditya-shirwatkar

> Yeah, it will be awesome if the devs develop something like Nvidia's Isaac Gym

Our group has seen a 20×+ speedup using Isaac Gym versus MuJoCo for RL training. The main reason is GPU acceleration combined with an implementation that minimizes CPU-GPU communication. By pushing "everything" to the GPU, the CPU-GPU communication goes away and total system performance jumps. This is all enabled by Isaac Gym providing tensor interfaces to PhysX (which runs on the GPU).

If the MuJoCo devs move MuJoCo to the GPU, make sure to provide a tensor interface so that PyTorch-based RL can communicate seamlessly with the dynamics engine on the GPU, with no CPU round trips needed.

joehays avatar May 06 '22 14:05 joehays

+1! I'd rather not switch to Isaac, but with the performance boost it promises, it's hard to say no.

nik7273 avatar May 09 '22 17:05 nik7273

How are you running MuJoCo?

The MuJoCo C library is fast, but unfortunately, stepping environments through Python is indeed very slow. The way to get performance out of MuJoCo is to:

  1. Run many environments in parallel, to max out your CPU cores.
  2. Do as much of your run loop as possible in C/C++ rather than Python.
  3. If you can evaluate your policies on CPU in the same thread as stepping the physics, run many environments independently without batching.
  4. If you need to run your policy on an accelerator, make sure to batch and use a large number of environments.

Python threads do not give you real parallelism, because of the Global Interpreter Lock; to use all your cores, environments need to run in separate processes.
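A minimal sketch of points 1 and 3 above, using Python's multiprocessing to run one unbatched rollout per process. The body of `rollout` here is placeholder arithmetic standing in for a real mj_step loop; nothing below is MuJoCo's actual API:

```python
import multiprocessing as mp

def rollout(seed):
    """Hypothetical stand-in for one unbatched rollout. In real code this
    would create its own MjModel/MjData, call mujoco.mj_step in a loop,
    and evaluate a small CPU policy in the same thread (point 3 above)."""
    state, total = float(seed), 0.0
    for _ in range(1000):            # pretend these are physics steps
        state = 0.99 * state + 0.01  # placeholder dynamics, not MuJoCo
        total += state
    return total

def run_parallel(num_envs):
    # One OS process per environment sidesteps the GIL (point 1): each
    # worker steps its own environment independently, with no batching
    # or synchronisation. "fork" is POSIX-only; use the default start
    # method elsewhere.
    with mp.get_context("fork").Pool() as pool:
        return pool.map(rollout, range(num_envs))

if __name__ == "__main__":
    rewards = run_parallel(8)
    print(len(rewards))  # one return value per environment
```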

envpool is one project offering C++ implementations of common RL environments with a batched Python API. Their benchmark results show that MuJoCo performance is in the same ballpark as Isaac Gym.
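The shape of such a batched API can be illustrated with a toy vectorised environment in NumPy. This is only a sketch of the idea (all N observations stacked in one array, so the policy runs as a single batched call), not envpool's real interface:

```python
import numpy as np

class ToyVecEnv:
    """Toy batched environment: observations for all N copies live in one
    (N, obs_dim) array. This mimics the shape of envpool-style batched
    APIs, not their actual implementation."""

    def __init__(self, num_envs, obs_dim=4):
        self.num_envs, self.obs_dim = num_envs, obs_dim
        self.obs = np.zeros((num_envs, obs_dim))

    def reset(self):
        self.obs = np.random.default_rng(0).standard_normal(
            (self.num_envs, self.obs_dim))
        return self.obs

    def step(self, actions):
        # One vectorised update advances all environments at once.
        self.obs = 0.99 * self.obs + 0.01 * actions
        reward = -np.abs(self.obs).sum(axis=1)  # (N,) rewards
        return self.obs, reward

env = ToyVecEnv(num_envs=64)
obs = env.reset()                        # shape (64, 4)
obs, rew = env.step(np.zeros_like(obs))  # shapes (64, 4) and (64,)
print(obs.shape, rew.shape)
```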

We also posted some numbers, which you should be able to reproduce with the testspeed binary we released, in our blog post: https://www.deepmind.com/blog/open-sourcing-mujoco

Regarding efficiently running neural networks on CPU, I don’t have an open source project to point you at, but that’s the direction you should go towards for small networks (e.g. MLP with layer sizes 512, 256, 128).
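For a sense of scale, a policy that size is just a few matrix multiplies. A NumPy sketch of the forward pass, with random weights and arbitrary example observation/action sizes:

```python
import numpy as np

def mlp_forward(obs, weights, biases):
    """Forward pass of a small MLP policy on CPU: a handful of
    matrix-vector products, cheap enough to run in the same thread
    as the physics step."""
    x = obs
    for W, b in zip(weights[:-1], biases[:-1]):
        x = np.tanh(x @ W + b)               # hidden layers with tanh
    return x @ weights[-1] + biases[-1]      # linear output layer

# Hidden sizes from the comment above; obs/action dims are made up.
obs_dim, hidden, action_dim = 32, [512, 256, 128], 8
dims = [obs_dim] + hidden + [action_dim]
rng = np.random.default_rng(0)
weights = [rng.standard_normal((a, b)) * 0.1 for a, b in zip(dims, dims[1:])]
biases = [np.zeros(b) for b in dims[1:]]

action = mlp_forward(np.ones(obs_dim), weights, biases)
print(action.shape)  # (8,)
```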

nimrod-gileadi avatar May 26 '22 14:05 nimrod-gileadi
