Segmentation Fault with MuJoCo

Open dyth opened this issue 2 years ago • 6 comments

I'm looking to benchmark both dm_control and MuJoCo.

This

import gym
from dm_control import suite
env = gym.make('HalfCheetah-v3')

yields

Traceback (most recent call last):
  File "wrappers.pxi", line 1139, in mujoco_py.cymj.PyMjModel._extract_mj_names
AssertionError
Exception ignored in: 'mujoco_py.cymj.PyMjModel._set'
Traceback (most recent call last):
  File "wrappers.pxi", line 1139, in mujoco_py.cymj.PyMjModel._extract_mj_names
AssertionError
Segmentation fault (core dumped)

whereas this exits gracefully:

import gym
env = gym.make('HalfCheetah-v3')
from dm_control import suite
env = gym.make('HalfCheetah-v3')

I'm using

dm-control                   1.0.1
mujoco                       2.1.3
mujoco-py                    2.1.2.14
gym                          0.21.0

While I'm happy to keep using this solution, it feels like a hack and a fix would be much appreciated!

dyth avatar Apr 01 '22 20:04 dyth

Here's some gdb debugging information, courtesy of some suggestions from @obilaniu.

The backtrace and

(gdb) x/i $pc
=> 0x7fffe827dea5 <_makeData.llvm.15600858514797960238+101>:	mov    (%r12),%esi
(gdb) p/x $r12
$1 = 0x0

indicated that the innermost frame was called with a NULL pointer.

Then, within disas:

#2  0x00007fffe3fcf428 in __pyx_pf_9mujoco_py_4cymj_5MjSim___cinit__ (
    __pyx_v_render_callback=0x55555588e010 <_Py_NoneStruct>,
    __pyx_v_userdata_names=0x55555588e010 <_Py_NoneStruct>,
    __pyx_v_substep_callback=0x55555588e010 <_Py_NoneStruct>,
    __pyx_v_udd_callback=0x55555588e010 <_Py_NoneStruct>, __pyx_v_nsubsteps=<optimised out>,
    __pyx_v_data=<optimised out>, __pyx_v_model=<optimised out>, __pyx_v_self=0x7fffe36f7ef0)
    at /home/dyth/miniconda3/envs/jaxrl/lib/python3.7/site-packages/mujoco_py/cymj.c:135770

Line /home/dyth/miniconda3/envs/jaxrl/lib/python3.7/site-packages/mujoco_py/cymj.c:135770 was

    /* "mujoco_py/mjsim.pyx":78
 *             if _data == NULL:
 *                 raise Exception('mj_makeData failed!')
 *             self.data = WrapMjData(_data, self.model)             # <<<<<<<<<<<<<<
 *         else:
 *             self.data = data
 */

which prompted @obilaniu to suspect that the segmentation fault occurred because self.model.ptr was null.

Might this have been overwritten by from dm_control import suite?

dyth avatar Apr 01 '22 22:04 dyth

Unfortunately this is an issue with mujoco-py, a codebase that we neither own nor maintain, which means that we can't really provide support for it.

Moreover, the most recent version of dm_control depends on MuJoCo's new official Python bindings, which provide their own copy of the MuJoCo library. Since mujoco-py is no longer maintained, it has not been updated to reflect changes in more recent versions of MuJoCo. I don't know whether this is the root cause of the segfault that you're experiencing, but generally speaking I would consider it unsafe to import both mujoco-py and mujoco (the latter is what dm_control uses) into the same instance of Python.
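As a rough illustration of that advice, the sketch below checks whether both binding packages have already been imported into the current interpreter. The helper names (`loaded_bindings`, `warn_if_both_loaded`) are hypothetical and not part of either library; the check only looks at `sys.modules`, so it detects the situation but does not prevent or fix it.

```python
import sys

# Top-level module names of the two binding packages discussed in this thread.
CONFLICTING_BINDINGS = ("mujoco_py", "mujoco")


def loaded_bindings(modules=None):
    """Return which of the two MuJoCo binding packages are already imported."""
    modules = sys.modules if modules is None else modules
    return [name for name in CONFLICTING_BINDINGS if name in modules]


def warn_if_both_loaded(modules=None):
    """Return True (and print a warning) if both bindings share this interpreter."""
    found = loaded_bindings(modules)
    if len(found) == len(CONFLICTING_BINDINGS):
        print(
            "warning: both mujoco_py and mujoco are imported in this process",
            file=sys.stderr,
        )
        return True
    return False
```

Calling `warn_if_both_loaded()` after the imports in a benchmark script would flag the combination before any environment is created.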

saran-t avatar Apr 01 '22 22:04 saran-t

Thanks, @saran-t, for your fast reply. Do you think it's possible that dm_control.suite is somehow overwriting the mujoco-py environment registrations during its import and wiping the pointer?

dyth avatar Apr 02 '22 01:04 dyth

It shouldn't be possible for dm_control to override anything in mujoco-py since they're completely separate libraries.

If you want to dig into this further I'd suggest checking in gdb (through info shared) if the same libmujoco is being loaded, comparing between your two code snippets.
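For a similar check without gdb, here is a minimal sketch that lists the MuJoCo shared libraries mapped into the current Python process. It assumes Linux (it reads `/proc/self/maps`) and assumes the library file name contains "mujoco"; both function names are hypothetical.

```python
import os


def mujoco_libraries(maps_text):
    """Extract distinct mapped files whose file name contains 'mujoco'."""
    paths = set()
    for line in maps_text.splitlines():
        # Fields: address perms offset dev inode [pathname]
        fields = line.split(None, 5)
        if len(fields) == 6 and "mujoco" in os.path.basename(fields[5]).lower():
            paths.add(fields[5].strip())
    return sorted(paths)


def loaded_mujoco_libraries():
    """Read this process's memory map (Linux only) and list MuJoCo libraries."""
    with open("/proc/self/maps") as f:
        return mujoco_libraries(f.read())
```

Printing `loaded_mujoco_libraries()` after each import in the two snippets, and comparing the results, should show whether one or two copies of the library end up loaded.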

saran-t avatar Apr 08 '22 21:04 saran-t

I have a theory: in the segfaulting snippet you have import gym immediately followed by from dm_control import suite. If you are using dm_control>=1.0.0, the latter automatically loads the latest version of MuJoCo (through the mujoco package). This isn't supported by mujoco-py, which is what gym currently relies on. The mujoco-py package is no longer being updated or maintained, so it won't work with any versions of MuJoCo after 2.1.0.

In your second snippet, you created the gym environment before importing dm_control. This causes mujoco-py to be loaded first, which in turn brings in the older version of MuJoCo that it depends on. When you later import dm_control, it is unclear which dynamic library the mujoco Python bindings resolve their symbols from. I say unclear because I've never taken a deep enough dive into mujoco-py to determine the mode that it uses to dlopen the MuJoCo library.

In any case, having two libraries that try to load different dynamic libraries that export the same symbols isn't going to do anything good. I strongly recommend keeping your gym and dm_control calls in separate Python interpreters, at least until https://github.com/openai/gym/pull/2595 is merged, and even then only if you're using their new v4 environments that depend on the new mujoco bindings.
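One way to follow that recommendation from a single benchmark script is to launch each library in its own child interpreter. This is only a sketch using the standard library's `subprocess` module; `run_isolated`, `run_both`, and the two snippet strings are hypothetical names chosen for illustration.

```python
import subprocess
import sys


def run_isolated(code, timeout=None):
    """Run a snippet of Python source in a fresh interpreter and return the result."""
    return subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )


# Each snippet runs in its own process, so gym (via mujoco-py) and dm_control
# (via mujoco) never share an interpreter.
GYM_SNIPPET = "import gym; env = gym.make('HalfCheetah-v3'); print('gym ok')"
DMC_SNIPPET = "from dm_control import suite; print('dm_control ok')"


def run_both():
    """Run the gym and dm_control snippets in separate interpreters."""
    return [run_isolated(snippet) for snippet in (GYM_SNIPPET, DMC_SNIPPET)]
```

On POSIX systems, a child that is killed by a signal reports a negative `returncode`, so a segfault in one child shows up in the result instead of taking down the parent script.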

saran-t avatar Apr 09 '22 19:04 saran-t

I also think the issue might be with mujoco-py. FWIW, there is a link to the OpenAI slack in this codebase that says to import mujoco_py first.

To that end, another method of avoiding the segmentation fault is:

import mujoco_py
import gym
from dm_control import suite
env = gym.make('HalfCheetah-v3')

dyth avatar Apr 11 '22 17:04 dyth