dm_control
Segmentation Fault with MuJoCo
I'm looking to benchmark both dm_control and MuJoCo.
This
import gym
from dm_control import suite
env = gym.make('HalfCheetah-v3')
yields
Traceback (most recent call last):
File "wrappers.pxi", line 1139, in mujoco_py.cymj.PyMjModel._extract_mj_names
AssertionError
Exception ignored in: 'mujoco_py.cymj.PyMjModel._set'
Traceback (most recent call last):
File "wrappers.pxi", line 1139, in mujoco_py.cymj.PyMjModel._extract_mj_names
AssertionError
Segmentation fault (core dumped)
whereas this exits gracefully:
import gym
env = gym.make('HalfCheetah-v3')
from dm_control import suite
env = gym.make('HalfCheetah-v3')
I'm using
dm-control 1.0.1
mujoco 2.1.3
mujoco-py 2.1.2.14
gym 0.21.0
While I'm happy to keep using this solution, it feels like a hack and a fix would be much appreciated!
Here's some gdb debugging information, courtesy of some suggestions from @obilaniu.
The backtrace and
(gdb) x/i $pc
=> 0x7fffe827dea5 <_makeData.llvm.15600858514797960238+101>: mov (%r12),%esi
(gdb) p/x $r12
$1 = 0x0
indicated that the innermost frame was called with a NULL pointer.
Then, within disas:
#2 0x00007fffe3fcf428 in __pyx_pf_9mujoco_py_4cymj_5MjSim___cinit__ (
__pyx_v_render_callback=0x55555588e010 <_Py_NoneStruct>,
__pyx_v_userdata_names=0x55555588e010 <_Py_NoneStruct>,
__pyx_v_substep_callback=0x55555588e010 <_Py_NoneStruct>,
__pyx_v_udd_callback=0x55555588e010 <_Py_NoneStruct>, __pyx_v_nsubsteps=<optimised out>,
__pyx_v_data=<optimised out>, __pyx_v_model=<optimised out>, __pyx_v_self=0x7fffe36f7ef0)
at /home/dyth/miniconda3/envs/jaxrl/lib/python3.7/site-packages/mujoco_py/cymj.c:135770
Line /home/dyth/miniconda3/envs/jaxrl/lib/python3.7/site-packages/mujoco_py/cymj.c:135770 was
/* "mujoco_py/mjsim.pyx":78
* if _data == NULL:
* raise Exception('mj_makeData failed!')
* self.data = WrapMjData(_data, self.model) # <<<<<<<<<<<<<<
* else:
* self.data = data
*/
which prompted @obilaniu to suspect that the segmentation fault was caused by self.model.ptr being null.
Might this have been overwritten by from dm_control import suite?
Unfortunately this is an issue with mujoco-py, a codebase that we neither own nor maintain, which means that we can't really provide support for it.
Moreover, the most recent version of dm_control depends on MuJoCo's new official Python bindings, which provide their own copy of the MuJoCo library itself. Since mujoco-py is no longer maintained, it has not been updated to reflect changes in more recent versions of MuJoCo. I don't know whether this is the root cause of the segfault that you're experiencing, but generally speaking I would consider it unsafe to import both mujoco-py and mujoco (the latter is what dm_control uses) into the same instance of Python.
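As a rough illustration of that advice, here is a minimal sketch of a guard that reports when both bindings have been imported into the current interpreter. The helper names conflicting_bindings and both_loaded are hypothetical and not part of either library; the check only inspects sys.modules and says nothing about which shared library was actually loaded.

```python
import sys

# Import names of the two bindings discussed in this thread.
CONFLICTING = ("mujoco_py", "mujoco")


def conflicting_bindings(modules=None):
    """Return which of the two MuJoCo bindings are already imported.

    `modules` defaults to sys.modules; a plain dict can be passed for testing.
    """
    modules = sys.modules if modules is None else modules
    return [name for name in CONFLICTING if name in modules]


def both_loaded(modules=None):
    """True if mujoco_py and mujoco are both present in `modules`."""
    return len(conflicting_bindings(modules)) == len(CONFLICTING)


if both_loaded():
    # Hypothetical warning text; adjust to taste.
    print("warning: mujoco_py and mujoco are both imported", file=sys.stderr)
```

Calling both_loaded() after all imports in a benchmark script gives an early signal before any environment is created.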
Thanks, @saran-t, for your fast reply. Do you think it's possible that dm_control.suite is somehow overwriting the mujoco-py environment registrations during its import and wiping the pointer?
It shouldn't be possible for dm_control to override anything in mujoco-py since they're completely separate libraries.
If you want to dig into this further I'd suggest checking in gdb (through info shared) if the same libmujoco is being loaded, comparing between your two code snippets.
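A similar check can be sketched from inside Python on Linux by reading /proc/self/maps, which lists the files mapped into the current process. This is only an approximation of gdb's info shared, and the function name mapped_libraries is hypothetical.

```python
import os


def mapped_libraries(needle="libmujoco", maps_text=None):
    """Return sorted paths of mapped files whose basename contains `needle`.

    Reads /proc/self/maps (Linux-only) unless `maps_text` is supplied.
    """
    if maps_text is None:
        with open("/proc/self/maps") as f:
            maps_text = f.read()
    paths = set()
    for line in maps_text.splitlines():
        # Fields: address, perms, offset, dev, inode, then an optional pathname.
        fields = line.split(None, 5)
        if len(fields) == 6:
            path = fields[5].strip()
            if needle in os.path.basename(path):
                paths.add(path)
    return sorted(paths)
```

Printing mapped_libraries() at the end of each of the two snippets would show whether one or two MuJoCo shared libraries ended up in the process, and from which directories.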
I have a theory: in the segfaulting snippet you have import gym immediately followed by from dm_control import suite. If you are using dm_control>=1.0.0, the latter automatically loads the latest version of MuJoCo (through the mujoco package). This isn't supported by mujoco-py, which is what gym currently relies on. The mujoco-py package is no longer being updated or maintained, so it won't work with any versions of MuJoCo after 2.1.0.
In your second snippet, you created the gym environment before importing dm_control. This causes mujoco-py to be loaded first, which in turn brings in the older version of MuJoCo that it depends on. When you later import dm_control, it is unclear which dynamic library the mujoco Python bindings resolve their symbols from. I say unclear because I've never taken a deep enough dive into mujoco-py to determine the mode that it uses to dlopen the MuJoCo library.
In any case, having two libraries that try to load different dynamic libraries that export the same symbols isn't going to do anything good. I strongly recommend keeping your gym and dm_control calls in separate Python interpreters, at least until https://github.com/openai/gym/pull/2595 is merged, and even then only if you're using their new v4 environments that depend on the new mujoco bindings.
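One way to follow the separate-interpreters recommendation from a single benchmark script is to launch each library in its own child process. The sketch below assumes that approach; run_isolated and the two snippet strings are hypothetical names for illustration, and the dm_control task ('cheetah', 'run') is just an example choice.

```python
import subprocess
import sys


def run_isolated(source):
    """Run `source` in a fresh Python interpreter and return its stdout.

    Raises subprocess.CalledProcessError if the child exits abnormally,
    which includes being killed by a segmentation fault.
    """
    result = subprocess.run(
        [sys.executable, "-c", source],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Each snippet only ever imports one of the two MuJoCo bindings.
GYM_SNIPPET = (
    "import gym\n"
    "env = gym.make('HalfCheetah-v3')\n"
    "print(env.observation_space.shape)\n"
)
DMC_SNIPPET = (
    "from dm_control import suite\n"
    "env = suite.load('cheetah', 'run')\n"
    "print(env.action_spec().shape)\n"
)
```

With both packages installed, run_isolated(GYM_SNIPPET) and run_isolated(DMC_SNIPPET) can be called one after the other, and neither child process ever has both copies of MuJoCo loaded.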
I also think the issue might be with mujoco-py. FWIW, there is a link to the OpenAI Slack in this codebase that says to import mujoco-py first.
To that end, another method of avoiding the segmentation fault is:
import mujoco_py
import gym
from dm_control import suite
env = gym.make('HalfCheetah-v3')