Static linking support for the MuJoCo library
This introduces changes to the CMake files to allow static linking. This is linked to #2618.
Update: On the current main branch, the libraries can be built with LTO without problems. Originally (3.3.3), I had to disable LTO for the dynamic library. Compared with that build, static linking alone gave slightly better performance (+8%), and static linking with LTO gave much better performance (+33%). This no longer seems to be a problem, and the performance no longer differs.
Summary from #2618:
Benchmarks are in steps per second (calls to mj_step). Results are the 60-second mean +- standard error.
Prebuilt shared library
Prebuilt baseline: 116432.93 +- 151.63
Static linking (GCC):
102460.48 +- 44.18
Shared library (CLANG):
113133.28 +- 75.63
Shared library (GCC):
can't compile, not related to this PR.
Shared library (MSVC):
90465.95 +- 207.61
Other configuration variables:
-DCMAKE_C_COMPILER:STRING=clang-14 # Only when using clang
-DCMAKE_CXX_COMPILER:STRING=clang++-14 # Only when using clang
-DMUJOCO_HARDEN:BOOL=ON # Only when using clang
-DCMAKE_BUILD_TYPE:STRING=Release
-DCMAKE_INTERPROCEDURAL_OPTIMIZATION:BOOL=OFF
-DMUJOCO_BUILD_EXAMPLES:BOOL=OFF
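For a quick read of the numbers above, the relative change against the prebuilt baseline can be computed as in the sketch below. The values are copied from the summary; the dictionary and function names are only illustrative. The MSVC figure was measured on a different OS, so that comparison is rough at best.

```python
# Steps per second as (mean, standard error), copied from the summary above.
RESULTS = {
    "Prebuilt baseline": (116432.93, 151.63),
    "Static linking (GCC)": (102460.48, 44.18),
    "Shared library (CLANG)": (113133.28, 75.63),
    "Shared library (MSVC)": (90465.95, 207.61),
}


def relative_change_percent(value, baseline):
    """Percent change of `value` relative to `baseline`."""
    return 100.0 * (value - baseline) / baseline


baseline_mean, _ = RESULTS["Prebuilt baseline"]
for name, (mean, stderr) in RESULTS.items():
    change = relative_change_percent(mean, baseline_mean)
    print(f"{name:24s} {mean:10.2f} +- {stderr:6.2f}  ({change:+.1f}%)")
```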
@saran-t
A comment on lodepng: everything builds when building the shared MuJoCo library, but for the static MuJoCo library I had to add lodepng to the install targets. I'm not sure this is a valid solution, so please let me know if there is a more correct way to fix it.
Is this a bug?
https://github.com/google-deepmind/mujoco/blob/35665c5cb33ffaea4caa16e6fa2d57f612d49fd6/cmake/MujocoOptions.cmake#L107
So it seems that there weren't any speedups at all compared to the prebuilt library. When I originally tested this, I had to comment out some parts of the CMake files, as the build would otherwise not work, and one of them was:
https://github.com/google-deepmind/mujoco/blob/35665c5cb33ffaea4caa16e6fa2d57f612d49fd6/cmake/MujocoOptions.cmake#L107
which basically disabled LTO. I'm not sure why it didn't work in the past, but it works now without problems. The benchmarks now actually show it's slower 😅, but that's probably due to different compilers and their optimization methods.
However, static linking will still be beneficial for those that don't want to move the shared library around and want things as a single executable.
What is the error if you do not add lodepng to the install targets?
CMake Error: install(EXPORT "mujoco" ...) includes target "mujoco" which requires target "lodepng" that is not in any export set.
@traversaro I'm currently trying to build a shared library using GCC. For some reason, the library fails to link with the final program, reporting missing symbols (e.g., undefined reference to `mj_loadXML'). Inspecting it with nm gives this:
/.../mujoco/build/lib/libmujoco.so: plugin needed to handle lto object
00000000000037d9 b completed.0
w __cxa_finalize
0000000000001540 t deregister_tm_clones
00000000000015b0 t __do_global_dtors_aux
0000000000002630 d __do_global_dtors_aux_fini_array_entry
00000000000037b0 d __dso_handle
0000000000002640 d _DYNAMIC
000000000000152c t _fini
00000000000015f0 t frame_dummy
0000000000002638 d __frame_dummy_init_array_entry
000000000000050c r __FRAME_END__
00000000000037b8 d _GLOBAL_OFFSET_TABLE_
w __gmon_start__
00000000000037d8 B __gnu_lto_slim
0000000000001510 t _init
w _ITM_deregisterTMCloneTable
w _ITM_registerTMCloneTable
0000000000001570 t register_tm_clones
00000000000037b0 d __TMC_END__
00000000000037b0 d __TMC_LIST__
Do you by any chance know what needs to be done to make this work? It's clearly something to do with LTO but not sure what. It works fine when using CLANG.
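A small helper along these lines can check nm output for this situation. It is only a sketch: the function names are made up, the sample is a trimmed copy of the output above, and it assumes that __gnu_lto_slim is the marker GCC leaves in slim LTO objects (the kind produced with -fno-fat-lto-objects).

```python
# Trimmed copy of the nm output quoted above (illustrative sample only).
SAMPLE_NM_OUTPUT = """\
/.../mujoco/build/lib/libmujoco.so: plugin needed to handle lto object
00000000000037d9 b completed.0
                 w __cxa_finalize
00000000000037d8 B __gnu_lto_slim
0000000000001510 t _init
"""


def parse_nm(text):
    """Return {symbol_name: type_letter} parsed from the text output of nm."""
    symbols = {}
    for line in text.splitlines():
        parts = line.split()
        # Symbols with an address: "<address> <type> <name>".
        if len(parts) == 3 and len(parts[1]) == 1:
            symbols[parts[2]] = parts[1]
        # Undefined or weak symbols without an address: "<type> <name>".
        elif len(parts) == 2 and len(parts[0]) == 1:
            symbols[parts[1]] = parts[0]
        # Anything else (such as the linker-plugin warning) is ignored.
    return symbols


def diagnose(text, wanted):
    """Report whether the slim-LTO marker is present and which wanted symbols are not exported."""
    symbols = parse_nm(text)
    return {
        "slim_lto": "__gnu_lto_slim" in symbols,
        # "T" is a global symbol in the text section, "W" a weak definition.
        "missing": [name for name in wanted if symbols.get(name) not in ("T", "W")],
    }


print(diagnose(SAMPLE_NM_OUTPUT, ["mj_loadXML", "mj_step"]))
```

If the marker is present and the API symbols are missing, the library most likely still contains compiler IR rather than machine code, which would match the undefined references at link time.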
I think your problem may have something to do with this file: include/mujoco/mjexport.h
Try defining MJ_STATIC when compiling and see if that fixes the issue.
Can you share the exact error and the gcc version you are using?
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 13.3.0 (Ubuntu 13.3.0-6ubuntu2~24.04)
note: /usr/bin/ld: simulation/target/release/examples/stepping2-1cbadef7d7a70dc1.stepping2.682662642ace0e14-cgu.0.rcgu.o: in function `stepping2::main':
stepping2.682662642ace0e14-cgu.0:(.text._ZN9stepping24main17h94d4afc2a5c5b61fE+0x145): undefined reference to `mj_loadXML'
/usr/bin/ld: stepping2.682662642ace0e14-cgu.0:(.text._ZN9stepping24main17h94d4afc2a5c5b61fE+0x32a): undefined reference to `mj_makeData'
/usr/bin/ld: stepping2.682662642ace0e14-cgu.0:(.text._ZN9stepping24main17h94d4afc2a5c5b61fE+0x336): undefined reference to `mj_step'
/usr/bin/ld: stepping2.682662642ace0e14-cgu.0:(.text._ZN9stepping24main17h94d4afc2a5c5b61fE+0x10de): undefined reference to `mj_deleteData'
/usr/bin/ld: stepping2.682662642ace0e14-cgu.0:(.text._ZN9stepping24main17h94d4afc2a5c5b61fE+0x10e9): undefined reference to `mj_deleteModel'
Can you also share the arguments passed to ld, for example by compiling with ninja -v or make VERBOSE=1? Is this happening only with this PR or also with stock MuJoCo?
It doesn't seem to be happening on the main branch. The ld command doesn't seem to be called explicitly.
@traversaro
Sorry again: I had accidentally only compiled the library without linking. It is still happening, and this includes the main branch. The command:
/usr/bin/g++ -O3 -DNDEBUG -flto=auto -fno-fat-lto-objects -Wl,--no-as-needed -fuse-ld=lld -Wl,--gc-sections CMakeFiles/simulate.dir/main.cc.o -o ../bin/simulate -Wl,-rpath,"\$ORIGIN/../lib" ../lib/libsimulate.a ../lib/libglfw3.a ../lib/liblodepng.a ../lib/libmujoco.so.3.3.4 /usr/lib/x86_64-linux-gnu/librt.a -lm -ldl /usr/lib/x86_64-linux-gnu/libX11.so
It's like it can't read the symbols in the file due to LTO, but this only happens with gcc, clang works. I'm not sure if I need to have anything else installed on my system to make this work
If that happens in the main branch, could it make sense to have a separate issue for it, ideally with the full command required to reproduce the errors, the exact mujoco commit and the distro you are using?
Sure, I'll open one.
Alright, I don't think there's anything left to do on my side. I've tested on Windows and Linux and it works. Someone still has to test on Mac, @traversaro.
LTO with static library is fairly unusual AFAICT. With LTO you emit compiler-specific bytecode which is only converted to actual machine code at link time, which only happens when you make the final binary or when you make a DSO.
Typically with LTO you wouldn't bother making a "static library" as such, you'd just compile everything to the IR "object files" temporarily and just link them straight to the final product.
@davidhozic Can you document your test environment?
I have built and tested on an M2 MacBook Pro and found the benchmark results to be mixed. Not all benchmarks were improved, many got slightly worse, typically a change less than 1%. A microbenchmark for RotVecQuat operations did show a significant improvement (5x), but the effects did not translate into major changes in the simulation step benchmarks.
The reason why I'm compiling to a static library is I'm doing a project in Rust, which then links to the static lib. It's easier to statically link than having to configure Cargo to work with CMake. It's thus also useful if anyone wishes to link MuJoCo in other C-compatible languages.
And it does seem like it still helps, regardless of whether I do LTO afterwards on the entire bin.
I originally tested these with LTO disabled (i.e., I edited the parts of the configuration where it is forcefully enabled in Release). The benchmarks simply measured the number of steps completed in one second, averaged over 60 samples (= 60 seconds).
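The measurement described above can be sketched roughly as follows. This is not the actual benchmark code: dummy_step is a hypothetical stand-in for a call to mj_step, and the demo uses ten short windows instead of 60 one-second windows so that it finishes quickly.

```python
import math
import statistics
import time


def steps_per_second(step, window_s=1.0):
    """Call `step` repeatedly for about `window_s` seconds and return calls per second."""
    count = 0
    start = time.perf_counter()
    while True:
        step()
        count += 1
        elapsed = time.perf_counter() - start
        if elapsed >= window_s:
            return count / elapsed


def mean_and_stderr(samples):
    """Return the sample mean and the standard error of the mean."""
    mean = statistics.fmean(samples)
    stderr = statistics.stdev(samples) / math.sqrt(len(samples))
    return mean, stderr


def dummy_step():
    # Hypothetical stand-in for one simulation step (not a MuJoCo call).
    sum(range(100))


if __name__ == "__main__":
    # Assumption: 10 windows of 0.05 s each, purely to keep the demo fast.
    samples = [steps_per_second(dummy_step, window_s=0.05) for _ in range(10)]
    mean, stderr = mean_and_stderr(samples)
    print(f"{mean:.2f} +- {stderr:.2f} steps/s")
```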
The differences I get now are insignificant, as shown in this PR (I assume you're referring to #2618).
My CPU is an R5 5600X, and the model was my own custom MJCF. OS: (K)Ubuntu 24.04.2 LTS for Clang and GCC, and Windows 10 for MSVC.
@oursland When running the same benchmarks as you, I get fairly similar results. So this won't really be a performance boost, but at least it will be easier to link with external projects.
Any update on this? @traversaro