Static linking support for the MuJoCo library

Open davidhozic opened this issue 7 months ago • 24 comments

This introduces changes to the CMake files to allow static linking. This is linked to #2618.

Update: On the current main branch, the libraries can be built with LTO without problems. Originally (3.3.3) I had to disable LTO for the dynamic library, which made static linking alone look slightly faster (+8%) and static linking with LTO look much faster (+33%). That problem no longer seems to exist, and the performance no longer differs.

Summary from #2618:

Benchmarks are in steps per second (calls to mj_step). Results are the 60-second mean +- standard error.

Prebuilt shared library (baseline): 116432.93 +- 151.63
Static linking (GCC): 102460.48 +- 44.18
Shared library (CLANG): 113133.28 +- 75.63
Shared library (GCC): can't compile, not related to this PR.
Shared library (MSVC): 90465.95 +- 207.61

Other configuration variables:

-DCMAKE_C_COMPILER:STRING=clang-14      # Only when using clang
-DCMAKE_CXX_COMPILER:STRING=clang++-14  # Only when using clang
-DMUJOCO_HARDEN:BOOL=ON                 # Only when using clang
-DCMAKE_BUILD_TYPE:STRING=Release
-DCMAKE_INTERPROCEDURAL_OPTIMIZATION:BOOL=OFF
-DMUJOCO_BUILD_EXAMPLES:BOOL=OFF
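
To put the numbers above side by side, here is a minimal Python sketch that expresses each result relative to the prebuilt baseline and reports how many combined standard errors separate the two means. The `compare` helper is hypothetical (not part of MuJoCo or this PR), and the calculation assumes the runs are independent and were made under comparable conditions.

```python
import math

# (mean, standard error) in steps per second, copied from the summary above.
results = {
    "Prebuilt baseline": (116432.93, 151.63),
    "Static linking (GCC)": (102460.48, 44.18),
    "Shared library (CLANG)": (113133.28, 75.63),
    "Shared library (MSVC)": (90465.95, 207.61),
}


def compare(result, baseline):
    """Return (relative change in percent, difference in combined standard errors)."""
    mean, se = result
    base_mean, base_se = baseline
    rel_percent = 100.0 * (mean - base_mean) / base_mean
    # Standard error of a difference between two independent means.
    z = (mean - base_mean) / math.sqrt(se**2 + base_se**2)
    return rel_percent, z


baseline = results["Prebuilt baseline"]
for name, result in results.items():
    if name == "Prebuilt baseline":
        continue
    rel, z = compare(result, baseline)
    print(f"{name}: {rel:+.2f}% vs. prebuilt ({z:+.1f} combined standard errors)")
```

A large number of standard errors only says the difference is unlikely to be sampling noise within these runs; it says nothing about differences in compiler, flags, or operating system between the configurations.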

davidhozic avatar Jun 18 '25 20:06 davidhozic

@saran-t

davidhozic avatar Jun 18 '25 21:06 davidhozic

A comment on lodepng: everything builds when building the shared MuJoCo library, but for the static MuJoCo library I had to add lodepng to the install targets. I'm not sure this is the right solution; please let me know if there is a more correct way to fix it.

davidhozic avatar Jun 18 '25 22:06 davidhozic

Is this a bug?

https://github.com/google-deepmind/mujoco/blob/35665c5cb33ffaea4caa16e6fa2d57f612d49fd6/cmake/MujocoOptions.cmake#L107

davidhozic avatar Jun 19 '25 02:06 davidhozic

So it seems there weren't any speedups at all compared to the prebuilt library. When I originally tested this, I had to comment out some parts of the CMake files because the build would otherwise not work, and one of them was:

https://github.com/google-deepmind/mujoco/blob/35665c5cb33ffaea4caa16e6fa2d57f612d49fd6/cmake/MujocoOptions.cmake#L107

which basically disabled LTO. I'm not sure why it didn't work in the past, but it now works without problems. The benchmarks now actually show it's slower 😅, but that's probably due to different compilers and their optimization methods.

However, static linking will still be beneficial for those that don't want to move the shared library around and want things as a single executable.

davidhozic avatar Jun 19 '25 03:06 davidhozic

A comment on lodepng. Everything seems to build when building the shared MuJoCo library, however for the static MuJoCo library I had to add lodepng to the install. Not sure if this is the valid solution, please let me know if there is some more correct way of fixing it.

What is the error if you do not add lodepng to the targets to install?

traversaro avatar Jun 19 '25 11:06 traversaro

CMake Error: install(EXPORT "mujoco" ...) includes target "mujoco" which requires target "lodepng" that is not in any export set.

davidhozic avatar Jun 19 '25 11:06 davidhozic

@traversaro I'm currently trying to build a shared library using GCC. For some reason the library fails to link with the final program, saying that symbols are missing (e.g., undefined reference to `mj_loadXML'). Inspecting it with nm gives this:

/.../mujoco/build/lib/libmujoco.so: plugin needed to handle lto object
00000000000037d9 b completed.0
                 w __cxa_finalize
0000000000001540 t deregister_tm_clones
00000000000015b0 t __do_global_dtors_aux
0000000000002630 d __do_global_dtors_aux_fini_array_entry
00000000000037b0 d __dso_handle
0000000000002640 d _DYNAMIC
000000000000152c t _fini
00000000000015f0 t frame_dummy
0000000000002638 d __frame_dummy_init_array_entry
000000000000050c r __FRAME_END__
00000000000037b8 d _GLOBAL_OFFSET_TABLE_
                 w __gmon_start__
00000000000037d8 B __gnu_lto_slim
0000000000001510 t _init
                 w _ITM_deregisterTMCloneTable
                 w _ITM_registerTMCloneTable
0000000000001570 t register_tm_clones
00000000000037b0 d __TMC_END__
00000000000037b0 d __TMC_LIST__

Do you by any chance know what needs to be done to make this work? It's clearly something to do with LTO, but I'm not sure what. It works fine when using CLANG.
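
The listing above can be checked mechanically. The following Python sketch scans `nm` output for the `__gnu_lto_slim` symbol, which GCC places in objects built with -fno-fat-lto-objects that carry only LTO bytecode, and lists any defined symbols with a given prefix. The `summarize_nm` helper and the `mj_` default prefix are illustrative assumptions, not an existing tool.

```python
DEFINED_TYPES = set("TtDdBbRr")  # nm type letters for text, data, bss and read-only data


def summarize_nm(nm_output, prefix="mj_"):
    """Return (slim-LTO marker present?, defined symbols starting with prefix)."""
    has_slim_marker = False
    defined = []
    for line in nm_output.splitlines():
        parts = line.split()
        # Defined symbols: "<address> <type> <name>"; undefined or weak: "<type> <name>".
        if len(parts) == 3:
            _, sym_type, name = parts
        elif len(parts) == 2:
            sym_type, name = parts
        else:
            continue  # diagnostics such as "plugin needed to handle lto object"
        if name == "__gnu_lto_slim":
            has_slim_marker = True
        if name.startswith(prefix) and sym_type in DEFINED_TYPES:
            defined.append(name)
    return has_slim_marker, defined


# A few lines taken from the nm output above.
excerpt = """\
/.../mujoco/build/lib/libmujoco.so: plugin needed to handle lto object
00000000000037d8 B __gnu_lto_slim
0000000000001510 t _init
                 w __gmon_start__
"""
slim, exported = summarize_nm(excerpt)
print("slim LTO marker present:", slim)
print("defined mj_ symbols:", exported)
```

A shared library that shows the marker but no defined `mj_` symbols suggests the link step passed the bytecode through without generating machine code, which would be consistent with the undefined references.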

davidhozic avatar Jun 20 '25 06:06 davidhozic

I think your problem may have something to do with this file: include/mujoco/mjexport.h

Try defining MJ_STATIC when compiling and see if that fixes the issue.

oursland avatar Jun 20 '25 07:06 oursland

For some reason the library fails to link with the final program saying the symbols are missing (e. g., undefined reference to `mj_loadXML')

Can you share the exact error and the gcc version you are using?

traversaro avatar Jun 20 '25 07:06 traversaro

Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 13.3.0 (Ubuntu 13.3.0-6ubuntu2~24.04)
 note: /usr/bin/ld: simulation/target/release/examples/stepping2-1cbadef7d7a70dc1.stepping2.682662642ace0e14-cgu.0.rcgu.o: in function `stepping2::main':
          stepping2.682662642ace0e14-cgu.0:(.text._ZN9stepping24main17h94d4afc2a5c5b61fE+0x145): undefined reference to `mj_loadXML'
          /usr/bin/ld: stepping2.682662642ace0e14-cgu.0:(.text._ZN9stepping24main17h94d4afc2a5c5b61fE+0x32a): undefined reference to `mj_makeData'
          /usr/bin/ld: stepping2.682662642ace0e14-cgu.0:(.text._ZN9stepping24main17h94d4afc2a5c5b61fE+0x336): undefined reference to `mj_step'
          /usr/bin/ld: stepping2.682662642ace0e14-cgu.0:(.text._ZN9stepping24main17h94d4afc2a5c5b61fE+0x10de): undefined reference to `mj_deleteData'
          /usr/bin/ld: stepping2.682662642ace0e14-cgu.0:(.text._ZN9stepping24main17h94d4afc2a5c5b61fE+0x10e9): undefined reference to `mj_deleteModel'
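
For longer linker logs, the missing symbols can be collected with a short script. This is a sketch that assumes the GNU ld message format shown above; `undefined_symbols` is a hypothetical helper name.

```python
import re

# GNU ld quotes the symbol name between a backtick and an apostrophe.
UNDEFINED_REF = re.compile(r"undefined reference to `([^']+)'")


def undefined_symbols(linker_output):
    """Return the unique symbol names from 'undefined reference' lines, in order."""
    seen = []
    for name in UNDEFINED_REF.findall(linker_output):
        if name not in seen:
            seen.append(name)
    return seen


# Two message tails in the same format as the log above.
log = """\
(.text+0x145): undefined reference to `mj_loadXML'
(.text+0x32a): undefined reference to `mj_makeData'
(.text+0x336): undefined reference to `mj_loadXML'
"""
print(undefined_symbols(log))
```

If every reported symbol is part of the public MuJoCo API, the library itself is the likely culprit rather than a single missing object file.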

davidhozic avatar Jun 20 '25 07:06 davidhozic

Can you also share the arguments passed to ld, for example by compiling with ninja -v or make VERBOSE=1? Is this happening only with this PR or also with stock MuJoCo?

traversaro avatar Jun 20 '25 07:06 traversaro

It doesn't seem to be happening on the main branch. The ld command doesn't seem to be called explicitly.

davidhozic avatar Jun 20 '25 07:06 davidhozic

@traversaro

Actually, sorry again: I had accidentally only compiled the library without linking. It is still happening, and this includes the main branch. The command:

/usr/bin/g++ -O3 -DNDEBUG -flto=auto -fno-fat-lto-objects -Wl,--no-as-needed -fuse-ld=lld -Wl,--gc-sections CMakeFiles/simulate.dir/main.cc.o -o ../bin/simulate -Wl,-rpath,"\$ORIGIN/../lib" ../lib/libsimulate.a ../lib/libglfw3.a ../lib/liblodepng.a ../lib/libmujoco.so.3.3.4 /usr/lib/x86_64-linux-gnu/librt.a -lm -ldl /usr/lib/x86_64-linux-gnu/libX11.so
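
When comparing the GCC and clang builds, it can help to pull out just the LTO and linker-selection flags from such a command. The Python sketch below does that with `shlex`; `lto_linker_flags` is a hypothetical helper. One combination that may be worth checking is GCC slim LTO objects (-flto with -fno-fat-lto-objects) together with -fuse-ld=lld: as far as I know, lld handles LLVM bitcode but does not load GCC's LTO plugin. That is an assumption about a possible cause, not a confirmed diagnosis.

```python
import shlex


def lto_linker_flags(command):
    """Extract the LTO and linker-selection flags from a compiler driver command."""
    args = shlex.split(command)
    return {
        "driver": args[0] if args else None,
        "lto": [a for a in args if a.startswith(("-flto", "-fno-lto"))],
        "fat_objects": [a for a in args if a.endswith("fat-lto-objects")],
        "linker": [a for a in args if a.startswith("-fuse-ld=")],
    }


# The command from the thread, shortened to the parts relevant here.
command = (
    "/usr/bin/g++ -O3 -DNDEBUG -flto=auto -fno-fat-lto-objects "
    "-Wl,--no-as-needed -fuse-ld=lld -Wl,--gc-sections "
    "CMakeFiles/simulate.dir/main.cc.o -o ../bin/simulate "
    "../lib/libmujoco.so.3.3.4 -lm -ldl"
)
flags = lto_linker_flags(command)
for key, value in flags.items():
    print(f"{key}: {value}")
```

Running the same extraction on the clang link line would show whether the two builds differ only in the driver or also in the selected linker.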

davidhozic avatar Jun 20 '25 08:06 davidhozic

It's as if the linker can't read the symbols in the file because of LTO, but this only happens with GCC; clang works. I'm not sure if I need to have anything else installed on my system to make this work.

davidhozic avatar Jun 20 '25 08:06 davidhozic

If that happens in the main branch, could it make sense to have a separate issue for it, ideally with the full command required to reproduce the errors, the exact mujoco commit and the distro you are using?

traversaro avatar Jun 20 '25 08:06 traversaro

Sure, I'll open one.

davidhozic avatar Jun 20 '25 08:06 davidhozic

Alright, I don't think there's anything left for me to do on this. I've tested on Windows and Linux and it works. Someone has to test on Mac, @traversaro.

davidhozic avatar Jun 20 '25 11:06 davidhozic

LTO with static library is fairly unusual AFAICT. With LTO you emit compiler-specific bytecode which is only converted to actual machine code at link time, which only happens when you make the final binary or when you make a DSO.

Typically with LTO you wouldn't bother making a "static library" as such, you'd just compile everything to the IR "object files" temporarily and just link them straight to the final product.

saran-t avatar Jun 24 '25 17:06 saran-t

@davidhozic Can you document your test environment?

I have built and tested on an M2 MacBook Pro and found the benchmark results to be mixed. Not all benchmarks were improved, many got slightly worse, typically a change less than 1%. A microbenchmark for RotVecQuat operations did show a significant improvement (5x), but the effects did not translate into major changes in the simulation step benchmarks.

LTO-static-benchmark.txt

oursland avatar Jun 24 '25 17:06 oursland

LTO with static library is fairly unusual AFAICT. With LTO you emit compiler-specific bytecode which is only converted to actual machine code at link time, which only happens when you make the final binary or when you make a DSO.

Typically with LTO you wouldn't bother making a "static library" as such, you'd just compile everything to the IR "object files" temporarily and just link them straight to the final product.

The reason why I'm compiling to a static library is I'm doing a project in Rust, which then links to the static lib. It's easier to statically link than having to configure Cargo to work with CMake. It's thus also useful if anyone wishes to link MuJoCo in other C-compatible languages.

davidhozic avatar Jun 24 '25 18:06 davidhozic

And it does seem like it still helps, regardless of whether I do LTO afterwards on the entire bin.

davidhozic avatar Jun 24 '25 18:06 davidhozic

@davidhozic Can you document your test environment?

I have built and tested on an M2 MacBook Pro and found the benchmark results to be mixed. Not all benchmarks were improved, many got slightly worse, typically a change less than 1%. A microbenchmark for RotVecQuat operations did show a significant improvement (5x), but the effects did not translate into major changes in the simulation step benchmarks.

LTO-static-benchmark.txt

I originally tested these with LTO disabled (i.e., I edited the parts of the configuration where it is forcefully enabled in Release). The benchmarks simply measured the number of steps completed in one second, repeated for 60 samples (= 60 seconds), and the samples were then averaged.
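
As a rough illustration of how a "mean +- standard error" figure follows from such per-second samples, here is a minimal Python sketch. The sample values are made up, and `mean_and_stderr` is a hypothetical helper, not the benchmark code used for this PR.

```python
import math
import statistics


def mean_and_stderr(samples):
    """Return (mean, standard error of the mean) for per-second step counts."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to estimate the standard error")
    mean = statistics.fmean(samples)
    # Sample standard deviation (n - 1 in the denominator) divided by sqrt(n).
    stderr = statistics.stdev(samples) / math.sqrt(n)
    return mean, stderr


# Hypothetical per-second step counts; a real run would collect 60 of them.
samples = [116300.0, 116500.0, 116450.0, 116400.0]
mean, stderr = mean_and_stderr(samples)
print(f"{mean:.2f} +- {stderr:.2f}")
```

The standard error shrinks with the square root of the number of samples, so 60 one-second samples give a much tighter estimate than the spread of any single second would suggest.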

The differences I measure now are insignificant, as shown in this PR (I assume you're referring to #2618).

My CPU is an R5 5600X. The model was also my own custom MJCF. OS: (K)Ubuntu 24.04.2 LTS for CLANG and GCC, and Windows 10 for MSVC.

davidhozic avatar Jun 24 '25 18:06 davidhozic

@oursland When running the same benchmarks as you, I get fairly similar results. So this won't really be a performance boost, but at least it will be easier to link with external projects.

static.txt shared.txt

davidhozic avatar Jun 24 '25 19:06 davidhozic

Any update on this? @traversaro

davidhozic avatar Aug 30 '25 11:08 davidhozic