ODR violation with libopen-palcommon_sm

Open · devreal opened this issue 1 month ago · 1 comment

I built Open MPI with AddressSanitizer enabled and get the following error when launching an application:

=================================================================
==473823==ERROR: AddressSanitizer: odr-violation (0x7ffff156e4a0):
  [1] size=64 'mca_common_sm_module_t_class' ../../../../../opal/mca/common/sm/common_sm.c:43:1
  [2] size=64 'mca_common_sm_module_t_class' ../../../../../opal/mca/common/sm/common_sm.c:43:1
These globals were registered at these points:
  [1]:
    #0 0x7ffff7762b28 in __asan_register_globals ../../../../libsanitizer/asan/asan_globals.cpp:346
    #1 0x7ffff15603f4 in _sub_I_00099_1 (/gpfs/home/jschuchart/opt/ompi-big-datatypes/lib/openmpi/mca_btl_smcuda.so+0x203f4)
    #2 0x7ffff7fcc51d in call_init /usr/src/debug/glibc-2.34-168.el9_6.23.x86_64/elf/dl-init.c:70
    #3 0x7ffff7fcc51d in call_init /usr/src/debug/glibc-2.34-168.el9_6.23.x86_64/elf/dl-init.c:26

  [2]:
    #0 0x7ffff7762b28 in __asan_register_globals ../../../../libsanitizer/asan/asan_globals.cpp:346
    #1 0x7fffe1c1857f in _sub_I_00099_1 (/gpfs/home/jschuchart/opt/ompi-big-datatypes/lib/libopen-pal.so.0+0x25457f)
    #2 0x7ffff7fcc51d in call_init /usr/src/debug/glibc-2.34-168.el9_6.23.x86_64/elf/dl-init.c:70
    #3 0x7ffff7fcc51d in call_init /usr/src/debug/glibc-2.34-168.el9_6.23.x86_64/elf/dl-init.c:26

==473823==HINT: if you don't care about these errors you may set ASAN_OPTIONS=detect_odr_violation=0
SUMMARY: AddressSanitizer: odr-violation: global 'mca_common_sm_module_t_class' at ../../../../../opal/mca/common/sm/common_sm.c:43:1
==473823==ABORTING

It seems that libopen-palmca_common_sm_noinst.a (which contains mca_common_sm_module_t_class) is built as a static archive and gets linked into both libopen-pal.so and mca_btl_smcuda.so. Each shared object then carries its own copy of the global, so two definitions with the same name end up loaded into the same process.
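One way to check this is to ask `nm` whether both shared objects define the symbol. Below is a minimal sketch, assuming GNU `nm` is on the PATH and that the symbol is exported in each object's dynamic symbol table. The helper names `defines_symbol` and `objects_defining` are made up for illustration and are not part of Open MPI.

```python
import subprocess


def defines_symbol(nm_output: str, symbol: str) -> bool:
    """Return True if the given `nm` output lists `symbol` as defined."""
    for line in nm_output.splitlines():
        fields = line.split()
        # Defined symbols are printed as "<address> <type> <name>".
        # Strip a possible "@VERSION" / "@@VERSION" suffix before comparing.
        if len(fields) == 3 and fields[2].split("@")[0] == symbol:
            return True
    return False


def objects_defining(symbol: str, paths: list[str]) -> list[str]:
    """Run `nm -D --defined-only` on each path; keep those defining `symbol`."""
    hits = []
    for path in paths:
        out = subprocess.run(
            ["nm", "-D", "--defined-only", path],
            capture_output=True, text=True, check=True,
        ).stdout
        if defines_symbol(out, symbol):
            hits.append(path)
    return hits
```

Calling `objects_defining("mca_common_sm_module_t_class", [...])` with the two library paths from the report above should return both paths if the static archive really was linked into each of them. If the symbol is not exported dynamically, this check will miss it, and the ASan report remains the more reliable evidence.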

In common/sm/Makefile.am I found this comment:

# Note that building this common component statically and linking
# against other dynamic components is *not* supported!

I think that by building smcuda as a dynamic component while linking common_sm statically, we are violating that note. Maybe we should force common_sm to be built dynamically whenever mca_btl_smcuda.so is being built?

devreal · Oct 25 '25 16:10