aomp
Use system's default rocgdb instead of AOMP's
rocgdb requires libpython.so, which the system's default rocgdb is more likely to find.
The one in AOMP/bin/rocgdb complains about missing libpython.so file.
I don't think we want to test the system's rocgdb (by accident or on purpose).
I also wonder why we wouldn't want to test the rocgdb we built and packaged?
I am seeing a different complaint instead of a missing libpython:
[r6 ~]$ /COD/LATEST/aomp/bin/rocgdb
amd-dbgapi library version mismatch, got 0.70.1, need 0.71+
Seems to have started on:
[r6 ~]$ /COD/2023-12-20/aomp/bin/rocgdb
amd-dbgapi library version mismatch, got 0.70.1, need 0.71+
Works before that date:
[r6 ~]$ /COD/2023-12-19/aomp/bin/rocgdb
GNU gdb (AOMP_18.0-1) 13.2
...
For help, type "help".
Type "apropos word" to search for commands related to "word".
(gdb) q
We are actually staging python libs into COD to allow tools that are linked to specific versions of python shared objects to work. If you see a missing python lib error, paste in the exact error message and the system you saw it on.
I am getting the same error whether I use the 2024-03-07 build or the 2023-12-04 build.
Note: results are on r11
/COD/2024-03-07/aomp/bin/clang++ -g -O0 -fopenmp --offload-arch=gfx90a -D__OFFLOAD_ARCH_gfx90a__ clang-325070.cpp -o clang-325070
/COD/2024-03-07/aomp/bin/rocgdb -x doit.gdb --args ./clang-325070 0 2>&1 | tee run.log
/COD/2024-03-07/aomp/bin/rocgdb: error while loading shared libraries: libpython3.8.so.1.0: cannot open shared object file: No such file or directory
make: *** [../Makefile.rules:71: run] Error 127
/COW/2023-12-04/aomp/bin/clang++ -g -O0 -fopenmp --offload-arch=gfx90a -D__OFFLOAD_ARCH_gfx90a__ clang-325070.cpp -o clang-325070
/COW/2023-12-04/aomp/bin/rocgdb -x doit.gdb --args ./clang-325070 0 2>&1 | tee run.log
/COW/2023-12-04/aomp/bin/rocgdb: error while loading shared libraries: libpython3.8.so.1.0: cannot open shared object file: No such file or directory
make: *** [../Makefile.rules:71: run] Error 127
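The loader stops at the first library it cannot open, so the error above only names libpython3.8. `ldd` marks every unresolved dependency with `=> not found`, so filtering its output lists all of them at once. A minimal sketch; `missing_libs` is a hypothetical helper name, not something shipped in AOMP:

```shell
# List shared libraries that ldd could not resolve, reading ldd output
# on stdin. Unresolved entries look like: "libfoo.so.1 => not found".
missing_libs() {
  awk '$2 == "=>" && $3 == "not" && $4 == "found" { print $1 }'
}
```

Usage: `ldd /COD/2024-03-07/aomp/bin/rocgdb | missing_libs`, then paste the result along with the system name.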
Looking back at the original thread on the 'CI OpenMP compiler daily triage group' Teams chat that motivated the staging fix, you will need to do the following on a 22.04 system:
[r11 ~]$ PYTHONHOME=/COD/LATEST/aomp/lib/python3.8 PYTHONPATH=/COD/LATEST/aomp/lib/python3.8 LD_LIBRARY_PATH=/COD/LATEST/aomp/lib /COD/LATEST/aomp/bin/rocgdb
GNU gdb (AOMP_19.0-0) 13.2
...
For help, type "help".
Type "apropos word" to search for commands related to "word".
(gdb)
Note that setting the above env vars also fixes the 'amd-dbgapi library version mismatch, got 0.70.1, need 0.71+' error now seen on 20.04 systems.
Not a "fix" so much as a workaround for running rocgdb built on an older OS.
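To avoid retyping the three env vars, they can be wrapped in a bash function. This is a sketch, not something staged in COD: the function name `run_rocgdb` and the `AOMP_ROOT` variable are hypothetical. Unlike the command above, it keeps any existing `LD_LIBRARY_PATH` entries, searched after the staged lib directory.

```shell
# Hypothetical helper. AOMP_ROOT and its default are assumptions based on
# the /COD/LATEST/aomp layout used in this thread.
AOMP_ROOT="${AOMP_ROOT:-/COD/LATEST/aomp}"

# Run the staged rocgdb with the staged Python 3.8 tree, and put the staged
# lib dir (libpython3.8, librocm-dbgapi) first on the loader search path.
# Any pre-existing LD_LIBRARY_PATH is appended after it.
run_rocgdb() {
  local root="$AOMP_ROOT"
  PYTHONHOME="$root/lib/python3.8" \
  PYTHONPATH="$root/lib/python3.8" \
  LD_LIBRARY_PATH="$root/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" \
    "$root/bin/rocgdb" "$@"
}
```

Usage: `run_rocgdb -x doit.gdb --args ./clang-325070 0`. Setting `AOMP_ROOT=/COD/2024-03-07/aomp` first points it at a dated build instead.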
Not that it helps us in this situation, but the moral of the story is: don't link your product against the python shared objects. No backward compatibility is guaranteed (at least not when building on 20.04 and running on 22.04).
The 'amd-dbgapi library version mismatch, got 0.70.1, need 0.71+' error is a problem even on the same system where rocgdb was built. Without LD_LIBRARY_PATH set, it picks up the library from the system's /opt/rocm:
[r5 /COD/LATEST/aomp]$ ldd /COD/LATEST/aomp/bin/rocgdb | grep dbgapi
librocm-dbgapi.so.0 => /opt/rocm-5.7.0/lib/librocm-dbgapi.so.0 (0x00007f8046edb000)
With the workaround, it picks up the staged librocm-dbgapi.so.0:
[r5 /COD/LATEST/aomp]$ PYTHONHOME=/COD/LATEST/aomp/lib/python3.8 PYTHONPATH=/COD/LATEST/aomp/lib/python3.8 LD_LIBRARY_PATH=/COD/LATEST/aomp/lib ldd /COD/LATEST/aomp/bin/rocgdb | grep dbgapi
librocm-dbgapi.so.0 => /COD/LATEST/aomp/lib/librocm-dbgapi.so.0 (0x00007f25342d1000)
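The two `ldd | grep dbgapi` checks above can be turned into a pass/fail test, for example in a CI step before running the rocgdb tests. A sketch only; `dbgapi_path` and `check_dbgapi` are hypothetical names. It relies on bash passing prefix assignments (such as `LD_LIBRARY_PATH=...`) through to commands run inside a function.

```shell
# Print the path the loader resolved for librocm-dbgapi, reading ldd
# output on stdin (lines look like "lib => /path/lib (0x...)").
dbgapi_path() {
  awk '$1 ~ /^librocm-dbgapi\.so/ && $2 == "=>" { print $3; exit }'
}

# Return 0 if rocgdb under the given AOMP root resolves librocm-dbgapi
# from that root's lib directory, 1 otherwise (e.g. /opt/rocm was used).
check_dbgapi() {
  local root="$1" path
  path=$(ldd "$root/bin/rocgdb" | dbgapi_path)
  case "$path" in
    "$root"/lib/*) echo "ok: $path" ;;
    *) echo "mismatch: ${path:-not found}, expected under $root/lib" >&2
       return 1 ;;
  esac
}
```

Usage: `check_dbgapi /COD/LATEST/aomp` fails on r5 as shown above, while `LD_LIBRARY_PATH=/COD/LATEST/aomp/lib check_dbgapi /COD/LATEST/aomp` should pass.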
This issue feels like a cmake bug in rocgdb, as it should try to pick up shared libraries relative to its installed location first.
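One way to see what search path the binary itself carries is `readelf -d`, which prints any `(RPATH)` or `(RUNPATH)` dynamic entry. A build that prefers libraries next to its install location would typically record something like `$ORIGIN/../lib` there; whether rocgdb's build should emit exactly that is an assumption, not something confirmed in this thread. Also keep in mind that, per ld.so, a RUNPATH entry is searched after LD_LIBRARY_PATH, while a legacy RPATH (with no RUNPATH present) is searched before it. The helper names below are hypothetical:

```shell
# Print the RPATH/RUNPATH value(s) from `readelf -d` output on stdin, e.g.
# " 0x...1d (RUNPATH)   Library runpath: [$ORIGIN/../lib]" -> "$ORIGIN/../lib"
runpath_of() {
  sed -nE 's/.*\((RPATH|RUNPATH)\).*\[(.*)\]$/\2/p'
}

# Print the first directory the binary asks the loader to search,
# or nothing if it has no RPATH/RUNPATH entry at all.
first_runpath_entry() {
  runpath_of | head -n1 | cut -d: -f1
}
```

Usage: `readelf -d /COD/LATEST/aomp/bin/rocgdb | first_runpath_entry`. An empty result would be consistent with the /opt/rocm library being found first on r5.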