[tau] Delete pre-built library that depends on Nvidia CG Toolkit

Open martin-g opened this issue 2 years ago • 4 comments

libjogl_cg.so comes pre-built in the TAU sources. It depends on libCg.so and libCgGL.so (provided by Nvidia CG_Toolkit), which are not available on the build systems.

Trying to install tau-gnu12-openmpi4-ohpc on openEuler 22.03 x86_64 fails with:

Error:
  Problem 1: package ohpc-gnu12-openmpi4-perf-tools-3.0-300.ohpc.4.1.x86_64 requires tau-gnu12-openmpi4-ohpc, but none of the providers can be installed
   - cannot install the best candidate for the job
   - nothing provides libCg.so()(64bit) needed by tau-gnu12-openmpi4-ohpc-2.31.1-300.ohpc.3.1.x86_64
   - nothing provides libCg.so(VERSION)(64bit) needed by tau-gnu12-openmpi4-ohpc-2.31.1-300.ohpc.3.1.x86_64
   - nothing provides libCgGL.so()(64bit) needed by tau-gnu12-openmpi4-ohpc-2.31.1-300.ohpc.3.1.x86_64
   - nothing provides libCgGL.so(VERSION)(64bit) needed by tau-gnu12-openmpi4-ohpc-2.31.1-300.ohpc.3.1.x86_64

$ ldd libjogl_cg.so
	linux-vdso.so.1 (0x00007fff64de4000)
	libGL.so.1 => /lib/x86_64-linux-gnu/libGL.so.1 (0x00007fe347579000)
	libX11.so.6 => /lib/x86_64-linux-gnu/libX11.so.6 (0x00007fe347439000)
	libCg.so => not found
	libCgGL.so => not found

At RPM build time, the build systems run /usr/lib/rpm/elfdeps --requires libjogl_cg.so (via find-requires) to collect the runtime requirements. With rpm-build 4.17+, elfdeps lists the same dependencies as ldd does.

RHEL 9.x (tested with Rocky 9 and AlmaLinux 9) uses rpm 4.16, which does not list any dependencies. LEAP 15.5 uses rpm 4.14, which also does not list any dependencies. openEuler 22.03 uses rpm 4.17 and lists the same dependencies as ldd. Fedora 36 (rpm 4.17) and Fedora 38 (rpm 4.18) behave like openEuler.
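
To compare a given system against the ldd output above, the unresolved names can be pulled out of ldd's output with a small filter. This is only a sketch: the function name missing_libs is made up for illustration and is not part of TAU or OpenHPC.

```shell
# Hypothetical helper (not part of the PR): print the names of shared
# libraries that ldd reports as "not found".
# It reads ldd output on stdin, so it works on saved output as well.
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Usage, assuming libjogl_cg.so is in the current directory:
#   ldd libjogl_cg.so | missing_libs
```

Any name printed here is a candidate for showing up as an unsatisfied "Requires" once elfdeps starts reporting it.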

This commit deletes all occurrences of libjogl_cg.so, for all CPU architectures, in %install so that it is not scanned when the "Requires" are collected.
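
A minimal sketch of such a cleanup step, assuming the library may sit in several per-architecture subdirectories of the install tree. The helper name remove_jogl_cg is hypothetical, and the actual commit may do this differently.

```shell
# Hypothetical helper (the real spec change may differ): remove every copy
# of libjogl_cg.so below the directory given as the first argument.
remove_jogl_cg() {
    find "$1" -type f -name 'libjogl_cg.so' -exec rm -f {} +
}

# Usage inside a spec file's %install section, where rpm expands the macro:
#   remove_jogl_cg "%{buildroot}"
```

Using find rather than a fixed list of paths means no architecture directory has to be named explicitly.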

martin-g avatar Jul 18 '23 12:07 martin-g

After this fix:

root@euler-x8664 ~/g/ohpc (3.x)# dnf install /home/ohpc/rpmbuild/RPMS/x86_64/tau-gnu12-openmpi4-ohpc-2.31.1-19999.ci.ohpc.x86_64.rpm
Last metadata expiration check: 3:19:47 ago on Wed 19 Jul 2023 05:41:30 AM UTC.
Dependencies resolved.
=====================================================================================================================================================================================================================
 Package                                                    Architecture                              Version                                                  Repository                                       Size
=====================================================================================================================================================================================================================
Installing:
 tau-gnu12-openmpi4-ohpc                                    x86_64                                    2.31.1-19999.ci.ohpc                                     @commandline                                     28 M

Transaction Summary
=====================================================================================================================================================================================================================
Install  1 Package

Total size: 28 M
Installed size: 47 M
Is this ok [y/N]: y
Downloading Packages:
Running transaction check
Transaction check succeeded.
Running transaction test
Transaction test succeeded.
Running transaction
  Preparing        :                                                                                                                                                                                             1/1 
  Installing       : tau-gnu12-openmpi4-ohpc-2.31.1-19999.ci.ohpc.x86_64                                                                                                                                         1/1 
  Verifying        : tau-gnu12-openmpi4-ohpc-2.31.1-19999.ci.ohpc.x86_64                                                                                                                                         1/1 

Installed:
  tau-gnu12-openmpi4-ohpc-2.31.1-19999.ci.ohpc.x86_64                                                                                                                                                                

Complete!

martin-g avatar Jul 19 '23 09:07 martin-g

@adrianreber Do you have an idea how to make PAPI work in GitHub Actions?

not ok 2 [libs/TAU] MPI C++ binary runs under resource manager (slurm/gnu12/openmpi4)
2023-07-20T07:21:11.2739225Z # (from function `run_mpi_binary' in file ./common/functions, line 396,
2023-07-20T07:21:11.2739491Z #  in test file rm_execution, line 44)
2023-07-20T07:21:11.2739779Z #   `run_mpi_binary ./run_CXX_mpi_test.sh $ARGS $NODES $TASKS' failed
2023-07-20T07:21:11.2739912Z # job script = /tmp/job.ohpc.15580
2023-07-20T07:21:11.2740030Z # Batch job 7 submitted
2023-07-20T07:21:11.2740101Z #  
2023-07-20T07:21:11.2740202Z # Job 7 failed...
2023-07-20T07:21:11.2740320Z # Reason=NonZeroExitCode
2023-07-20T07:21:11.2740400Z #  
2023-07-20T07:21:11.2740549Z # [prun] Master compute host = d24e4dd96b92
2023-07-20T07:21:11.2740679Z # [prun] Resource manager = slurm
2023-07-20T07:21:11.2740884Z # [prun] Launch cmd = mpirun ./run_CXX_mpi_test.sh 8 (family=openmpi4)
2023-07-20T07:21:11.2741044Z # TAU: Error adding PAPI events: Event does not exist
2023-07-20T07:21:11.2741182Z # Got a bogus start! 0 .TAU application
2023-07-20T07:21:11.2741336Z # Still got a bogus start! 0 .TAU application
2023-07-20T07:21:11.2741506Z # TAU: Error adding PAPI events: Event does not exist
2023-07-20T07:21:11.2741645Z # Got a bogus start! 0 .TAU application
2023-07-20T07:21:11.2741797Z # Still got a bogus start! 0 .TAU application
2023-07-20T07:21:11.2742055Z # /opt/ohpc/pub/libs/gnu12/openmpi4/tau/2.31.1/bin/tau_exec: line 1454: 33364 Aborted                 (core dumped) $dryrun "$@"

martin-g avatar Jul 20 '23 07:07 martin-g

@adrianreber Do you have an idea how to make PAPI work in GitHub Actions?

No, I would not worry too much about it. Just disable that test for GitHub.

adrianreber avatar Jul 20 '23 07:07 adrianreber

@adrianreber The PR is ready to be reviewed! I've disabled the problematic TAU tests on SIMPLE_CI. They almost work with export TAU_METRICS=GET_TIME_OF_DAY, but the tests themselves use hardcoded values for NODES/TASKS/GRIDS/DIMS/... and I wasn't able to tune them to pass on GitHub Actions.

https://github.com/openhpc/ohpc/actions/runs/5619123661 is the fully green build before re-enabling gnu13 that breaks the RHEL jobs.
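
For illustration only, a guard of the following shape is one way to disable tests when SIMPLE_CI is set. The function name tau_tests_enabled is hypothetical, and how the real test suite checks SIMPLE_CI is an assumption here.

```shell
# Hypothetical guard (not the PR's actual code): the TAU tests are treated
# as enabled only when SIMPLE_CI is unset or empty.
tau_tests_enabled() {
    [ -z "${SIMPLE_CI:-}" ]
}

if tau_tests_enabled; then
    echo "running TAU tests"
else
    echo "skipping TAU tests on SIMPLE_CI"
fi
```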

martin-g avatar Jul 21 '23 06:07 martin-g

@martin-g Do we still need this? Do you remember?

adrianreber avatar Mar 01 '24 08:03 adrianreber

Seems it was fixed with f53487b90ef88960aff3c6e66f7466dfb178fca3

adrianreber avatar Mar 01 '24 08:03 adrianreber

Indeed https://github.com/openhpc/ohpc/commit/f53487b90ef88960aff3c6e66f7466dfb178fca3 seems to fix the issue! I think the improvements in the tests are still useful though!

martin-g avatar Mar 01 '24 08:03 martin-g

Test Results

4 files ±0, 4 suites ±0, duration 0s ±0s
17 tests ±0: 17 passed ±0, 0 skipped ±0, 0 failed ±0
18 runs ±0: 18 passed ±0, 0 skipped ±0, 0 failed ±0

Results for commit 7e8e1034. ± Comparison against base commit b182411e.

github-actions[bot] avatar Mar 01 '24 09:03 github-actions[bot]

Merging this now to see if it breaks anything.

adrianreber avatar Mar 01 '24 11:03 adrianreber