omnitrace icon indicating copy to clipboard operation
omnitrace copied to clipboard

Allow using external elfutils

Open rbberger opened this issue 2 years ago • 6 comments

Tried installing v1.8.0 by adding it to Spack, but on our systems it always fails during the internal (ExternalProject) elfutils build and missing gettext symbols. While installing elfutils via Spack is easy, the current CMake setup doesn't seem to allow providing an external one to avoid this issue.

It should be straight forward to support this by following the same CMake logic found in dyninst.

rbberger avatar Mar 03 '23 00:03 rbberger

Follow up on that issue.

Here is a workaround for the original issue I encountered.

To make v1.8.0 compile with Spack and resolve the TBB include errors and the gettext linking errors:

  • Add the new version to the omnitrace spackage:
    version("1.8.0", commit="7c73d981258cc3a29477756a95c1f90c5f8897dd", submodules=True)
    
  • Add depends_on("[email protected]:") to the omnitrace spackage (copied from dyninst)
  • Spack install with an additional ldflag:
    spack install [email protected] ldflags="-lintl"
    

rbberger avatar Mar 07 '23 00:03 rbberger

Hi @rbberger, sorry for the delay, I was on vacation last week. It was always my intention to enable this but it got sort of lost in all the changes that arose from the causal profiling support. I can potentially get this feature into the repo in the 1.8.2 release -- if any of the solutions below solve your problem, I will probably semi-delay doing this until Dyninst merges the CMake overhaul I provided them when I reworked it to be build as a subproject. They just opened a PR into their main branch. Once that is merged I will be working on this anyway -- Dyninst decided to remove the support for internal builds of Boost/TBB/etc. so I am going to have to migrate that into OmniTrace.

Anyway, onto potential short-term solutions:

I poked around and it appears adding --disable-nls should discard the need to link to libintl. I cannot think of any need for it in omnitrace since we don't output messages in any language of than English anywhere else so I added that flag to #254. Thus, potentially, spack install omnitrace@main might fix your issue once that is merged.

Furthermore, following #254 getting merged, the RedHat installers should finally be available. Thus, there should plenty of install options available via the self-extracting installer scripts (Ubuntu 18.04 - 22.04, OpenSUSE 15.x, and RedHat 8.7, 9.0, and 9.1). These are quick and easy and 100% relocatable. Even before the 1.8.1 release (hopefully tomorrow), you can go to the Actions -> Installer Packaging (CPack) tab and find the latest installers for the last merge into the main branch.

Interesting this was necessary. Ideally, omnitrace should not have to add anything like this since Dyninst relies on TBB internally, OmniTrace does not. But then again, the CMake for your Dyninst installation was a literal nightmare so I am not surprised at all Dyninst isn't propagating the TBB include flags it needs for omnitrace to build against it.

jrmadsen avatar Mar 08 '23 04:03 jrmadsen

@jrmadsen thanks for the detailed reply. For now, the workaround I found should be good enough. Just a heads up on what I found along the way (beside a Spack bug https://github.com/spack/spack/issues/35913 😄 ). Please feel free to ignore this rambling, but maybe it's useful in case you run into this eventually.

Once I had the workaround (which I built for ROCm 5.3.0) I decided to upgrade to ROCm 5.4.3 (in Spack) to match what we have at the LLNL machines we are using. Things then got really weird because of this:

#include <hip/hip_runtime.h>
#include "hip_ostream_ops.h"
#include <hip/amd_detail/hip_prof_str.h>

https://github.com/ROCm-Developer-Tools/roctracer/blob/4c7c59eb652f6dd1f57fe0cceb2d82b03deba161/inc/roctracer_hip.h#L26-L28

The first include of <hip_runtime.h> then does this

#if USE_PROF_API
#include <hip/amd_detail/hip_prof_str.h>
#endif

https://github.com/ROCm-Developer-Tools/HIP/blob/fea9cf73ba64592cdbc6c946d03b2ad2f14b77db/include/hip/hip_runtime_api.h#L6968

because USE_PROF_API is set true from the installs CMake targets of HIP: https://github.com/ROCm-Developer-Tools/hipamd/blob/rocm-5.4.x/src/CMakeLists.txt#L258

Which then leads to undefined roctracer compilation errors during the include of <hip_runtime.h> because the required "hip_ostream_ops.h" only gets included after.

Interestingly, I did not see this on the LLNL machine (where I used the system ROCm 5.4.3 installation), so my guess is that it's somehow related to the way Spack installs HIP, how the files are laid out differently in that case, and/or how the CMake is configured by default there.

It might be related to https://github.com/ROCm-Developer-Tools/roctracer/commit/05ee3ff973fa9a9ad6620254bf39da4b1c87e72a which would mean you could/should change includes such as

https://github.com/AMDResearch/omnitrace/blob/40e0d2a92f5c8c8837fe28576d2645fef0ffaf94/source/lib/omnitrace/library/roctracer.hpp#L31

to

#include <roctracer/roctracer.h>

Anyway, I worked around it by

#define USE_PROF_API 0
#include <hip/hip_runtime.h>
#define USE_PROF_API 1
#include "hip_ostream_ops.h"
#include <hip/amd_detail/hip_prof_str.h>

rbberger avatar Mar 08 '23 17:03 rbberger

Interesting, and no worries, I enjoy a semi-related rambling.

Luckily, we won't run into this issue though. The include paths of <roctracer.h> vs. <roctracer/roctracer.h> are accounted for in the Findroctracer.cmake so that the former is effectively the same as the latter.

Plus the dependency on the HIP runtime is only as a last resort / only really used in omnitrace-avail and thus, there's only one place the header is included (lib/core/gpu.cpp), which is completely separated from any roctracer stuff. We basically avoid any calls to the HIP runtime since if OmniTrace were to make a call to the HIP runtime to say, query the number of GPU during initialization, it would activate the ROCm tooling even if the application is not using HIP (and thus produce superfluous output related to ROCm despite the app not using it).

jrmadsen avatar Mar 08 '23 18:03 jrmadsen

@rbberger Has your issue been resolved? If so, please close the ticket. Thanks!

ppanchad-amd avatar Oct 07 '24 18:10 ppanchad-amd

@ppanchad-amd I'll check, now that this has become part of the official ROCm distribution it might be less of an issue.

rbberger avatar Oct 07 '24 22:10 rbberger