Adding ROCm Debug Agent support

Open gilbertlee-amd opened this issue 1 year ago • 3 comments

Details

Do not mention proprietary info or link to internal work items in this PR.

Work item: "Internal", or link to GitHub issue (if applicable).

What were the changes?
Added CMakefile / install.sh options for building with ROCm Debug Agent support

Why were the changes made?
To make it easier to compile with the flags that the ROCm Debug Agent needs

How was the outcome achieved?
Modified the CMakefile and install script to take in a new "--debug-agent" argument

Additional Documentation:
Tested with ./install.sh --debug-agent and confirmed that "-O0 -ggdb" was being added to compilation flags
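
For readers who want a picture of the mechanism, here is a minimal sketch of how an install script could map the new argument to the extra flags. It is not the actual RCCL install.sh: the function name `parse_debug_agent_flags` is hypothetical, and only the `--debug-agent` argument and the `-O0 -ggdb` flags come from the description above.

```shell
#!/usr/bin/env sh
# Hypothetical sketch, NOT the real RCCL install.sh.
# Prints the extra compiler flags implied by the given arguments.
parse_debug_agent_flags() {
    extra_flags=""
    for arg in "$@"; do
        case "$arg" in
            --debug-agent)
                # Flags named in the PR description:
                # no optimization, plus gdb-oriented debug info.
                extra_flags="-O0 -ggdb"
                ;;
        esac
    done
    printf '%s\n' "$extra_flags"
}

# Possible use (assumption): forward the result through the standard
# CMAKE_CXX_FLAGS variable when configuring.
#   cmake -DCMAKE_CXX_FLAGS="$(parse_debug_agent_flags "$@")" ..
```

Keeping the mapping in one function makes it easy to confirm what a given invocation would add before starting a full build.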

Approval Checklist

Do not approve until these items are satisfied.

  • [ ] Verify the CHANGELOG has been updated, if
    • there are any NCCL API version changes,
    • any changes impact library users, and/or
    • any changes impact any other ROCm library.

gilbertlee-amd avatar May 10 '24 16:05 gilbertlee-amd

I remember that adding "-O0" would break RCCL. So this is working now? Is there any ROCm version limitation?

wenkaidu avatar May 13 '24 17:05 wenkaidu

We need to add the build option to the CI extended tests to make sure it works and will not be silently broken in the future.
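
One way such a CI check could look, as a rough sketch only: the helper name `check_debug_agent_flags` and the idea of scanning a verbose build log are assumptions, not part of the existing RCCL CI.

```shell
#!/usr/bin/env sh
# Hypothetical CI helper, not part of the existing RCCL test suite.
# $1: path to a verbose build log (for example, output of `make VERBOSE=1`).
# Returns 0 only if every expected flag appears somewhere in the log.
check_debug_agent_flags() {
    log_file="$1"
    for flag in -O0 -ggdb; do
        # -F: match a fixed string, -q: quiet,
        # -e: treat the next word as the pattern even though it starts with '-'.
        if ! grep -F -q -e "$flag" "$log_file"; then
            echo "missing compile flag: $flag" >&2
            return 1
        fi
    done
    return 0
}
```

A flag check like this only confirms that the option is wired through; it does not replace running the full build, including the link step, with the option enabled.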

wenkaidu avatar May 13 '24 18:05 wenkaidu

Actually, I might have been too fast with the approval: I get an error when linking the RCCL library when this flag is set.

[100%] Linking CXX shared library librccl.so
lld: /long_pathname_so_that_rpms_can_package_the_debug_info/src/external/llvm-project/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp:1788: void llvm::SIRegisterInfo::buildVGPRSpillLoadStore(llvm::SGPRSpillBuilder&, int, int, bool, bool) const: Assertion `FrameInfo.getStackID(Index) != TargetStackID::SGPRSpill' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:

edgargabriel avatar May 17 '24 13:05 edgargabriel

The debug agent works with the debug build. Compiling with -ggdb produces linker errors, so closing this PR for now.

akolliasAMD avatar Aug 12 '24 21:08 akolliasAMD