Adding ROCm Debug Agent support
Details
Do not mention proprietary info or link to internal work items in this PR.
Work item: "Internal", or link to GitHub issue (if applicable).
What were the changes?
Added CMakefile and install.sh options for building with ROCm Debug Agent support.
Why were the changes made?
To make it easier to compile with the flags that ROCm Debug Agent support requires.
How was the outcome achieved?
Modified the CMakefile and install script to accept a new "--debug-agent" argument.
Additional Documentation:
Tested with ./install.sh --debug-agent and confirmed that "-O0 -ggdb" was added to the compilation flags.
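For readers who want a picture of the mechanism, here is a minimal sketch of how an install script could map a "--debug-agent" argument to the "-O0 -ggdb" flags mentioned above. It is not the actual RCCL install.sh: the function names `debug_agent_flags` and `cmake_command` are hypothetical, and forwarding the flags through `CMAKE_CXX_FLAGS` is an assumption about how the CMake side might receive them. The sketch only prints the configure command rather than running it.

```shell
#!/usr/bin/env bash
# Hypothetical sketch only; not the real RCCL install.sh.

# Print the extra compiler flags implied by the given arguments.
# Prints nothing when --debug-agent is absent.
debug_agent_flags() {
    local flags=""
    local arg
    for arg in "$@"; do
        case "$arg" in
            --debug-agent)
                # Flags taken from the test note in this PR.
                flags="-O0 -ggdb"
                ;;
        esac
    done
    printf '%s' "$flags"
}

# Print (not execute) the cmake configure command for the given arguments.
# Passing the flags via CMAKE_CXX_FLAGS is an assumption for illustration.
cmake_command() {
    local flags
    flags="$(debug_agent_flags "$@")"
    if [ -n "$flags" ]; then
        printf 'cmake -DCMAKE_CXX_FLAGS="%s" ..\n' "$flags"
    else
        printf 'cmake ..\n'
    fi
}

cmake_command "$@"
```

Printing the command instead of invoking cmake keeps the sketch safe to run anywhere and makes the flag mapping easy to check by eye.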
Approval Checklist
Do not approve until these items are satisfied.
- [ ] Verify the CHANGELOG has been updated, if
  - there are any NCCL API version changes,
  - any changes impact library users, and/or
  - any changes impact any other ROCm library.
I remember that adding "-O0" would break RCCL. Is this working now? Is there any ROCm version limitation?
We need to add this build option to the CI extended tests to make sure it works and will not be silently broken in the future.
Actually, I might have been too fast with the approval: I get an error when linking the RCCL library when this flag is set.
[100%] Linking CXX shared library librccl.so
lld: /long_pathname_so_that_rpms_can_package_the_debug_info/src/external/llvm-project/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp:1788: void llvm::SIRegisterInfo::buildVGPRSpillLoadStore(llvm::SGPRSpillBuilder&, int, int, bool, bool) const: Assertion `FrameInfo.getStackID(Index) != TargetStackID::SGPRSpill' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
The Debug Agent works with the debug build. Compiling with -ggdb produces linker errors, so I am closing this PR for now.