Adds Python-based test runner for RCCL

Open atulkulk opened this issue 2 months ago • 0 comments

Details

Work item: Sub-task of LWPCOMMLIBS-713

What were the changes?

  • Adds Python-based test runner for RCCL with hierarchical JSON configuration support, replacing shell-based test execution with a maintainable and extensible framework that supports GTest, performance tests, and custom executables.

  • Includes integrated LLVM code coverage reporting, MPI multi-rank/multi-node test execution, flexible test filtering, automated CMake build integration, and environment variable management with path expansion.

  • Provides clean output, comprehensive logging, and configuration inheritance via "extends" directive for easy test suite organization and reusability.
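As an illustration of configuration inheritance, the sketch below shows one way an "extends" directive could be resolved. It is not the runner's actual implementation: the function names `deep_merge` and `load_config`, the rule that child values override parent values, and the rule that the "extends" path is relative to the child file are all assumptions made for this example.

```python
import json
from pathlib import Path


def deep_merge(base, override):
    """Return a new dict where values in `override` win over `base`.

    Nested dicts are merged recursively; any other value type
    (including lists) is replaced wholesale. This merge policy is an
    assumption, not something the PR description specifies.
    """
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def load_config(path, _seen=frozenset()):
    """Load a JSON config and resolve its (hypothetical) "extends" chain."""
    path = Path(path).resolve()
    if path in _seen:
        raise ValueError(f"circular 'extends' chain at {path}")
    config = json.loads(path.read_text())
    parent_ref = config.pop("extends", None)
    if parent_ref is None:
        return config
    # Assumption: the parent path is relative to the child config file.
    parent = load_config(path.parent / parent_ref, _seen | {path})
    return deep_merge(parent, config)
```

With this policy, a suite file that contains only `"extends": "base.json"` plus a few overrides inherits every other setting from `base.json`, which is the kind of reuse the bullet above describes.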

Why were the changes made?

  • Replaces shell-based test execution with a Python framework that provides better extensibility, hierarchical JSON configuration for easier test management, and integrated LLVM code coverage reporting.

  • Enables better test organization through configuration inheritance, environment variable management with path expansion, and supports multiple test types (GTest, performance, custom) with flexible filtering and automated build integration.

How was the outcome achieved?

  • Implemented a modular Python runner with three core components (ArgumentParser, TestConfigProcessor, TestExecutor) that together parse JSON configurations with hierarchical inheritance, orchestrate CMake builds, execute MPI-based tests, and integrate LLVM coverage tools.
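To make the execution step more concrete, here is a minimal sketch of two pieces such a runner would likely need: expanding variable references in configured environment values, and assembling the command line for a single-rank or multi-rank test. The helper names `expand_env` and `build_command`, their parameters, and the example executable path are hypothetical and are not taken from the PR. The only external facts relied on are the `mpirun -np` option and GoogleTest's `--gtest_filter` flag.

```python
import os
from string import Template


def expand_env(env_spec, base_env=None):
    """Build an environment dict, expanding $VAR / ${VAR} and a leading ~.

    Entries are processed in order, so later values may refer to earlier
    ones. Unknown variables are left untouched (safe_substitute).
    """
    env = dict(os.environ if base_env is None else base_env)
    for name, raw in env_spec.items():
        expanded = Template(raw).safe_substitute(env)
        env[name] = os.path.expanduser(expanded)
    return env


def build_command(executable, ranks=1, gtest_filter=None, extra_args=()):
    """Assemble the argv list for one test (hypothetical helper).

    Multi-rank tests are wrapped in `mpirun -np <ranks>`; single-rank
    tests are launched directly.
    """
    cmd = []
    if ranks > 1:
        cmd += ["mpirun", "-np", str(ranks)]
    cmd.append(executable)
    if gtest_filter:
        cmd.append(f"--gtest_filter={gtest_filter}")
    cmd.extend(extra_args)
    return cmd


if __name__ == "__main__":
    env = expand_env(
        {"BUILD_DIR": "/opt/build", "LD_LIBRARY_PATH": "${BUILD_DIR}/lib:$LD_LIBRARY_PATH"},
        base_env={"LD_LIBRARY_PATH": "/usr/lib"},
    )
    print(env["LD_LIBRARY_PATH"])
    print(build_command("./example-tests", ranks=2, gtest_filter="AllReduce*"))
```

The resulting list and environment can be passed to `subprocess.run(cmd, env=env)`. Keeping the command as a list rather than a single string avoids shell quoting problems with filter patterns such as `AllReduce*`.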

Additional Documentation:
Please see README.md for more information.

Approval Checklist

Do not approve until these items are satisfied.

  • [ ] Verify the CHANGELOG has been updated, if
    • there are any NCCL API version changes,
    • any changes impact library users, and/or
    • any changes impact any other ROCm library.

atulkulk, Nov 05 '25 22:11