add pure vae25+fairmq n-m example

Open • rbx opened this issue 5 months ago • 1 comment

Adds example for running the n-m topology on Slurm cluster with GSI vae25 using sbatch only.

rbx, Oct 21 '25 08:10

๐Ÿ“ Walkthrough

Walkthrough

Adds a comprehensive Slurm documentation guide and a shell script for running the FairMQ n-m example on distributed clusters. The guide covers setup, prerequisites, topology configuration, troubleshooting, and monitoring. The script orchestrates a topology of 1 synchronizer, 3 senders, and 4 receivers across the allocated Slurm nodes.
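The placement described above can be illustrated with a small sketch. This is a hypothetical example, not the PR's script: the function name `plan_placement` and the round-robin policy are assumptions, chosen so that with three nodes the synchronizer gets node 0 to itself while senders and receivers share nodes 1-2, matching the sequence diagram in this comment.

```python
# Hypothetical placement sketch (not the PR's script): one way to map
# 1 synchronizer, 3 senders and 4 receivers onto a Slurm allocation.
from typing import Dict, List


def plan_placement(hosts: List[str], n_senders: int = 3,
                   n_receivers: int = 4) -> Dict[str, str]:
    """Return a mapping of process name -> host name.

    Assumption: the synchronizer runs alone on the first node and the
    senders/receivers are spread round-robin over the remaining nodes.
    """
    if len(hosts) < 2:
        raise ValueError("need one node for the synchronizer plus at least one worker node")
    sync_host, workers = hosts[0], hosts[1:]
    placement = {"sync": sync_host}
    for i in range(n_receivers):
        placement[f"receiver{i + 1}"] = workers[i % len(workers)]
    for i in range(n_senders):
        placement[f"sender{i + 1}"] = workers[i % len(workers)]
    return placement


if __name__ == "__main__":
    # Inside a batch job the host list could come from
    # `scontrol show hostnames "$SLURM_JOB_NODELIST"`; hard-coded here.
    for proc, host in plan_placement(["node0", "node1", "node2"]).items():
        print(f"{proc:10s} -> {host}")
```

Round-robin keeps the per-node load even when the worker count does not divide the node count; a real script might instead pin roles to nodes explicitly.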

Changes

| Cohort / File(s) | Summary |
|---|---|
| Slurm n-m Example Setup: `examples/n-m/SLURM_README.md` | New comprehensive guide documenting how to run the FairMQ n-m example on Slurm clusters, including access instructions for GSI vae25, prerequisites, quick start workflow, script customization, topology explanation, troubleshooting, and monitoring commands. |
| Slurm Orchestration Script: `examples/n-m/fairmq-start-ex-n-m-slurm.sh` | New Slurm-compatible bash script that configures job resources, sets up FairSoft binaries, and orchestrates a distributed n-m topology (1 synchronizer, 3 senders, 4 receivers) across allocated nodes using srun with PUB/SUB and PUSH/PULL channel configurations. |
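To illustrate the PUB/SUB and PUSH/PULL wiring mentioned in the script summary, here is a hypothetical sketch that builds FairMQ-style `--channel-config` values. Only port 5555 comes from this comment; the other ports, the channel names `sync` and `data`, and the helper functions are assumptions, and the exact syntax used by the PR's script is not shown here.

```python
# Hypothetical sketch (not the PR's script): build FairMQ-style
# --channel-config values for the n-m topology. Ports other than 5555
# are made up for illustration.
from typing import List

SYNC_PORT = 5555       # synchronizer PUB port, as shown in the sequence diagram
DATA_BASE_PORT = 5600  # hypothetical: receiver i (1-based) binds DATA_BASE_PORT + i


def sync_pub_config(sync_host: str) -> str:
    """PUB side: the synchronizer binds and publishes its state."""
    return f"name=sync,type=pub,method=bind,address=tcp://{sync_host}:{SYNC_PORT}"


def sync_sub_config(sync_host: str) -> str:
    """SUB side: subscribers connect to the synchronizer's PUB socket."""
    return f"name=sync,type=sub,method=connect,address=tcp://{sync_host}:{SYNC_PORT}"


def receiver_pull_config(host: str, index: int) -> str:
    """PULL side: each receiver binds its own port, so several can share a node."""
    return f"name=data,type=pull,method=bind,address=tcp://{host}:{DATA_BASE_PORT + index}"


def sender_push_config(receiver_hosts: List[str]) -> str:
    """PUSH side: one address per receiver, numbered 1..N like receiver_pull_config.

    Assumption: repeating address= in one channel entry yields one
    sub-channel per address; verify against the FairMQ channel docs.
    """
    addresses = ",".join(
        f"address=tcp://{host}:{DATA_BASE_PORT + i}"
        for i, host in enumerate(receiver_hosts, start=1)
    )
    return f"name=data,type=push,method=connect,{addresses}"


if __name__ == "__main__":
    receivers = ["node1", "node2", "node1", "node2"]
    print(sync_pub_config("node0"))
    print(sender_push_config(receivers))
```

Giving every receiver its own port lets two receivers run on the same node without a bind conflict.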

Sequence Diagram(s)

sequenceDiagram
    participant Slurm as Slurm Scheduler
    participant Sync as Synchronizer<br/>(Node 0)
    participant Recv as Receivers<br/>(Nodes 1-2)
    participant Send as Senders<br/>(Nodes 1-2)

    Slurm->>Sync: Launch (port 5555)
    Slurm->>Recv: Launch 4 receivers
    Recv->>Sync: Connect to PUB channel<br/>(receive state info)
    Sync->>Recv: Publish state<br/>(ready signal)
    Slurm->>Send: Launch 3 senders
    Send->>Recv: Connect to PUSH/PULL<br/>(send data)
    Recv->>Recv: Receive & process<br/>(in parallel)
    Send->>Send: Send data stream<br/>(in parallel)
    Note over Sync,Send: All components run<br/>until completion
    Sync->>Sync: Teardown
    Recv->>Recv: Teardown
    Send->>Send: Teardown
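The launch order in the diagram (synchronizer, then receivers, then senders, all running until completion) could be expressed as backgrounded `srun` steps followed by a single `wait`. The sketch below only generates those command lines; the binary names (`fairmq-ex-n-m-synchronizer` and so on) and the helper `srun_lines` are assumptions, and channel configuration is omitted for brevity.

```python
# Hypothetical sketch (not the PR's script): emit srun command lines in the
# launch order shown in the sequence diagram. Binary names are assumed.
import shlex
from typing import Dict, List, Tuple

# (role, device id, host); the role decides the launch order
Process = Tuple[str, str, str]
LAUNCH_ORDER = ("sync", "receiver", "sender")


def srun_lines(processes: List[Process], binaries: Dict[str, str]) -> List[str]:
    """Return shell lines: one backgrounded srun step per device, then `wait`."""
    # sorted() is stable, so devices of the same role keep their input order.
    ordered = sorted(processes, key=lambda p: LAUNCH_ORDER.index(p[0]))
    lines = []
    for role, device_id, host in ordered:
        # One task on one named node; --id sets the FairMQ device id.
        cmd = ["srun", "--nodes=1", "--ntasks=1", f"--nodelist={host}",
               binaries[role], "--id", device_id]
        lines.append(shlex.join(cmd) + " &")
    # Keep the batch job alive until every background step has exited.
    lines.append("wait")
    return lines


if __name__ == "__main__":
    procs = [("sync", "sync", "node0")]
    procs += [("receiver", f"receiver{i}", f"node{1 + (i - 1) % 2}") for i in range(1, 5)]
    procs += [("sender", f"sender{i}", f"node{1 + (i - 1) % 2}") for i in range(1, 4)]
    bins = {"sync": "fairmq-ex-n-m-synchronizer",
            "receiver": "fairmq-ex-n-m-receiver",
            "sender": "fairmq-ex-n-m-sender"}
    print("\n".join(srun_lines(procs, bins)))
```

A real script may need further `srun` resource options for the steps to run concurrently; check the `srun` documentation for the Slurm version installed on the cluster.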

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

The additions consist of straightforward documentation and a standard Slurm orchestration script. The script follows conventional patterns for launching distributed processes, without complex branching logic or intricate state management. The review focus is uniform: documentation accuracy and correctness of the script's sequential component launches.

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title Check | ✅ Passed | The pull request title "add pure vae25+fairmq n-m example" directly relates to the changeset, which adds two new files to support running the n-m FairMQ topology on a Slurm cluster specifically for the GSI vae25 system. The title clearly identifies the key components being added: vae25 (the specific cluster), FairMQ (the framework), and n-m example (the topology). While the word "pure" is somewhat unclear in intent, the overall title effectively communicates the primary change without being misleading or off-topic. |
| Description Check | ✅ Passed | The pull request description "Adds example for running the n-m topology on Slurm cluster with GSI vae25 using sbatch only" is clearly related to the changeset. It appropriately describes what is being added (an example for the n-m topology), where it runs (Slurm cluster with GSI vae25), and the specific approach (using sbatch only). The description provides meaningful context that aligns with both the SLURM_README.md guide and the accompanying shell script, and is neither vague nor off-topic. |
| Docstring Coverage | ✅ Passed | No functions found in the changes. Docstring coverage check skipped. |

coderabbitai[bot], Oct 21 '25 08:10