add pure vae25+fairmq n-m example
Adds example for running the n-m topology on Slurm cluster with GSI vae25 using sbatch only.
Walkthrough
Adds a comprehensive Slurm documentation guide and a shell script for running the FairMQ n-m example on distributed clusters. The guide covers setup, prerequisites, topology configuration, troubleshooting, and monitoring. The script orchestrates a topology of 1 synchronizer, 3 senders, and 4 receivers across the allocated Slurm nodes.
Changes
| Cohort / File(s) | Summary |
|---|---|
| **Slurm n-m Example Setup**<br/>`examples/n-m/SLURM_README.md` | New comprehensive guide documenting how to run the FairMQ n-m example on Slurm clusters, including access instructions for GSI vae25, prerequisites, quick start workflow, script customization, topology explanation, troubleshooting, and monitoring commands. |
| **Slurm Orchestration Script**<br/>`examples/n-m/fairmq-start-ex-n-m-slurm.sh` | New Slurm-compatible bash script that configures job resources, sets up FairSoft binaries, and orchestrates a distributed n-m topology (1 synchronizer, 3 senders, 4 receivers) across allocated nodes using srun with PUB/SUB and PUSH/PULL channel configurations. |
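To illustrate how such a script might spread several processes over the allocated nodes, here is a minimal sketch. It is not taken from `fairmq-start-ex-n-m-slurm.sh`: the helper `pick_node` and the node names `node1` and `node2` are hypothetical, and round-robin placement is an assumption. Inside a real job, the node list would typically come from `scontrol show hostnames "$SLURM_JOB_NODELIST"`.

```shell
#!/bin/sh
# Hypothetical helper (not from the PR): return the node for the i-th
# process, cycling through the given node names round-robin.
pick_node() {
    _idx=$1
    shift                      # drop the index; "$@" is now the node list
    shift $(( _idx % $# ))     # skip ahead to the chosen node
    printf '%s\n' "$1"
}

# Placeholder worker nodes so the sketch runs outside Slurm (assumption).
# The first allocated node is assumed to be reserved for the synchronizer.
WORKERS="node1 node2"

# Dry run: show where 4 receivers would land.
for i in 0 1 2 3; do
    # $WORKERS is intentionally unquoted so it splits into separate names.
    echo "receiver $i -> $(pick_node "$i" $WORKERS)"
done
```

With two worker nodes, receivers 0 and 2 land on `node1` and receivers 1 and 3 on `node2`. The same helper could be reused for the 3 senders.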
Sequence Diagram(s)
sequenceDiagram
participant Slurm as Slurm Scheduler
participant Sync as Synchronizer<br/>(Node 0)
participant Recv as Receivers<br/>(Nodes 1-2)
participant Send as Senders<br/>(Nodes 1-2)
Slurm->>Sync: Launch (port 5555)
Slurm->>Recv: Launch 4 receivers
Recv->>Sync: Connect to PUB channel<br/>(receive state info)
Sync->>Recv: Publish state<br/>(ready signal)
Slurm->>Send: Launch 3 senders
Send->>Recv: Connect to PUSH/PULL<br/>(send data)
Recv->>Recv: Receive & process<br/>(in parallel)
Send->>Send: Send data stream<br/>(in parallel)
Note over Sync,Send: All components run<br/>until completion
Sync->>Sync: Teardown
Recv->>Recv: Teardown
Send->>Send: Teardown
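The PUB/SUB connection shown in the diagram could be expressed as FairMQ `--channel-config` values along the following lines. This is a sketch under assumptions, not the script's actual contents: the helper `channel_config`, the channel name `sync`, and the host name `node0` are illustrative. Only port 5555 is taken from the diagram.

```shell
#!/bin/sh
# Hypothetical helper (not from the PR): format one --channel-config value.
# Arguments: channel name, socket type, method (bind|connect), host, port.
channel_config() {
    printf 'name=%s,type=%s,method=%s,address=tcp://%s:%s\n' \
        "$1" "$2" "$3" "$4" "$5"
}

# Synchronizer side: bind a PUB socket on all interfaces.
channel_config sync pub bind '*' 5555
# prints: name=sync,type=pub,method=bind,address=tcp://*:5555

# Subscribing side: connect to the synchronizer's node (assumed "node0").
channel_config sync sub connect node0 5555
# prints: name=sync,type=sub,method=connect,address=tcp://node0:5555
```

Binding on the publisher and connecting from the subscribers means that only the synchronizer's host name needs to be known before the other processes start.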
Estimated code review effort
🎯 2 (Simple) | ⏱️ ~10 minutes
The additions consist of straightforward documentation and a standard Slurm orchestration script. The script follows conventional patterns for launching distributed processes without complex branching logic or intricate state management. Review should focus on two things: the accuracy of the documentation, and whether the script launches the components in the correct sequence.
Pre-merge checks and finishing touches
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title Check | ✅ Passed | The pull request title "add pure vae25+fairmq n-m example" directly relates to the changeset, which adds two new files to support running the n-m FairMQ topology on a Slurm cluster specifically for the GSI vae25 system. The title clearly identifies the key components being added: vae25 (the specific cluster), FairMQ (the framework), and n-m example (the topology). While the word "pure" is somewhat unclear in intent, the overall title effectively communicates the primary change without being misleading or off-topic. |
| Description Check | ✅ Passed | The pull request description "Adds example for running the n-m topology on Slurm cluster with GSI vae25 using sbatch only" is clearly related to the changeset. It appropriately describes what is being added (an example for the n-m topology), where it runs (Slurm cluster with GSI vae25), and the specific approach (using sbatch only). The description provides meaningful context that aligns with both the SLURM_README.md guide and the accompanying shell script, and is neither vague nor off-topic. |
| Docstring Coverage | ✅ Passed | No functions found in the changes. Docstring coverage check skipped. |