awsome-distributed-training

Slurm job template: how a job can probe instance topology and hostname-instanceid mappings…

Open • verdimrc opened this pull request • 1 comment

Issue #, if available: N/A

Description of changes: A sample template for writing a Slurm job that probes EC2 information, so that the job logs capture as much detail as possible for later analysis.

  • Check the instance topology.
  • Display the mapping between the hostnames of the allocated nodes and their instance IDs.
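The two probes above can be combined into a single log section at the end of the job. The Python sketch below is illustrative only and is not the template from this PR. It assumes each allocated node has already reported one `hostname instance-id` line (for example, from one `srun` task per node reading `instance-id` from the EC2 instance metadata service), and that `topology` is a dict shaped like a response from the EC2 `DescribeInstanceTopology` API, with each instance's `NetworkNodes` listed from the top layer down to the layer closest to the instance. The function names `parse_host_map`, `group_by_leaf_node`, and `report` are hypothetical.

```python
"""Hypothetical post-processing sketch for a Slurm job log (not the PR's template).

Assumptions: each allocated node reported one "hostname instance-id" line, and
`topology` mirrors the shape of an EC2 DescribeInstanceTopology response, with
each instance's NetworkNodes ordered from the top layer down to the layer
closest to the instance.
"""
from collections import defaultdict
from typing import Dict, List


def parse_host_map(lines: List[str]) -> Dict[str, str]:
    """Turn 'hostname instance-id' lines into a {hostname: instance_id} dict."""
    mapping: Dict[str, str] = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        host, instance_id = parts
        mapping[host] = instance_id
    return mapping


def group_by_leaf_node(topology: dict) -> Dict[str, List[str]]:
    """Group instance IDs by their last (assumed closest) network node."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for inst in topology.get("Instances", []):
        nodes = inst.get("NetworkNodes", [])
        if nodes:
            groups[nodes[-1]].append(inst["InstanceId"])
    return dict(groups)


def report(host_map: Dict[str, str], topology: dict) -> str:
    """Render the hostname/instance-ID table, then hosts grouped by network node."""
    id_to_host = {iid: host for host, iid in host_map.items()}
    out = [f"{host}\t{host_map[host]}" for host in sorted(host_map)]
    for leaf, ids in sorted(group_by_leaf_node(topology).items()):
        # Fall back to the raw instance ID if that node did not report a hostname.
        hosts = sorted(id_to_host.get(iid, iid) for iid in ids)
        out.append(f"{leaf}: {' '.join(hosts)}")
    return "\n".join(out)


if __name__ == "__main__":
    # Illustrative data only; real values come from the job's own probes.
    demo_hosts = parse_host_map(["node-1 i-aaa", "node-2 i-bbb", "node-3 i-ccc"])
    demo_topo = {"Instances": [
        {"InstanceId": "i-aaa", "NetworkNodes": ["nn-1", "nn-2", "nn-3"]},
        {"InstanceId": "i-bbb", "NetworkNodes": ["nn-1", "nn-2", "nn-3"]},
        {"InstanceId": "i-ccc", "NetworkNodes": ["nn-1", "nn-2", "nn-4"]},
    ]}
    print(report(demo_hosts, demo_topo))
```

Grouping by the last network node puts hosts that share the closest layer on one line of the log; grouping by a higher layer would only change which entry of `NetworkNodes` is used as the key.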

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

verdimrc • Apr 16 '24 11:04