ce107

3 comments by ce107

The solution is for the first line to contain the sample count and the sample duration (two numbers on one line). Still, the monitoring script needs to be modified to...
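For illustration, here is a minimal sketch of reading a first line in that layout. The function name `parse_header`, the whitespace separator, and treating the duration as a float are all my assumptions, not something taken from the actual monitoring script:

```python
def parse_header(line):
    """Parse a first line of the form '<sample_count> <sample_duration>'.

    Assumption: the two numbers are whitespace-separated, the count is an
    integer, and the duration may be fractional (its unit is not specified).
    """
    fields = line.split()
    if len(fields) != 2:
        raise ValueError(
            f"expected two numbers on the first line, got {len(fields)} field(s)"
        )
    sample_count = int(fields[0])
    sample_duration = float(fields[1])
    return sample_count, sample_duration


# Hypothetical header line: 120 samples, each lasting 0.5 time units.
print(parse_header("120 0.5"))
```

Rejecting a line with any other number of fields makes a malformed header fail loudly instead of being silently misread.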

I am having the same issue and others are seeing it here as well: https://github.com/NVIDIA/Megatron-LM/issues/147

A very similar error on RHEL 8.3, Slurm 20.02, and Python 3.9.19:

```
$ slurm_gpustat
Traceback (most recent call last):
  File "/nobackup/users/ce107/miniforge3/envs/slurm-stuff/bin/slurm_gpustat", line 8, in <module>
    sys.exit(main())
  File "/nobackup/users/ce107/miniforge3/envs/slurm-stuff/lib/python3.9/site-packages/slurm_gpustat/slurm_gpustat.py", line 789, in...
```