
Incomplete node list

Open • rsarm opened this issue on Jul 29, 2022

On LUMI, the command sacct -S <start_time> -P -j <jobid> -o jobid,state,exitcode,end,nodelist doesn't always return the complete node list from the beginning of the job. For instance, it would first print

JobID|State|ExitCode|End|NodeList
1305969|PENDING|0:0|Unknown|nid005828
1305969.batch|RUNNING|0:0|Unknown|nid005828

and only a bit later does it start giving the complete information:

JobID|State|ExitCode|End|NodeList
1305969|RUNNING|0:0|Unknown|nid[005828-005831]
1305969.batch|RUNNING|0:0|Unknown|nid005828
1305969.0|RUNNING|0:0|Unknown|nid[005829-005831]

This is a problem for ReFrame, because it updates the node list only once: https://github.com/reframe-hpc/reframe/blob/a5b66c7c41d7cc884893642fd4d9331b146a3c16/reframe/core/schedulers/slurm.py#L383-L392
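A fix would have to merge whatever sacct reports on every poll, along these lines (a rough sketch, not ReFrame's actual code; expand_nodespec is a hypothetical helper that expands a condensed nodespec into individual node names):

import subprocess

def poll_nodelist(jobid, known_nodes):
    # Rough sketch: merge whatever sacct reports on each poll
    # instead of assigning the node list only once.
    fields = 'jobid,state,exitcode,end,nodelist'
    out = subprocess.run(
        ['sacct', '-P', '-j', str(jobid), '-o', fields],
        capture_output=True, text=True, check=True
    ).stdout
    for line in out.splitlines()[1:]:        # skip the header row
        nodespec = line.split('|')[4]
        if nodespec and nodespec != 'None assigned':
            # expand_nodespec: hypothetical inverse of the node-list
            # abbreviation, e.g. 'nid[005828-005831]' -> four names
            known_nodes |= set(expand_nodespec(nodespec))
    return known_nodes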

rsarm avatar Jul 29 '22 08:07 rsarm

The issue with always updating is that _get_nodes_by_name runs the scontrol command, although I am not sure we actually need it. We then call _create_nodes, where we check for JobSchedulerError, but again I don't understand the origin of this check. https://github.com/reframe-hpc/reframe/blob/a5b66c7c41d7cc884893642fd4d9331b146a3c16/reframe/core/schedulers/slurm.py#L623-L629 I guess that, to avoid this, we should parse the list from the nodespec ourselves and add nodes to the previous node list only if new ones appear (or different ones? I'm not sure that is even possible). @vkarak do you remember why we needed the scontrol command?

ekouts avatar Jul 29 '22 10:07 ekouts

The reason is that this code predates nodelist_abbrev, so we passed the condensed nodespec to scontrol to get back the expanded list of nodes. Technically, what we need here is the reverse of nodelist_abbrev, to expand the nodespec into a list of nodes. That would make this more efficient, but we should still avoid repeating the operation redundantly.
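For illustration, such an inverse could look roughly like this (a simplified sketch that handles only numeric bracket ranges, not the full Slurm hostlist grammar):

import re

def expand_nodespec(nodespec):
    # Expand e.g. 'nid[005828-005831]' into
    # ['nid005828', ..., 'nid005831'].
    nodes = []
    # Split on commas that are not inside a bracket group
    for part in re.split(r',(?![^\[]*\])', nodespec):
        m = re.match(r'(.+)\[(.+)\]$', part)
        if not m:
            nodes.append(part)       # plain node name, no range
            continue
        prefix, ranges = m.groups()
        for rng in ranges.split(','):
            lo, _, hi = rng.partition('-')
            hi = hi or lo            # single entry, e.g. 'nid[001]'
            nodes.extend(f'{prefix}{i:0{len(lo)}d}'
                         for i in range(int(lo), int(hi) + 1))
    return nodes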

vkarak avatar Aug 03 '22 07:08 vkarak

(tagging along here, to avoid opening a new issue)

Using job.nodelist gives an inconsistent and incomplete list of the nodes allocated by Slurm. Usually it returns just one node, the first one, and it often works correctly only when I do a completely clean ReFrame run.

I wrote a sample program that demonstrates the error (print_nodelist.py, in github.com/ireed/HPC-reframe/blob/main/azure). Run it multiple times with multiple nodes to see the error, sometimes in different pipeline stages. As a workaround, I collect the nodes myself with "$> hostname | sort".
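The workaround, roughly sketched (not the exact code from that repo; it relies on the launcher running hostname once per node):

import reframe as rfm
import reframe.utility.sanity as sn

@rfm.simple_test
class PrintNodelistTest(rfm.RunOnlyRegressionTest):
    valid_systems = ['*']
    valid_prog_environs = ['*']
    num_tasks = 4
    num_tasks_per_node = 1
    executable = 'hostname'

    @sanity_function
    def assert_all_nodes_seen(self):
        # Count the distinct hostnames printed by the job itself,
        # instead of trusting job.nodelist
        nodes = set(sn.evaluate(sn.extractall(r'^(\S+)$', self.stdout, 1)))
        return sn.assert_eq(len(nodes), self.num_tasks)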

The backend code (reframe/core/schedulers/slurm.py) shows that you use "$> scontrol -a show -o nodes" and "$> scontrol -a show -o partitions". A lower-maintenance way would be to just read $SLURM_NODELIST and $SLURM_JOB_PARTITION. This would deprecate nodelist_abbrev for Slurm, but it would require an inverse of nodelist_abbrev to expand the node name(s).
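For example, from inside the job's environment one could do something like this (a sketch; note that these variables are set only in the job's own environment):

import os
import subprocess

def nodes_from_job_env():
    # $SLURM_NODELIST holds the condensed nodespec;
    # 'scontrol show hostnames' expands it, one name per line
    nodespec = os.environ['SLURM_NODELIST']
    out = subprocess.run(['scontrol', 'show', 'hostnames', nodespec],
                         capture_output=True, text=True, check=True)
    return out.stdout.split()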

ireed avatar Oct 18 '22 09:10 ireed

Hi @ireed, just so I understand: is the problem that job.nodelist doesn't work after the compile stage or after the run stage? It is a run-only test, so I am not sure you should be using the @run_before('compile') and @run_after('compile') decorators.

ekouts avatar Oct 18 '22 09:10 ekouts

Getting the node list through job.nodelist does not work consistently, regardless of the stage in which I try it. I do not expect it to work before or during the compile stage, but for the performance or sanity stage (or any time after the run stage) it is incredibly useful to know which nodes I have and how many. At the moment, I do not know how to get this information reliably through the framework without hacks.
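For reference, what I would like to rely on is simply something like this inside a test class (hypothetical usage):

# Record the allocation once the run stage is done, for later
# use in the sanity/performance stage
@run_after('run')
def record_nodelist(self):
    self.allocated_nodes = self.job.nodelist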

ireed avatar Oct 18 '22 13:10 ireed

The problem is that we peek into the node list only once, the first time sacct or squeue returns a non-empty node list, as @rsarm has pointed out. Apparently, Slurm does not fill it in all at once, which is why we sometimes miss nodes and sometimes don't. I think the best solution is to retrieve the value on every poll, but not issue scontrol to unwrap the node list until the job has finished, so that we issue scontrol only once.
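Schematically, with illustrative names rather than final code:

def _update_nodespec(self, job, nodespec):
    # Keep the raw condensed spec fresh; this is a cheap string
    # assignment done on every poll
    job._nodespec = nodespec
    if job_finished(job.state):   # hypothetical predicate
        # Expand only once, after completion, with a single
        # scontrol call (or an inverse of nodelist_abbrev)
        job._nodes = self._get_nodes_by_name(job._nodespec)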

vkarak avatar Oct 19 '22 19:10 vkarak