
Dump output from `bhist -l <lsfjobid>` to runpath

Open berland opened this issue 1 year ago • 7 comments

Is your feature request related to a problem? Please describe. The output from bhist -l <jobid> on finished LSF jobs is too interesting not to leave easily accessible, and should be dumped to the runpath:

Job <289263>, User <havb>, Project <default>, Command <sleep 10>
Thu Apr 18 10:00:40: Submitted from host <st-grid03>, to Queue <normal>, CWD <$
                     HOME>;
Thu Apr 18 10:01:28: Dispatched to <st-rst14-03-05>, Effective RES_REQ <select[
                     (cs)&&(type == any )&&(mem>maxmem*1/12)] order[r15s:pg:bjo
                     bs] span[hosts=1] same[model] >;
Thu Apr 18 10:01:28: Starting (Pid 658);
Thu Apr 18 10:01:28: Running with execution home </private/havb>, Execution CWD
                      </private/havb>, Execution Pid <658>;
Thu Apr 18 10:01:38: Done successfully. The CPU time used is 0.1 seconds; 
Thu Apr 18 10:02:01: Post job process done successfully;

MEMORY USAGE:
MAX MEM: 3.9 Gbytes;  AVG MEM: 3.9 Gbytes

Summary of time in seconds spent in various states by  Thu Apr 18 10:02:01
  PEND     PSUSP    RUN      USUSP    SSUSP    UNKWN    TOTAL
  48       0        10       0        0        0        58

Describe the solution you'd like Dump the output to some filename.
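A minimal sketch of what such a dump could look like, assuming `bhist` is on the `PATH` of the machine doing the dump. The names `dump_bhist` and `run_command` and the default filename `bhist_output.txt` are hypothetical placeholders, not taken from ert; the filename in particular is still undecided in this thread. The command runner is injectable so the function can be exercised without an LSF installation.

```python
import subprocess
from pathlib import Path
from typing import Callable, Sequence


def run_command(args: Sequence[str]) -> str:
    """Run a command and return its stdout as text (raises on non-zero exit)."""
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout


def dump_bhist(
    job_id: str,
    runpath: Path,
    filename: str = "bhist_output.txt",  # hypothetical name, not decided in the thread
    runner: Callable[[Sequence[str]], str] = run_command,
) -> Path:
    """Write the output of `bhist -l <job_id>` to a file inside runpath."""
    output = runner(["bhist", "-l", job_id])
    target = Path(runpath) / filename
    target.write_text(output, encoding="utf-8")
    return target
```

Calling `dump_bhist("289263", Path("/some/runpath"))` would then leave the long-format job history next to the other files for that realization.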

Describe alternatives you've considered Do nothing.

berland avatar Apr 18 '24 08:04 berland

The LSF stdout might be sufficient, though, but it must first be fixed in #7695. Examine whether there are differences between the two, for example when an OOM strikes.

berland avatar Apr 18 '24 08:04 berland

What should the filename be? @berland I have some ideas:

  • bhist_job_summary.txt
  • lsf_job_summary.txt
  • job_summary.txt (would not be created for queue systems other than LSF anyway)

jonathan-eq avatar Apr 29 '24 11:04 jonathan-eq

The one on the left is the LSF stdout, while the one on the right is the bhist long version (`bhist -l`). [image]

jonathan-eq avatar Apr 29 '24 12:04 jonathan-eq

As for the filename, we already have <JOBNAME>.LSF-out for stdout, and we might get <JOBNAME>.LSF-err for stderr (that is a potential issue to write up). To be in line with that system, what about <JOBNAME>.LSF-bhist-l?
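The naming scheme above can be sketched as a single helper. The function name `lsf_sidecar_path` is hypothetical and only illustrates the `<JOBNAME>.LSF-<suffix>` pattern; it is not an existing ert function.

```python
from pathlib import Path


def lsf_sidecar_path(runpath: Path, job_name: str, suffix: str) -> Path:
    """Build <runpath>/<JOBNAME>.LSF-<suffix>, e.g. suffix 'out', 'err' or 'bhist-l'."""
    return Path(runpath) / f"{job_name}.LSF-{suffix}"
```

With this, stdout, stderr and the bhist dump would sort next to each other in a directory listing of the runpath.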

berland avatar Apr 30 '24 13:04 berland

The lsf stdout already provides all the information found in bhist -l, so echoing the output to a file wouldn't give us anything extra.

jonathan-eq avatar May 03 '24 07:05 jonathan-eq

One field that is not included in lsf stdout is Dispatched to <cluster_node>, Effective RES_REQ <select[(cs)&&(type==any)>. Maybe getting the resource requirement string would be reason enough to keep the output? @berland
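If only the dispatch host and the resource requirement string are of interest, they can be pulled out of the `bhist -l` text. The sketch below makes two assumptions based solely on the sample output in this issue: wrapped lines continue with a 21-space indent (the width of the timestamp prefix), and the dispatch record has the single-host form `Dispatched to <host>, Effective RES_REQ <...>;`. Other LSF versions or multi-host jobs may format this differently. The names `unwrap_bhist` and `parse_dispatch` are hypothetical.

```python
import re
from typing import Optional

# Assumption: continuation lines are indented by the width of "Thu Apr 18 10:01:28: ".
CONTINUATION_INDENT = 21

# Greedy match on the RES_REQ part, since the string itself may contain ">" (e.g. mem>maxmem).
DISPATCH_RE = re.compile(
    r"Dispatched to <(?P<host>[^>]+)>, Effective RES_REQ <(?P<res_req>.*)>;"
)


def unwrap_bhist(text: str) -> list[str]:
    """Join wrapped continuation lines back onto the record they belong to."""
    pad = " " * CONTINUATION_INDENT
    records: list[str] = []
    for line in text.splitlines():
        if records and line.startswith(pad):
            # Lines are wrapped mid-word, so concatenate without adding a separator.
            records[-1] += line[CONTINUATION_INDENT:]
        else:
            records.append(line)
    return records


def parse_dispatch(text: str) -> Optional[tuple[str, str]]:
    """Return (host, res_req) from the first dispatch record, or None if absent."""
    for record in unwrap_bhist(text):
        match = DISPATCH_RE.search(record)
        if match:
            return match.group("host"), match.group("res_req").strip()
    return None
```

Unwrapping first matters because the sample splits the requirement string in the middle of tokens such as `bjobs`.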

jonathan-eq avatar May 03 '24 07:05 jonathan-eq

Yes, I think this is sufficient to warrant outputting this as well. There might be other corner-case scenarios where the difference between the two outputs changes too, and that is when it is most interesting.

berland avatar May 03 '24 11:05 berland