WMCore

R&D Investigate on pilot lifetime projected onto HL-LHC era

Open khurtado opened this issue 1 year ago • 1 comments

Impact of the new feature Placeholder for an R&D issue

Is your feature request related to a problem? Please describe. This is related to the list of TODO tasks in the "Evaluation of the WM system for the HL-LHC scenario" Google document.

Describe the solution you'd like The current pilot lifetime is set to 8 hours. We should investigate whether changing it would be beneficial in the HL-LHC era, either globally or only for specific resources such as HPC (e.g. 48-hour pilots). If the 8h value needs to change, we should describe what has to change; for instance, this could be done in reqmgr2 (the value is static now, though making it configurable would be trivial).

Describe alternatives you've considered A clear and concise description of any alternative solutions or features you've considered.

Additional context Add any other context or screenshots about the feature request here.

khurtado avatar Sep 25 '23 14:09 khurtado

This is actually a great idea, Kenyi. I would suggest evaluating a few workflows targeting a job wallclocktime of 4h, 8h, 12h, and 16h.

A few metrics that come to my mind would be:

  • avg job wallclocktime
  • min job wallclocktime
  • max job wallclocktime
  • total workflow wallclocktime
  • workflow turnaround (time from acquired to completed status)
  • condor retry/failure rate
  • wmagent retry/failure rate
  • ?
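
As a rough illustration of how the metrics above could be aggregated per workflow, here is a minimal Python sketch. The `JobRecord` fields and the `summarize_workflow` helper are hypothetical names made up for this example (they are not WMCore or HTCondor APIs), and it assumes per-job wallclock times and retry counts have already been extracted from wherever they are recorded.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean


@dataclass
class JobRecord:
    """Hypothetical per-job record; field names are assumptions for this sketch."""
    wallclock_s: float      # job wallclock time in seconds
    condor_retries: int     # number of retries at the condor level
    wmagent_retries: int    # number of retries at the wmagent level
    failed: bool            # True if the job ultimately failed


def summarize_workflow(jobs, acquired_at, completed_at):
    """Aggregate the per-workflow metrics listed above from a list of JobRecord."""
    if not jobs:
        raise ValueError("at least one job record is required")
    walls = [job.wallclock_s for job in jobs]
    n_jobs = len(jobs)
    return {
        "avg_job_wallclock_s": mean(walls),
        "min_job_wallclock_s": min(walls),
        "max_job_wallclock_s": max(walls),
        # Sum of all job wallclock times in the workflow.
        "total_workflow_wallclock_s": sum(walls),
        # Time from acquired to completed status.
        "turnaround_s": (completed_at - acquired_at).total_seconds(),
        # Average number of retries per job at each level.
        "condor_retries_per_job": sum(j.condor_retries for j in jobs) / n_jobs,
        "wmagent_retries_per_job": sum(j.wmagent_retries for j in jobs) / n_jobs,
        # Fraction of jobs that ended in failure.
        "failure_rate": sum(j.failed for j in jobs) / n_jobs,
    }


# Example with made-up numbers: three jobs of 1h, 2h and 3h.
example_jobs = [
    JobRecord(3600.0, 0, 0, False),
    JobRecord(7200.0, 1, 0, False),
    JobRecord(10800.0, 2, 1, True),
]
summary = summarize_workflow(
    example_jobs,
    acquired_at=datetime(2023, 9, 26, 0, 0),
    completed_at=datetime(2023, 9, 26, 12, 0),
)
```

Running the same summary for each target wallclocktime (4h, 8h, 12h, 16h) would give one row per configuration to compare side by side.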

@khurtado Kenyi, can you please apply the relevant labels (I guess only R&D) and fill in the fields of the project board (for the other R&D issues as well)?

amaltaro avatar Sep 26 '23 17:09 amaltaro