Implement new config key MEMORY_PR_(JOB|FORWARD_MODEL|REALIZATION)

Open xjules opened this issue 1 year ago • 5 comments

We should propagate the MEMORY requirement setting to the main ERT config level, making it available as a new config key.

xjules avatar Apr 29 '24 10:04 xjules

Takes over from #8198

berland avatar Jun 20 '24 11:06 berland

Current state:

For LSF one has to specify the memory requirements in an LSF resource string QUEUE_OPTION LSF LSF_RESOURCE rusage[mem=15000] -- in megabytes

For OpenPBS one can do: QUEUE_OPTION TORQUE MEMORY_PER_JOB 16gb

and for Slurm one can do: QUEUE_OPTION SLURM MEMORY_PER_CPU 4000 -- in megabytes

and (!) QUEUE_OPTION SLURM MEMORY 16000 -- per node, in megabytes

In slurm_driver.cpp, there is no special treatment of these two different ways of setting memory requirements. They are simply passed on to sbatch through its --mem and --mem-per-cpu options, even though these options are mutually exclusive in Slurm.
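To illustrate the conflict, here is a minimal Python sketch of how a driver could refuse the combination before calling sbatch. The function name `sbatch_memory_args` and its parameters are hypothetical and not taken from ERT; only the `--mem` and `--mem-per-cpu` flags are real sbatch options.

```python
def sbatch_memory_args(memory=None, memory_per_cpu=None):
    """Build the memory-related sbatch arguments.

    Hypothetical helper, not ERT code. `memory` corresponds to
    QUEUE_OPTION SLURM MEMORY and `memory_per_cpu` to
    QUEUE_OPTION SLURM MEMORY_PER_CPU.
    """
    # Slurm treats --mem and --mem-per-cpu as mutually exclusive,
    # so fail early instead of passing both on to sbatch.
    if memory is not None and memory_per_cpu is not None:
        raise ValueError(
            "MEMORY and MEMORY_PER_CPU cannot both be set for Slurm"
        )
    args = []
    if memory is not None:
        args.append(f"--mem={memory}")
    if memory_per_cpu is not None:
        args.append(f"--mem-per-cpu={memory_per_cpu}")
    return args
```

With only one of the two options set, the helper returns a single flag; with both set, it raises instead of producing an invalid sbatch call.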

berland avatar Jun 20 '24 11:06 berland

Proposal:

Make a new Ert keyword at the highest level, not as a queue option, that will represent the amount of memory the realization (= job) will use, irrespective of NUM_CPU. If the job uses more memory when split over multiple CPUs, then the user must specify the total, summed over all CPUs, in the config.

The slurm queue option MEMORY_PER_CPU should become deprecated.
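For users migrating away from MEMORY_PER_CPU, the per-realization total is simply the per-CPU value multiplied by NUM_CPU. A small sketch of that arithmetic follows; the function name is hypothetical and the values are assumed to be in megabytes.

```python
def total_realization_memory_mb(memory_per_cpu_mb, num_cpu):
    """Convert a per-CPU memory figure into a per-realization total.

    Hypothetical helper, not ERT code. Both the input and the result
    are in megabytes.
    """
    if num_cpu < 1:
        raise ValueError("num_cpu must be at least 1")
    # The new keyword is meant to hold the sum over all CPUs.
    return memory_per_cpu_mb * num_cpu
```

For example, MEMORY_PER_CPU 4000 with NUM_CPU 4 corresponds to a total of 16000 megabytes for the realization.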

berland avatar Jun 20 '24 11:06 berland

Discussion needed:

A decision is needed on which units this keyword should take, and whether there should be a default if no unit is supplied. Each driver must be responsible for passing on the value in a unit supported by the queue system.
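One possible shape for this, as a sketch only: parse the user-supplied value into bytes once, then let each driver format it in its own unit. Everything here is an assumption for discussion: the suffix table, the binary multipliers (1 mb = 1024**2 bytes), the default unit of megabytes, and all function names are hypothetical and not taken from ERT's queue_config.py.

```python
import re

# Assumed suffix table, case-insensitive. Hypothetical; the suffixes
# ERT actually supports are defined in queue_config.py.
_UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3, "tb": 1024**4}
_PATTERN = re.compile(r"^\s*(\d+)\s*([a-zA-Z]*)\s*$")
_MB = 1024**2


def parse_memory(text, default_unit="mb"):
    """Return the number of bytes described by e.g. '16gb' or '4000'."""
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError(f"Could not parse memory value: {text!r}")
    number, unit = match.groups()
    unit = unit.lower() or default_unit  # assumed default when no suffix
    if unit not in _UNITS:
        raise ValueError(f"Unknown memory unit: {unit!r}")
    return int(number) * _UNITS[unit]


def _ceil_div(a, b):
    # Round up so the job never gets less memory than requested.
    return -(-a // b)


def lsf_resource(num_bytes):
    """LSF resource string, in megabytes as in the example above."""
    return f"rusage[mem={_ceil_div(num_bytes, _MB)}]"


def slurm_mem_option(num_bytes):
    """sbatch --mem option with an explicit megabyte suffix."""
    return f"--mem={_ceil_div(num_bytes, _MB)}M"


def pbs_mem_value(num_bytes):
    """OpenPBS-style memory value with an 'mb' suffix."""
    return f"{_ceil_div(num_bytes, _MB)}mb"
```

Keeping the parsed value in bytes means the choice of default unit only affects parsing, while each driver stays free to emit whatever unit its queue system accepts.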

Currently the unit suffixes supported are:

https://github.com/equinor/ert/blob/761b07012717abe6d567cd9cb7d75130f03b122c/src/ert/config/queue_config.py#L192-L195

berland avatar Jun 20 '24 12:06 berland

Decision at standup:

The name of the new keyword should be REALIZATION_MEMORY.

berland avatar Jun 25 '24 13:06 berland