Antares_Simulator

When a batch is almost completed, most CPUs are idle

Open flomnes opened this issue 2 years ago • 2 comments

Note: MCY = Monte-Carlo year

I ran a fairly large study with 20 parallel jobs, using the option --force-parallel=20.

I use 100 MCY, which are split into 5 batches of 20 parallel MCY.

I've noticed that some MCY take longer than others. For each batch, when the last years of the batch are being processed, most CPUs are idle.

In studies where the ratio (longest year / shortest year) per batch is high, this is a waste of resources. I think we could save 10-15% computation time on average if CPUs were used more efficiently.

Best case: all MCY take the same time, so no time is wasted.

Worst case (probably exaggerated): all MCY are fast except for one, which takes much longer. In this case, all CPUs except one are idle most of the time.
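
As a rough way to quantify these two cases, assume a batch of n years runs on n CPUs and year i takes time t_i (symbols introduced here for illustration only). The batch lasts as long as its slowest year, so the idle fraction of the batch is:

$$
\text{idle} = 1 - \frac{\sum_{i=1}^{n} t_i}{n \cdot \max_i t_i}
$$

If all t_i are equal, this is 0. If one year takes T and the other n - 1 take a much shorter time ε, it becomes

$$
\text{idle} = \frac{n-1}{n}\left(1 - \frac{\varepsilon}{T}\right)
$$

which approaches (n - 1)/n, i.e. 95% for n = 20, as ε/T goes to 0.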

The problem is that there is a queue of queues: batches run one after another, and each batch must finish completely before the next one starts.

  1. Launch batch 1. Wait for all years of batch 1 to be over <= WASTE CPU TIME HERE
  2. Launch batch 2. Wait for all years of batch 2 to be over <= WASTE CPU TIME HERE
  3. etc.

With this poor design, only 1 CPU is in use just before a new batch starts:

[Fri Jul 29 22:33:25 2022][solver][infos] Exporting results : UK00I - UKNI
[Fri Jul 29 22:33:25 2022][solver][infos] Exporting results : DK00
[Fri Jul 29 22:33:25 2022][solver][infos] Exporting results : ISEM
[Fri Jul 29 22:33:25 2022][solver][infos] Exporting results : IT00
[Fri Jul 29 22:33:25 2022][solver][infos] Exporting results : NO00
[Fri Jul 29 22:33:25 2022][solver][infos] Exporting results : SE00
[Fri Jul 29 22:33:49 2022][solver][infos] parallel batch size : 20 (20 perfomed)
[Fri Jul 29 22:33:49 2022][solver][infos] Year(s) 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 
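
The cost of waiting for each batch can be illustrated with a small simulation. This is only a sketch with made-up durations, not a measurement of Antares: `batched_makespan` and `queued_makespan` are hypothetical helper names, and export steps or any dependency between years are ignored.

```python
import heapq


def batched_makespan(durations, n_workers):
    """Wall time when years run in fixed batches of n_workers.

    Each batch lasts as long as its slowest year, because the next
    batch only starts once every year of the current one is done.
    """
    total = 0.0
    for start in range(0, len(durations), n_workers):
        total += max(durations[start:start + n_workers])
    return total


def queued_makespan(durations, n_workers):
    """Wall time when a free worker immediately takes the next year."""
    # Min-heap of the times at which each worker becomes free.
    free_at = [0.0] * min(n_workers, len(durations))
    heapq.heapify(free_at)
    for d in durations:
        start = heapq.heappop(free_at)  # earliest available worker
        heapq.heappush(free_at, start + d)
    return max(free_at) if free_at else 0.0


# Hypothetical durations: 8 years on 4 workers, one slow year per batch.
durations = [1, 1, 1, 5, 1, 1, 1, 5]
print(batched_makespan(durations, 4))  # 10.0
print(queued_makespan(durations, 4))   # 7.0
```

With these made-up numbers the single queue finishes in 7 time units instead of 10. When all years take the same time, both functions return the same value, which matches the best case described above.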

I think this could be solved with minor changes in the scheduler, though I may be wrong. To be precise, we should eliminate the concept of "batches" completely and let the queue scheduler handle scheduling.

  1. Create 100 jobs and let the job scheduler run them in whatever order is best, limited by the maximum number of parallel jobs.
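
The idea of a single job queue can be sketched as follows. This is a minimal illustration in Python, not the Antares scheduler: `run_all_years` and `run_year` are hypothetical names, and Python threads are used only to show the scheduling pattern (they would not give real CPU parallelism for CPU-bound work).

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_all_years(years, run_year, max_parallel):
    """Submit every year as one job; the pool caps concurrency.

    No batches: as soon as a worker finishes a year, it picks up the
    next pending one from the executor's internal queue.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # Map each future back to the year it computes.
        futures = {pool.submit(run_year, year): year for year in years}
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results


# Usage with a placeholder job standing in for one Monte-Carlo year.
results = run_all_years(range(1, 101), lambda year: year * 2, max_parallel=20)
print(len(results))  # 100
```

The only tuning parameter left is the maximum number of parallel jobs, which plays the role of --force-parallel in the run described above.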


DISADVANTAGE

Hydro levels and hot start won't survive. They should never have existed anyway; this is not how Monte-Carlo works.

ADVANTAGE

Save about 10-15% computation time. This is an estimate; tests are needed to assess the actual performance improvement.

flomnes, Jul 29 '22 20:07