Parallelisation of runs in parameter study fails for large number of runs/tmax

Open · HenrZu opened this issue 1 year ago · 0 comments

Bug description

Parallelisation of the runs in the parameter study fails for large studies. Each rank sends all of its ensemble_results in a single message, and the size of that message is held in the int value bytes_size, which overflows. The limit is reached quickly if the flows are saved as well. In my case, I use the Secir model with tmax = 250. The flows alone take 72 MB per run (400 [# counties] * 250 [# days] * 15 [# flows] * 6 [# age groups] * 8 bytes [size of a double]).
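To make the size estimate concrete, here is a minimal sketch of the arithmetic. The helper names flow_bytes_per_run and overflows_int are hypothetical and not part of MEmilio; the sketch assumes an 8-byte double and a 32-bit int, which holds on common Linux platforms.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: bytes needed to store the flows of one run.
// Dimensions are taken from the issue text (counties, days, flows, age groups).
inline std::int64_t flow_bytes_per_run(std::int64_t counties, std::int64_t days,
                                       std::int64_t flows, std::int64_t age_groups)
{
    assert(counties >= 0 && days >= 0 && flows >= 0 && age_groups >= 0);
    return counties * days * flows * age_groups * static_cast<std::int64_t>(sizeof(double));
}

// Hypothetical helper: true if the combined payload of `runs` runs
// no longer fits into an int byte count.
inline bool overflows_int(std::int64_t bytes_per_run, std::int64_t runs)
{
    assert(bytes_per_run >= 0 && runs >= 0);
    return bytes_per_run * runs > std::numeric_limits<int>::max();
}
```

With these numbers, flow_bytes_per_run(400, 250, 15, 6) gives 72,000,000 bytes. A rank that holds 29 runs stays below the int limit of 2,147,483,647 bytes, while 30 runs (2,160,000,000 bytes) exceed it, and that is before the regular results are counted.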

Version

Linux

To reproduce

Save the flows in addition to the results in the result-processing function and run the parameter study with 150 runs and tmax = 250.

Relevant log output

[sc-030233l:880257] * An error occurred in MPI_Send
[sc-030233l:880257] * reported by process [1905983489,9]
[sc-030233l:880257] * on communicator MPI_COMM_WORLD
[sc-030233l:880257] * MPI_ERR_COUNT: invalid count argument
[sc-030233l:880257] * MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[sc-030233l:880257] *    and potentially your MPI job)
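MPI_ERR_COUNT is consistent with an overflowed byte count: the count parameter of MPI_Send is an int, and a value that no longer fits presumably arrives as a negative number. One possible workaround, sketched below under that assumption, is to split the payload into pieces that each fit into an int and send them one after another. plan_chunks is a hypothetical helper, not existing MEmilio code, and the sketch only computes the chunk sizes so that it runs without an MPI installation.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical helper: split a payload of `total_bytes` into chunk sizes that
// each fit into an int, so that every entry could serve as the `count` of a
// separate MPI_Send call with datatype MPI_BYTE.
inline std::vector<int> plan_chunks(std::int64_t total_bytes,
                                    int max_chunk = std::numeric_limits<int>::max())
{
    assert(total_bytes >= 0 && max_chunk > 0);
    std::vector<int> chunks;
    while (total_bytes > 0) {
        // Take as much as fits into one message, at most max_chunk bytes.
        const std::int64_t n = total_bytes < max_chunk ? total_bytes : max_chunk;
        chunks.push_back(static_cast<int>(n));
        total_bytes -= n;
    }
    return chunks;
}
```

The sender would first transmit the total size as a 64-bit value and then loop over the chunks, advancing the buffer pointer by each chunk size; the receiver would mirror that loop with MPI_Recv. Other options may be simpler, for example sending the results of one run at a time, or using the large-count variant MPI_Send_c where the MPI library implements MPI 4.0.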

Add any relevant information, e.g. used compiler, screenshots.

No response

Checklist

  • [X] Attached labels, especially loc:: or model:: labels.
  • [X] Linked to project

HenrZu · Jun 26 '24 07:06