Add batch support for SLURM job arrays
This PR adds a new batchsize option to the SlurmInterface with a default of 0. If left at 0, the behaviour is unchanged. When a batch size is specified, we keep the current setup but run ceil(n / batchsize) job arrays sequentially before collecting the complete results and continuing the analysis. This is done by adding a batch id as the last argument to the functions that generate and execute HPC jobs.
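A minimal sketch of the batching loop, assuming simplified signatures for the generalized job functions (the actual functions in the PR take additional arguments):

```julia
# Sketch of the batching logic described above. Signatures are simplified;
# the real setup/run functions receive more arguments, with the batch id last.
function run_in_batches(interface, n::Int)
    # batchsize == 0 keeps the current behaviour: one job array for all samples
    nbatches = interface.batchsize == 0 ? 1 : ceil(Int, n / interface.batchsize)

    for batch in 1:nbatches
        setup_hpc_jobs(interface, batch)  # generate the job array for this batch
        run_hpc_jobs(interface, batch)    # submit it and wait for completion
    end

    # only after all batches have finished are the complete results collected
    # and the analysis continued
end
```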
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 95.12%. Comparing base (28b0867) to head (ec650d4). Report is 3 commits behind head on master.
Additional details and impacted files
@@ Coverage Diff @@
## master #184 +/- ##
==========================================
+ Coverage 94.74% 95.12% +0.38%
==========================================
Files 35 35
Lines 1465 1478 +13
==========================================
+ Hits 1388 1406 +18
+ Misses 77 72 -5
I'm testing it on our cluster now to see if we can run 10000 samples in batches of 300 🤞.
I generalized the HPC functions a bit more. Any future HPC scheduler only needs to override setup_hpc_jobs and run_hpc_jobs, and that's it.
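A hypothetical sketch of what that could look like for another scheduler; the supertype name and the simplified signatures here are assumptions, not the exact API:

```julia
# Hypothetical example: supporting a PBS scheduler by overriding only the two
# generalized functions. Type name and signatures are assumed for illustration.
struct PBSInterface <: AbstractHPCScheduler
    account::String
    queue::String
    batchsize::Int
end

function setup_hpc_jobs(pbs::PBSInterface, batch::Int)
    # write the job-array script(s) for this batch, e.g. a PBS array job
end

function run_hpc_jobs(pbs::PBSInterface, batch::Int)
    # submit the batch (e.g. via `qsub`) and block until it completes
end
```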
A very useful feature if your scheduler limits the number of job submissions.
I added a couple of warnings about having /test/test_utilities in the path, which I believe could cause failures when testing locally.
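A minimal sketch of the kind of check meant here, assuming the "path" in question is the Julia LOAD_PATH (the exact wording and location of the warnings in the PR may differ):

```julia
# Warn if a stale test_utilities entry is on the load path, which can shadow
# the test helpers and cause local test failures.
if any(occursin(joinpath("test", "test_utilities"), p) for p in LOAD_PATH)
    @warn "test/test_utilities is on LOAD_PATH; remove it before running the tests locally"
end
```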
I additionally added an option to test the package on HPC using actual SLURM:
julia --project
using Pkg
Pkg.test(;test_args=["HPC", "YOUR_ACCOUNT", "YOUR_PARTITION"])
if you have the package cloned locally. If instead you have it installed from the registry, I believe the following works:
using Pkg
Pkg.test("UncertaintyQuantification"; test_args=["HPC", "YOUR_ACCOUNT", "YOUR_PARTITION"])
I also moved the work directory of the generated simulation to someplace local, as I was getting weird behaviour with temp directories.