Make nnUNet FIPS friendly
Description
I am running nnUNetv2_train on a FIPS-enabled HPC cluster. The same jobs run successfully on another cluster without FIPS enforcement, but fail on the FIPS-enabled cluster whenever multiprocessing workers for data augmentation or validation are spawned.
The error we consistently see is:
RuntimeError: One or more background workers are no longer alive. Exiting. Please check the print statements above for the actual error message
This occurs very early, during dataloader initialization. Setting
export nnUNet_n_proc_DA=0
export nnUNet_n_proc_val=0
allows the training to run, but disables parallel data augmentation and significantly slows down the pipeline.
Context / Debugging so far
- Python: 3.12.4
- nnU-Net: v2 (installed in a venv)
- Libraries: versions match between the FIPS-enabled and non-FIPS environments.
- The same dataset and parameters run fine in the non-FIPS environment and fail in the FIPS-enabled one.
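To confirm which of the two environments actually enforces FIPS at the kernel level, a small check like the following may help. It is a minimal sketch, assuming a Linux host that exposes the flag at `/proc/sys/crypto/fips_enabled`; the helper name `kernel_fips_enabled` is hypothetical and not part of nnU-Net.

```python
from pathlib import Path


def kernel_fips_enabled(flag_file: str = "/proc/sys/crypto/fips_enabled") -> bool:
    """Return True if the kernel FIPS flag file exists and contains "1".

    Assumption: a Linux host. If the file is missing or unreadable,
    the function reports False rather than raising.
    """
    try:
        return Path(flag_file).read_text().strip() == "1"
    except OSError:
        return False


if __name__ == "__main__":
    print("Kernel FIPS mode enabled:", kernel_fips_enabled())
```

Running this on both clusters and including the output in the report would make the difference between the environments explicit.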
There are others facing the same issue:
The fixes suggested there were tried, but they did not resolve the issue.
Request
Could nnU-Net and/or its dependencies be reviewed for FIPS compliance issues? Specifically, is there a way to make the multiprocessing dataloader/augmentation workers compatible with FIPS environments?
Workaround
Currently the only working workaround is:
export nnUNet_n_proc_DA=0
export nnUNet_n_proc_val=0
but this removes multiprocessing and significantly reduces training speed.
Impact
Any FIPS-enabled HPC environment (common in federally regulated contexts) cannot use nnU-Net efficiently at present.
Recent findings: nnU-Net (or its dependencies) must not rely on MD5 in multiprocessing, because MD5 is not a FIPS-approved hash. Switching to SHA256 or another FIPS-approved hash would fix the issue.
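The MD5 restriction can be checked independently of nnU-Net. The sketch below is an illustration only, assuming the Python build uses an OpenSSL-backed `hashlib` that raises `ValueError` for MD5 when FIPS mode is enforced; the helper names `md5_is_blocked` and `sha256_hex` are hypothetical. It does not identify which dependency calls MD5.

```python
import hashlib


def md5_is_blocked() -> bool:
    """Return True if hashlib refuses a plain MD5 call.

    Assumption: FIPS-enforcing builds raise ValueError here, while
    non-FIPS builds return a hash object as usual.
    """
    try:
        hashlib.md5(b"probe")
    except ValueError:
        return True
    return False


def sha256_hex(data: bytes) -> str:
    """Hash data with SHA-256, which is a FIPS-approved algorithm."""
    return hashlib.sha256(data).hexdigest()


if __name__ == "__main__":
    print("MD5 blocked:", md5_is_blocked())
    print("SHA-256 of b'probe':", sha256_hex(b"probe"))
```

If `md5_is_blocked()` reports `True` only on the failing cluster, that supports the MD5 explanation. For non-security uses such as cache keys, `hashlib.md5(data, usedforsecurity=False)` (Python 3.9+) may also be accepted on FIPS builds, though that depends on the local OpenSSL configuration.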
Could you be more specific on what the issue is with nnUNet's code for FIPS compliance?
It runs on FIPS-enabled machines. #2749 was closed because they removed the dependency that was causing the error.