nnUNet
FileNotFoundError when verifying dataset integrity
I encountered a FileNotFoundError when running nnUNetv2_plan_and_preprocess with --verify_dataset_integrity. The error is raised here:
https://github.com/MIC-DKFZ/nnUNet/blob/5db96042779fe720dc6cef7ba4b32d2f9d127d31/nnunetv2/experiment_planning/verify_dataset_integrity.py#L206C17-L206C66
It seems that the code joins the folder path with 'labelsTr' and each filename from labelfiles. However, this produces incorrect file paths and raises the FileNotFoundError.
The issue is resolved by passing labelfiles directly instead of joining the paths:
zip(labelfiles, [reader_writer_class] * len(labelfiles), [expected_labels] * len(labelfiles))
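A minimal sketch of the patched argument construction, assuming every entry in labelfiles is already a complete path to a label file. check_label_file and verify_labels are hypothetical stand-ins for nnU-Net's per-case verification (they only check that the file exists), and the reader_writer_class / expected_labels values are placeholders.

```python
import os
import tempfile
from itertools import starmap


def check_label_file(label_file, reader_writer_class, expected_labels):
    # Hypothetical stand-in for the real per-case check: it only verifies
    # that the path it receives points to an existing file.
    if not os.path.isfile(label_file):
        raise FileNotFoundError(label_file)
    return True


def verify_labels(labelfiles, reader_writer_class=None, expected_labels=(0, 1)):
    # The entries of labelfiles are already full paths, so they are passed
    # through unchanged instead of being joined with folder and 'labelsTr'.
    args = zip(labelfiles,
               [reader_writer_class] * len(labelfiles),
               [expected_labels] * len(labelfiles))
    return all(starmap(check_label_file, args))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        labels_dir = os.path.join(root, "labelsTr")
        os.makedirs(labels_dir)
        label = os.path.join(labels_dir, "case_001.nii.gz")
        open(label, "w").close()  # empty placeholder file
        print(verify_labels([label]))
```

Because the paths are used as-is, the check no longer depends on how the folder prefix was built.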
I have a similar issue:
RuntimeError: Exception thrown in SimpleITK ImageFileReader_Execute: D:\a\1\sitk\Code\IO\src\sitkImageReaderBase.cxx:97:
sitk::ERROR: The file "nnUNet_raw\Dataset005_BH163test\labelsTr\nnUNet_raw\Dataset005_BH163test\labelsTr\XXXX_012.nii.gz" does not exist.
The part nnUNet_raw\Dataset005_BH163test\labelsTr is repeated in the file path.
Could you give me the commands you used? Did you set up your environment variables correctly? See https://github.com/MIC-DKFZ/nnUNet/blob/2d2e8ce2c0261dc88b53866dcc4c71e6972432ed/documentation/setting_up_paths.md
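A quick way to inspect the environment variables in question is sketched below. check_nnunet_paths is a hypothetical helper, not part of nnU-Net; it assumes the three nnU-Net v2 variable names nnUNet_raw, nnUNet_preprocessed and nnUNet_results and reports whether each one is set, absolute, and points to an existing directory.

```python
import os

# Environment variable names used by nnU-Net v2 for its data folders.
NNUNET_VARS = ("nnUNet_raw", "nnUNet_preprocessed", "nnUNet_results")


def check_nnunet_paths(env=None):
    """Hypothetical helper: describe the state of each nnU-Net path variable."""
    env = os.environ if env is None else env
    report = {}
    for name in NNUNET_VARS:
        value = env.get(name)
        if not value:
            report[name] = "not set"
        elif not os.path.isabs(value):
            report[name] = f"relative path: {value!r}"
        elif not os.path.isdir(value):
            report[name] = f"directory does not exist: {value!r}"
        else:
            report[name] = "ok"
    return report


if __name__ == "__main__":
    for name, status in check_nnunet_paths().items():
        print(f"{name}: {status}")
```

Passing a plain dict instead of os.environ makes the helper easy to try without touching the real environment.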
I used this command, with the paths set as required:
nnUNetv2_plan_and_preprocess -d 101 -c 3d_fullres --verify_dataset_integrity
The problem arises from the way the file paths are constructed in the code. dataset[k]['label'] is set to a path that already includes raw_dataset_folder:
https://github.com/MIC-DKFZ/nnUNet/blob/2d2e8ce2c0261dc88b53866dcc4c71e6972432ed/nnunetv2/utilities/utils.py#L58
These values are then collected into the labelfiles list:
https://github.com/MIC-DKFZ/nnUNet/blob/2d2e8ce2c0261dc88b53866dcc4c71e6972432ed/nnunetv2/experiment_planning/verify_dataset_integrity.py#L186
Subsequently, the code joins the folder path, the string 'labelsTr' and each filename from labelfiles:
https://github.com/MIC-DKFZ/nnUNet/blob/2d2e8ce2c0261dc88b53866dcc4c71e6972432ed/nnunetv2/experiment_planning/verify_dataset_integrity.py#L206
This results in incorrect, duplicated file paths, causing the FileNotFoundError.
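The three steps described above can be traced with plain string handling. This is a simplified sketch, not nnU-Net's actual code: the dataset[k]['label'] layout follows the description in this thread, while the case name, file ending and relative root folder are made up for illustration. posixpath is used so the output is the same on every OS.

```python
import posixpath as pp

raw_dataset_folder = "nnUNet_raw/Dataset005_BH163test"  # hypothetical relative root
file_ending = ".nii.gz"

# Step 1: the label entry is stored as a full path that already
# includes raw_dataset_folder (simplified).
dataset = {
    "XXXX_012": {"label": pp.join(raw_dataset_folder, "labelsTr", "XXXX_012" + file_ending)},
}

# Step 2: these full paths are collected into labelfiles.
labelfiles = [v["label"] for v in dataset.values()]

# Step 3 (buggy): the folder and 'labelsTr' are prepended a second time.
buggy = [pp.join(raw_dataset_folder, "labelsTr", i) for i in labelfiles]

# Fixed: the entries are used as they are.
fixed = labelfiles

print(buggy[0])  # nnUNet_raw/Dataset005_BH163test/labelsTr/nnUNet_raw/Dataset005_BH163test/labelsTr/XXXX_012.nii.gz
print(fixed[0])  # nnUNet_raw/Dataset005_BH163test/labelsTr/XXXX_012.nii.gz
```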
The issue can be resolved by passing the labelfiles list directly instead of joining the paths, as in the following snippet:
zip(labelfiles, [reader_writer_class] * len(labelfiles), [expected_labels] * len(labelfiles))
After applying this change, the code ran without errors and the dataset integrity verification completed successfully.
The reason this slipped past us is that, on Linux at least, the duplicated path is silently swallowed, so we never noticed. See this example:
In [9]: folder
Out[9]: '/media/isensee/raw_data/nnUNet_raw/Dataset004_Hippocampus'

In [10]: i
Out[10]: '/media/isensee/raw_data/nnUNet_raw/Dataset004_Hippocampus/labelsTr/hippocampus_001.nii.gz'

In [11]: join(folder, 'labelsTr', i)
Out[11]: '/media/isensee/raw_data/nnUNet_raw/Dataset004_Hippocampus/labelsTr/hippocampus_001.nii.gz'
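The session above works because join (presumably os.path.join here) discards every component that comes before an absolute one. The sketch below reproduces this with posixpath, reusing the paths from the session. The relative variant is a hypothetical contrast: it suggests the duplication would also appear on Linux whenever the raw-data root is a relative path, as in the Windows error reported earlier in this thread.

```python
import posixpath  # POSIX semantics, same result on any OS

folder = "/media/isensee/raw_data/nnUNet_raw/Dataset004_Hippocampus"
i = "/media/isensee/raw_data/nnUNet_raw/Dataset004_Hippocampus/labelsTr/hippocampus_001.nii.gz"

# i is absolute, so join() drops folder and 'labelsTr' and returns i unchanged.
print(posixpath.join(folder, "labelsTr", i) == i)  # True

# With a relative root (hypothetical), nothing is discarded and the prefix repeats.
rel_folder = "nnUNet_raw/Dataset004_Hippocampus"
rel_i = "nnUNet_raw/Dataset004_Hippocampus/labelsTr/hippocampus_001.nii.gz"
print(posixpath.join(rel_folder, "labelsTr", rel_i))
# nnUNet_raw/Dataset004_Hippocampus/labelsTr/nnUNet_raw/Dataset004_Hippocampus/labelsTr/hippocampus_001.nii.gz
```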
Thanks for bringing this to our attention! I fixed the problem :-)