nnUNet

FileNotFoundError when verifying dataset integrity

Open yw7 opened this issue 9 months ago • 3 comments

I encountered a FileNotFoundError when running nnUNetv2_plan_and_preprocess with --verify_dataset_integrity. The error is raised here: https://github.com/MIC-DKFZ/nnUNet/blob/5db96042779fe720dc6cef7ba4b32d2f9d127d31/nnunetv2/experiment_planning/verify_dataset_integrity.py#L206C17-L206C66

It seems that the code joins the folder path with 'labelsTr' and each filename from labelfiles. This produces incorrect file paths and raises the FileNotFoundError. Passing labelfiles directly, instead of joining the paths, resolves the issue:

```python
zip(labelfiles, [reader_writer_class] * len(labelfiles), [expected_labels] * len(labelfiles))
```

yw7, Apr 27 '24 13:04

I have a similar issue:

```
RuntimeError: Exception thrown in SimpleITK ImageFileReader_Execute: D:\a\1\sitk\Code\IO\src\sitkImageReaderBase.cxx:97:
sitk::ERROR: The file "nnUNet_raw\Dataset005_BH163test\labelsTr\nnUNet_raw\Dataset005_BH163test\labelsTr\XXXX_012.nii.gz" does not exist.
```

The segment nnUNet_raw\Dataset005_BH163test\labelsTr appears twice in the file path.

anw1998, Apr 27 '24 16:04
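The doubled path in this error message can be reproduced with Python's ntpath module, which applies Windows path rules on any operating system. This is a minimal sketch under two assumptions: nnUNet_raw was configured as a relative path (the path in the message has no drive letter), and each label entry already contains the dataset folder and labelsTr. XXXX_012.nii.gz is the placeholder name from the report.

```python
import ntpath  # Windows path rules, importable on every platform

# Assumption: nnUNet_raw was set as a relative path, so the dataset folder is relative too.
folder = ntpath.join("nnUNet_raw", "Dataset005_BH163test")

# The label entry already contains folder + labelsTr.
label_entry = ntpath.join(folder, "labelsTr", "XXXX_012.nii.gz")

# Joining folder and 'labelsTr' again in front of a complete relative path duplicates them.
buggy = ntpath.join(folder, "labelsTr", label_entry)
print(buggy)
# nnUNet_raw\Dataset005_BH163test\labelsTr\nnUNet_raw\Dataset005_BH163test\labelsTr\XXXX_012.nii.gz
```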

Could you share the commands you used? Did you set up your environment variables correctly? See https://github.com/MIC-DKFZ/nnUNet/blob/2d2e8ce2c0261dc88b53866dcc4c71e6972432ed/documentation/setting_up_paths.md

dojoh, May 06 '24 13:05

I used this command, with the paths set up as required:

```bash
nnUNetv2_plan_and_preprocess -d 101 -c 3d_fullres --verify_dataset_integrity
```

The problem comes from the way the file paths are constructed. dataset[k]['label'] is built so that it already includes raw_dataset_folder:

https://github.com/MIC-DKFZ/nnUNet/blob/2d2e8ce2c0261dc88b53866dcc4c71e6972432ed/nnunetv2/utilities/utils.py#L58

These values are then collected into the labelfiles list:

https://github.com/MIC-DKFZ/nnUNet/blob/2d2e8ce2c0261dc88b53866dcc4c71e6972432ed/nnunetv2/experiment_planning/verify_dataset_integrity.py#L186

The code then joins the folder path, the string 'labelsTr', and each filename from labelfiles:

https://github.com/MIC-DKFZ/nnUNet/blob/2d2e8ce2c0261dc88b53866dcc4c71e6972432ed/nnunetv2/experiment_planning/verify_dataset_integrity.py#L206

Because each entry in labelfiles already contains the folder and 'labelsTr', these components end up duplicated in the resulting path, which causes the FileNotFoundError.

The issue can be resolved by passing the labelfiles list directly instead of joining the paths:

```python
zip(labelfiles, [reader_writer_class] * len(labelfiles), [expected_labels] * len(labelfiles))
```

After I applied this change, the code ran without errors and the dataset integrity verification completed successfully.

yw7, May 06 '24 16:05
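The chain described in the comment above (label entries stored as full paths, collected into labelfiles, then joined with the folder again) can be sketched with the standard library. This is not nnUNet's code: build_label_paths is a hypothetical stand-in for the linked utils.py logic, Dataset101_Example and the case names are made up, and the dataset folder is deliberately relative so the duplication shows up. The sketch creates empty label files in a temporary directory and checks which list of paths actually exists.

```python
import os
import tempfile
from os.path import exists, join


def build_label_paths(raw_dataset_folder, case_ids, file_ending=".nii.gz"):
    # Hypothetical stand-in for the dataset construction: every label entry
    # already includes raw_dataset_folder and 'labelsTr'.
    return {k: {"label": join(raw_dataset_folder, "labelsTr", k + file_ending)} for k in case_ids}


old_cwd = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    os.chdir(tmp)  # work with relative paths, as in the Windows report
    try:
        folder = join("nnUNet_raw", "Dataset101_Example")  # hypothetical dataset name
        os.makedirs(join(folder, "labelsTr"))
        dataset = build_label_paths(folder, ["case_000", "case_001"])
        labelfiles = [v["label"] for v in dataset.values()]
        for f in labelfiles:
            open(f, "w").close()  # empty placeholder label files

        buggy = [join(folder, "labelsTr", i) for i in labelfiles]  # folder joined twice
        fixed = labelfiles  # entries passed through unchanged, as in the proposed fix

        buggy_exist = [exists(p) for p in buggy]
        fixed_exist = [exists(p) for p in fixed]
    finally:
        os.chdir(old_cwd)  # leave the temp dir before it is deleted

print(buggy_exist)  # [False, False]
print(fixed_exist)  # [True, True]
```

In the real code, the fixed list is what the zip(labelfiles, ...) call above feeds to the worker pool together with reader_writer_class and expected_labels.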

This slipped past us because, on Linux at least, the duplicated file path is ignored, so we never noticed. See this example:

```
In [9]: folder
Out[9]: '/media/isensee/raw_data/nnUNet_raw/Dataset004_Hippocampus'

In [10]: i
Out[10]: '/media/isensee/raw_data/nnUNet_raw/Dataset004_Hippocampus/labelsTr/hippocampus_001.nii.gz'

In [11]: join(folder, 'labelsTr', i)
Out[11]: '/media/isensee/raw_data/nnUNet_raw/Dataset004_Hippocampus/labelsTr/hippocampus_001.nii.gz'
```

Thanks for bringing this to our attention! I fixed the problem :-)

FabianIsensee, May 08 '24 20:05
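The Linux example above works because os.path.join discards all earlier components whenever a later component is an absolute path, so join(folder, 'labelsTr', i) simply returns i when i is absolute. Windows paths with a drive letter behave the same way, which suggests the failing setups used a relative nnUNet_raw path (as the path in the SimpleITK error indicates). The sketch below uses posixpath and ntpath so it behaves identically on every OS; the relative folder and the D:\data path are illustrative assumptions.

```python
import ntpath  # Windows path rules
import posixpath  # POSIX path rules

folder = "/media/isensee/raw_data/nnUNet_raw/Dataset004_Hippocampus"
i = posixpath.join(folder, "labelsTr", "hippocampus_001.nii.gz")

# An absolute last component resets the join, so no duplication appears.
assert posixpath.join(folder, "labelsTr", i) == i

# Same rule on Windows for an absolute path with a drive letter (illustrative path).
win_folder = "D:\\data\\nnUNet_raw\\Dataset004_Hippocampus"
win_i = ntpath.join(win_folder, "labelsTr", "hippocampus_001.nii.gz")
assert ntpath.join(win_folder, "labelsTr", win_i) == win_i

# Hypothetical relative layout: nothing resets, so folder + labelsTr appear twice.
rel_folder = "nnUNet_raw/Dataset004_Hippocampus"
rel_i = posixpath.join(rel_folder, "labelsTr", "hippocampus_001.nii.gz")
print(posixpath.join(rel_folder, "labelsTr", rel_i))
# nnUNet_raw/Dataset004_Hippocampus/labelsTr/nnUNet_raw/Dataset004_Hippocampus/labelsTr/hippocampus_001.nii.gz
```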