New error on windows: RuntimeError: Encountered inf in predicted array
Since yesterday I have been getting the following error when running nnU-Net v2.4.2 on Windows. It does not happen on Ubuntu or macOS, and it did not happen on Windows 3 days ago either. Probably a dependency without a pinned version was updated and caused this new behaviour. Have you come across this problem as well? Could clipping inf values to the maximum value supported by the dtype in use be a solution?
File "C:\hostedtoolcache\windows\Python\3.10.11\x64\lib\site-packages\totalsegmentator\nnunet.py", line 251, in nnUNetv2_predict
predictor.predict_from_files(dir_in, dir_out,
File "C:\hostedtoolcache\windows\Python\3.10.11\x64\lib\site-packages\nnunetv2\inference\predict_from_raw_data.py", line 256, in predict_from_files
return self.predict_from_data_iterator(data_iterator, save_probabilities, num_processes_segmentation_export)
File "C:\hostedtoolcache\windows\Python\3.10.11\x64\lib\site-packages\nnunetv2\inference\predict_from_raw_data.py", line 373, in predict_from_data_iterator
prediction = self.predict_logits_from_preprocessed_data(data).cpu()
File "C:\hostedtoolcache\windows\Python\3.10.11\x64\lib\site-packages\nnunetv2\inference\predict_from_raw_data.py", line 490, in predict_logits_from_preprocessed_data
prediction = self.predict_sliding_window_return_logits(data).to('cpu')
File "C:\hostedtoolcache\windows\Python\3.10.11\x64\lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "C:\hostedtoolcache\windows\Python\3.10.11\x64\lib\site-packages\nnunetv2\inference\predict_from_raw_data.py", line 651, in predict_sliding_window_return_logits
predicted_logits = self._internal_predict_sliding_window_return_logits(data, slicers,
File "C:\hostedtoolcache\windows\Python\3.10.11\x64\lib\site-packages\nnunetv2\inference\predict_from_raw_data.py", line 607, in _internal_predict_sliding_window_return_logits
raise e
File "C:\hostedtoolcache\windows\Python\3.10.11\x64\lib\site-packages\nnunetv2\inference\predict_from_raw_data.py", line 600, in _internal_predict_sliding_window_return_logits
raise RuntimeError('Encountered inf in predicted array. Aborting... If this problem persists, '
RuntimeError: Encountered inf in predicted array. Aborting... If this problem persists, reduce value_scaling_factor in compute_gaussian or increase the dtype of predicted_logits to fp32
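As a rough illustration of the clipping idea raised in the question above, here is a minimal sketch in NumPy rather than nnU-Net's own code. The helper name clip_to_finite is hypothetical. Note that clipping only hides the overflow: the clipped values are probably no longer meaningful logits, so this would be a workaround at best and not a fix for the root cause.

```python
import numpy as np


def clip_to_finite(arr: np.ndarray) -> np.ndarray:
    """Clamp +/-inf to the largest/smallest finite value of arr's float dtype.

    Hypothetical helper for illustration; NaN values are left unchanged.
    """
    info = np.finfo(arr.dtype)
    return np.clip(arr, info.min, info.max)


# float16 can only represent magnitudes up to 65504, so overflow yields inf.
logits = np.array([1.5, np.inf, -np.inf, -2.0], dtype=np.float16)
clipped = clip_to_finite(logits)
print(clipped)
```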
I have the same issue when running the TotalSegmentator model on Windows. I went down a couple of versions to v2.2.1, and it worked well.
@hari3100 Can you share a pip list/freeze?
@rw579 here it is:
Package Version
acvl_utils 0.2 asttokens 2.4.1 attrs 23.2.0 backports.tarfile 1.2.0 batchgenerators 0.25 batchgeneratorsv2 0.2 beautifulsoup4 4.12.3 bs4 0.0.2 certifi 2024.7.4 charset-normalizer 3.3.2 colorama 0.4.6 comm 0.2.2 connected-components-3d 3.18.0 contourpy 1.2.1 cycler 0.12.1 dataclasses 0.6 debugpy 1.8.2 decorator 5.1.1 dicom2nifti 2.4.11 dill 0.3.8 docutils 0.21.2 dunamai 1.18.0 dynamic_network_architectures 0.3.1 einops 0.8.0 exceptiongroup 1.2.2 executing 2.0.1 fft-conv-pytorch 1.2.0 filelock 3.15.4 fonttools 4.53.1 fsspec 2024.6.1 future 1.0.0 graphviz 0.20.3 idna 3.7 imagecodecs 2024.6.1 imageio 2.34.2 importlib_metadata 8.2.0 ipykernel 6.29.5 ipython 8.26.0 ipywidgets 8.1.3 jaraco.classes 3.4.0 jaraco.context 5.3.0 jaraco.functools 4.0.2 jedi 0.19.1 Jinja2 3.1.4 joblib 1.4.2 jsonschema 3.2.0 jupyter_client 8.6.2 jupyter_core 5.7.2 jupyterlab_widgets 3.0.11 keyring 25.3.0 kiwisolver 1.4.5 lazy_loader 0.4 linecache2 1.0.0 markdown-it-py 3.0.0 MarkupSafe 2.1.5 matplotlib 3.6.2 matplotlib-inline 0.1.7 mdurl 0.1.2 monai 1.3.2 more-itertools 10.3.0 mpmath 1.3.0 multiprocess 0.70.16 nest_asyncio 1.6.0 networkx 3.3 nh3 0.2.18 nibabel 5.0.0 nii2dcm 0.1.5 nnunetv2 2.5.1 numpy 1.23.2 opencv-python 4.10.0.84 p_tqdm 1.4.0 packaging 24.1 pandas 2.2.2 parso 0.8.4 pathos 0.3.2 pickleshare 0.7.5 pillow 10.4.0 pip 24.2 pkginfo 1.11.1 platformdirs 4.2.2 plotly 5.23.0 pox 0.3.4 ppft 1.7.6.8 prompt_toolkit 3.0.47 psutil 6.0.0 pure_eval 0.2.3 pyarrow 17.0.0 pydicom 2.3.0 pydicom-seg 0.4.1 Pygments 2.18.0 pyparsing 3.1.2 pyrsistent 0.20.0 python-dateutil 2.9.0 python-gdcm 3.0.24.1 pytz 2024.1 pywin32 306 pywin32-ctypes 0.2.2 PyYAML 6.0.1 pyzmq 26.0.3 readme_renderer 44.0 requests 2.32.3 requests-toolbelt 1.0.0 rfc3986 2.0.0 rich 13.7.1 rt-utils 1.2.7 scikit-image 0.24.0 scikit-learn 1.5.1 scipy 1.14.0 seaborn 0.13.2 setuptools 69.5.1 SimpleITK 2.3.1 six 1.16.0 soupsieve 2.5 stack-data 0.6.2 sympy 1.13.1 tcia_utils 2.1.15 tenacity 9.0.0 threadpoolctl 3.5.0 tifffile 2024.7.24 torch 2.4.0 
tornado 6.4.1 TotalSegmentator 2.2.1 tqdm 4.66.4 traceback2 1.4.0 traitlets 5.14.3 twine 4.0.2 typing_extensions 4.12.2 tzdata 2024.1 Unidecode 1.3.8 unittest2 1.1.0 urllib3 2.2.2 wcwidth 0.2.13 wheel 0.43.0 widgetsnbextension 4.0.11 xvfbwrapper 0.2.9 yacs 0.1.8 zipp 3.19.2
@hari3100 thanks
These versions seem to conflict a bit.
Is this issue still persisting?
I still get the same error when trying to run on windows.
Hey there, can you provide me with a simple example on how to reproduce this? Is this the case for all trained models, or just totalsegmentator?
It happens for different nnunet models. I created a small reproducible example here:
Test code: https://github.com/wasserth/TotalSegmentator/blob/update_mr_model/tests/tests_nnunet.py
Github action to automatically run it on mac + ubuntu + windows: https://github.com/wasserth/TotalSegmentator/blob/update_mr_model/.github/workflows/run_tests_nnunet.yml
Result where it fails only on windows: https://github.com/wasserth/TotalSegmentator/actions/runs/12046446273
I would recommend adding a similar automated test to the nnunet repository. You can mostly reuse the code I posted here. Since the input image is very small, inference runs quite fast even on CPU (you can copy the small test image from the TotalSegmentator repository). This way you can easily see whether a new nnunet commit breaks something fundamental on any operating system.
Thanks a lot for this work. I am happy to add this as a github action. To make sure people can see that you contributed the test please make a PR :-) Please also add a verification that the output, if produced, gives the correct result.
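A verification step like the one requested here could compare the predicted segmentation with a stored reference using the Dice coefficient. The sketch below is an assumption about how such a check might look, not the test that was submitted. dice_score and the toy arrays are hypothetical, and a real test would load both label maps from the predicted and reference image files.

```python
import numpy as np


def dice_score(pred: np.ndarray, ref: np.ndarray) -> float:
    """Dice overlap of the foreground (label > 0) of two label maps."""
    pred_fg = np.asarray(pred) > 0
    ref_fg = np.asarray(ref) > 0
    denom = pred_fg.sum() + ref_fg.sum()
    if denom == 0:
        # Both masks are empty: treat as perfect agreement.
        return 1.0
    return float(2.0 * np.logical_and(pred_fg, ref_fg).sum() / denom)


reference = np.zeros((4, 4), dtype=np.uint8)
reference[1:3, 1:3] = 1          # 4 foreground voxels
prediction = reference.copy()
prediction[1, 1] = 0             # one voxel missed, 3 foreground voxels left

dice = dice_score(prediction, reference)
assert dice > 0.8, f"The segmentation is not correct (dice: {dice:.5f})."
```

The threshold of 0.8 is arbitrary here. A regression test on a fixed input would likely use a much stricter value.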
Do you have any idea what could be causing the problem on Windows? I don't have a Windows machine on hand to test and fix this. Is this also happening on Windows when using the GPU?
I took a look at the commit history. The only thing I changed that day (July 26th) was switching from torch.inference_mode back to torch.no_grad
I do not have a Windows machine available either. I only see the Windows results from the GitHub Action (and from users who report problems in issues). I will create a PR with this GitHub Action test.
Thanks! Christmas time is super busy and I am not getting anything done at the moment. I have a private PC that I can use for debugging, but this will take some time because it requires me to familiarize myself with Python on Windows :see_no_evil:
I just submitted a pull request.
Has anyone found the problem? I still do not have a Windows setup to check what's going on and why this fails.
I tried to run this on my Windows machine but cannot reproduce it (yet) with the following dependencies: requirements.txt
test output:
(.venv) D:\lloyd\dev\nnUNet-fork>python nnunetv2/tests/integration_tests/run_nnunet_inference.py
nnUNet_raw is not defined and nnU-Net can only be used on data for which preprocessed files are already present on your system. nnU-Net cannot be used for experiment planning and preprocessing like this. If this is not intended, please read documentation/setting_up_paths.md for information on how to set this up properly.
nnUNet_preprocessed is not defined and nnU-Net can not be used for preprocessing or training. If this is not intended, please read documentation/setting_up_paths.md for information on how to set this up.
#######################################################################
Please cite the following paper when using nnU-Net:
Isensee, F., Jaeger, P. F., Kohl, S. A., Petersen, J., & Maier-Hein, K. H. (2021). nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nature methods, 18(2), 203-211.
#######################################################################
perform_everything_on_device=True is only supported for cuda devices! Setting this to False
D:\lloyd\dev\nnUNet-fork\.venv\lib\site-packages\nnunetv2\utilities\plans_handling\plans_handler.py:37: UserWarning: Detected old nnU-Net plans format. Attempting to reconstruct network architecture parameters. If this fails, rerun nnUNetv2_plan_experiment for your dataset. If you use a custom architecture, please downgrade nnU-Net to the version you implemented this or update your implementation + plans.
warnings.warn("Detected old nnU-Net plans format. Attempting to reconstruct network architecture "
nnUNet_raw is not defined and nnU-Net can only be used on data for which preprocessed files are already present on your system. nnU-Net cannot be used for experiment planning and preprocessing like this. If this is not intended, please read documentation/setting_up_paths.md for information on how to set this up properly.
nnUNet_preprocessed is not defined and nnU-Net can not be used for preprocessing or training. If this is not intended, please read documentation/setting_up_paths.md for information on how to set this up.
There are 1 cases in the source folder
I am processing 0 out of 1 (max process ID is 0, we start counting with 0!)
There are 1 cases that I would like to predict
nnUNet_raw is not defined and nnU-Net can only be used on data for which preprocessed files are already present on your system. nnU-Net cannot be used for experiment planning and preprocessing like this. If this is not intended, please read documentation/setting_up_paths.md for information on how to set this up properly.
nnUNet_preprocessed is not defined and nnU-Net can not be used for preprocessing or training. If this is not intended, please read documentation/setting_up_paths.md for information on how to set this up.
Predicting example_ct_sm:
perform_everything_on_device: False
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00, 1.85s/it]
sending off prediction to background worker for resampling and export
done with example_ct_sm
Test passed: nnUNet inference works correctly (dice=0.9999999999979113)
I tried installing the requirements posted by @hari3100 with the current nnunetv2, but there are many dependency conflicts. After removing packages such as monai, which is not required, and nii2dcm, which has very strict dependencies, I can reproduce the following issue with these requirements.
D:\lloyd\dev\nnUNet-fork\venv_bad\lib\site-packages\nnunetv2\utilities\plans_handling\plans_handler.py:37: UserWarning: Detected old nnU-Net plans format. Attempting to reconstruct network architecture parameters. If this fails, rerun nnUNetv2_plan_experiment for your dataset. If you use a custom architecture, please downgrade nnU-Net to the version you implemented this or update your implementation + plans.
warnings.warn("Detected old nnU-Net plans format. Attempting to reconstruct network architecture "
nnUNet_raw is not defined and nnU-Net can only be used on data for which preprocessed files are already present on your system. nnU-Net cannot be used for experiment planning and preprocessing like this. If this is not intended, please read documentation/setting_up_paths.md for information on how to set this up properly.
nnUNet_preprocessed is not defined and nnU-Net can not be used for preprocessing or training. If this is not intended, please read documentation/setting_up_paths.md for information on how to set this up.
There are 1 cases in the source folder
I am processing 0 out of 1 (max process ID is 0, we start counting with 0!)
There are 1 cases that I would like to predict
nnUNet_raw is not defined and nnU-Net can only be used on data for which preprocessed files are already present on your system. nnU-Net cannot be used for experiment planning and preprocessing like this. If this is not intended, please read documentation/setting_up_paths.md for information on how to set this up properly.
nnUNet_preprocessed is not defined and nnU-Net can not be used for preprocessing or training. If this is not intended, please read documentation/setting_up_paths.md for information on how to set this up.
Predicting example_ct_sm:
perform_everything_on_device: False
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:15<00:00, 15.47s/it]
sending off prediction to background worker for resampling and export
done with example_ct_sm
D:\lloyd\dev\nnUNet-fork\venv_bad\lib\site-packages\skimage\transform\_warps.py:738: RuntimeWarning: All-NaN slice encountered
min_val = min_func(input_image)
D:\lloyd\dev\nnUNet-fork\venv_bad\lib\site-packages\skimage\transform\_warps.py:742: RuntimeWarning: All-NaN slice encountered
max_val = max_func(input_image)
Traceback (most recent call last):
File "D:\lloyd\dev\nnUNet-fork\nnunetv2\tests\integration_tests\run_nnunet_inference.py", line 47, in <module>
run_tests_and_exit_on_failure()
File "D:\lloyd\dev\nnUNet-fork\nnunetv2\tests\integration_tests\run_nnunet_inference.py", line 38, in run_tests_and_exit_on_failure
assert images_equal, f"The nnunet segmentation is not correct (dice: {dice:.5f})."
AssertionError: The nnunet segmentation is not correct (dice: 0.00000).
Note: Replacing torch==2.4.0 with torch==2.4.1 seems to fix the issue!
I have not yet identified why version 2.4.0 behaves differently. Initially, I thought it might be a numpy 2.0 issue, but I pinned numpy==1.26.4, so this seems unlikely.
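Based on the observation above, a script could warn when it runs on Windows with the torch build that failed here. This is only a sketch: the helper name is hypothetical, it takes the version string as an argument so that it runs without torch installed, and 2.4.0 is the only version reported as failing in this thread, so other versions may or may not be affected.

```python
import platform


def is_suspect_torch(version: str, system: str) -> bool:
    """True for the combination reported as failing in this thread.

    Hypothetical check: torch 2.4.0 (with or without a local build tag
    such as +cpu) on Windows.
    """
    base_version = version.split("+")[0]  # drop local tags like "+cpu"
    return system == "Windows" and base_version == "2.4.0"


# In real use one would pass torch.__version__ instead of a literal string.
if is_suspect_torch("2.4.0+cpu", platform.system()):
    print("torch 2.4.0 on Windows: consider upgrading to torch 2.4.1")
```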
Testing with torch==2.4.0 (using requirements as above):
The gaussian seems to produce a tensor with infinite values ('inf'):
If you normalize the 'gaussian_importance_map' with numpy before converting it to a tensor, at least the gaussian no longer contains 'inf'. For example, replace
gaussian_importance_map = gaussian_filter(tmp, sigmas, 0, mode='constant', cval=0)
gaussian_importance_map = torch.from_numpy(gaussian_importance_map)
gaussian_importance_map /= (torch.max(gaussian_importance_map) / value_scaling_factor)
by
gaussian_importance_map = gaussian_filter(tmp, sigmas, 0, mode='constant', cval=0)
gaussian_importance_map /= (np.max(gaussian_importance_map) / value_scaling_factor)
gaussian_importance_map = torch.from_numpy(gaussian_importance_map)
Unfortunately, the predicted logits still contain 'inf', i.e., you still get this exception (Encountered inf in predicted array. Aborting...).
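To narrow down where the first 'inf' appears (the gaussian, the network output, or the accumulated logits), it can help to print a small summary at each step. The helper below is a hypothetical debugging aid and not part of nnU-Net. It works on NumPy arrays, so a torch tensor would first need to be converted, for example with .detach().cpu().numpy().

```python
import numpy as np


def summarize_nonfinite(name: str, arr: np.ndarray) -> dict:
    """Count inf/NaN entries and report the largest finite value."""
    arr = np.asarray(arr)
    finite = arr[np.isfinite(arr)]
    summary = {
        "name": name,
        "dtype": str(arr.dtype),
        "n_inf": int(np.isinf(arr).sum()),
        "n_nan": int(np.isnan(arr).sum()),
        "finite_max": float(finite.max()) if finite.size else None,
    }
    print(summary)
    return summary


# Toy example: float16 overflows above 65504, so the first product becomes inf.
weights = np.array([10.0, 1.0], dtype=np.float16)
logits = np.array([60000.0, 2.0], dtype=np.float16)
with np.errstate(over="ignore"):  # silence the expected overflow warning
    weighted = weights * logits
report = summarize_nonfinite("weighted_logits", weighted)
```

Calling the helper once on the gaussian and once on the predicted logits would show which of the two first goes non-finite under torch==2.4.0.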