New error on windows: RuntimeError: Encountered inf in predicted array

Open wasserth opened this issue 1 year ago • 18 comments

Since yesterday I have been getting the following error when running nnU-Net v2.4.2 on Windows. On Ubuntu and macOS the error does not occur, and on Windows it did not occur three days ago either. Probably a dependency without a pinned version was updated and caused this new behaviour. Have you come across this problem as well? Could clipping inf values to the maximum value supported by the dtype in use be a solution?

File "C:\hostedtoolcache\windows\Python\3.10.11\x64\lib\site-packages\totalsegmentator\nnunet.py", line 251, in nnUNetv2_predict
    predictor.predict_from_files(dir_in, dir_out,
  File "C:\hostedtoolcache\windows\Python\3.10.11\x64\lib\site-packages\nnunetv2\inference\predict_from_raw_data.py", line 256, in predict_from_files
    return self.predict_from_data_iterator(data_iterator, save_probabilities, num_processes_segmentation_export)
  File "C:\hostedtoolcache\windows\Python\3.10.11\x64\lib\site-packages\nnunetv2\inference\predict_from_raw_data.py", line 373, in predict_from_data_iterator
    prediction = self.predict_logits_from_preprocessed_data(data).cpu()
  File "C:\hostedtoolcache\windows\Python\3.10.11\x64\lib\site-packages\nnunetv2\inference\predict_from_raw_data.py", line 490, in predict_logits_from_preprocessed_data
    prediction = self.predict_sliding_window_return_logits(data).to('cpu')
  File "C:\hostedtoolcache\windows\Python\3.10.11\x64\lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "C:\hostedtoolcache\windows\Python\3.10.11\x64\lib\site-packages\nnunetv2\inference\predict_from_raw_data.py", line 651, in predict_sliding_window_return_logits
    predicted_logits = self._internal_predict_sliding_window_return_logits(data, slicers,
  File "C:\hostedtoolcache\windows\Python\3.10.11\x64\lib\site-packages\nnunetv2\inference\predict_from_raw_data.py", line 607, in _internal_predict_sliding_window_return_logits
    raise e
  File "C:\hostedtoolcache\windows\Python\3.10.11\x64\lib\site-packages\nnunetv2\inference\predict_from_raw_data.py", line 600, in _internal_predict_sliding_window_return_logits
    raise RuntimeError('Encountered inf in predicted array. Aborting... If this problem persists, '
RuntimeError: Encountered inf in predicted array. Aborting... If this problem persists, reduce value_scaling_factor in compute_gaussian or increase the dtype of predicted_logits to fp32
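
The clipping idea raised in the question above could look roughly like the following. This is only a sketch that uses NumPy as a stand-in for the tensor nnU-Net actually checks; the helper name `clip_inf_to_dtype_range` is hypothetical and not part of nnU-Net. Clipping would hide the symptom rather than explain why the values overflow in the first place.

```python
import numpy as np


def clip_inf_to_dtype_range(arr: np.ndarray) -> np.ndarray:
    """Replace +/-inf with the largest/smallest finite value of arr's dtype.

    Hypothetical helper for illustration only; NaN entries are left untouched.
    """
    info = np.finfo(arr.dtype)
    return np.nan_to_num(arr, nan=np.nan, posinf=info.max, neginf=info.min)


# float16 can only represent magnitudes up to 65504, so it overflows easily.
logits = np.array([1.0, np.inf, -np.inf], dtype=np.float16)
clipped = clip_inf_to_dtype_range(logits)
print(clipped.dtype, np.isfinite(clipped).all())
```

The dtype is preserved, so a float16 array stays float16 and the replaced entries become the float16 limits.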

wasserth avatar Jul 26 '24 08:07 wasserth

Same issue when running the TotalSegmentator model on Windows. I went down a couple of versions to v2.2.1, and it worked well.

hari3100 avatar Aug 06 '24 10:08 hari3100

@hari3100 Can you share a pip list/freeze?

rw579 avatar Aug 07 '24 18:08 rw579

@rw579 here it is:

Package Version

acvl_utils 0.2
asttokens 2.4.1
attrs 23.2.0
backports.tarfile 1.2.0
batchgenerators 0.25
batchgeneratorsv2 0.2
beautifulsoup4 4.12.3
bs4 0.0.2
certifi 2024.7.4
charset-normalizer 3.3.2
colorama 0.4.6
comm 0.2.2
connected-components-3d 3.18.0
contourpy 1.2.1
cycler 0.12.1
dataclasses 0.6
debugpy 1.8.2
decorator 5.1.1
dicom2nifti 2.4.11
dill 0.3.8
docutils 0.21.2
dunamai 1.18.0
dynamic_network_architectures 0.3.1
einops 0.8.0
exceptiongroup 1.2.2
executing 2.0.1
fft-conv-pytorch 1.2.0
filelock 3.15.4
fonttools 4.53.1
fsspec 2024.6.1
future 1.0.0
graphviz 0.20.3
idna 3.7
imagecodecs 2024.6.1
imageio 2.34.2
importlib_metadata 8.2.0
ipykernel 6.29.5
ipython 8.26.0
ipywidgets 8.1.3
jaraco.classes 3.4.0
jaraco.context 5.3.0
jaraco.functools 4.0.2
jedi 0.19.1
Jinja2 3.1.4
joblib 1.4.2
jsonschema 3.2.0
jupyter_client 8.6.2
jupyter_core 5.7.2
jupyterlab_widgets 3.0.11
keyring 25.3.0
kiwisolver 1.4.5
lazy_loader 0.4
linecache2 1.0.0
markdown-it-py 3.0.0
MarkupSafe 2.1.5
matplotlib 3.6.2
matplotlib-inline 0.1.7
mdurl 0.1.2
monai 1.3.2
more-itertools 10.3.0
mpmath 1.3.0
multiprocess 0.70.16
nest_asyncio 1.6.0
networkx 3.3
nh3 0.2.18
nibabel 5.0.0
nii2dcm 0.1.5
nnunetv2 2.5.1
numpy 1.23.2
opencv-python 4.10.0.84
p_tqdm 1.4.0
packaging 24.1
pandas 2.2.2
parso 0.8.4
pathos 0.3.2
pickleshare 0.7.5
pillow 10.4.0
pip 24.2
pkginfo 1.11.1
platformdirs 4.2.2
plotly 5.23.0
pox 0.3.4
ppft 1.7.6.8
prompt_toolkit 3.0.47
psutil 6.0.0
pure_eval 0.2.3
pyarrow 17.0.0
pydicom 2.3.0
pydicom-seg 0.4.1
Pygments 2.18.0
pyparsing 3.1.2
pyrsistent 0.20.0
python-dateutil 2.9.0
python-gdcm 3.0.24.1
pytz 2024.1
pywin32 306
pywin32-ctypes 0.2.2
PyYAML 6.0.1
pyzmq 26.0.3
readme_renderer 44.0
requests 2.32.3
requests-toolbelt 1.0.0
rfc3986 2.0.0
rich 13.7.1
rt-utils 1.2.7
scikit-image 0.24.0
scikit-learn 1.5.1
scipy 1.14.0
seaborn 0.13.2
setuptools 69.5.1
SimpleITK 2.3.1
six 1.16.0
soupsieve 2.5
stack-data 0.6.2
sympy 1.13.1
tcia_utils 2.1.15
tenacity 9.0.0
threadpoolctl 3.5.0
tifffile 2024.7.24
torch 2.4.0
tornado 6.4.1
TotalSegmentator 2.2.1
tqdm 4.66.4
traceback2 1.4.0
traitlets 5.14.3
twine 4.0.2
typing_extensions 4.12.2
tzdata 2024.1
Unidecode 1.3.8
unittest2 1.1.0
urllib3 2.2.2
wcwidth 0.2.13
wheel 0.43.0
widgetsnbextension 4.0.11
xvfbwrapper 0.2.9
yacs 0.1.8
zipp 3.19.2

hari3100 avatar Aug 08 '24 13:08 hari3100

@hari3100 thanks

rw579 avatar Aug 12 '24 15:08 rw579

These versions look somewhat conflicting, don't they?

920773408 avatar Sep 04 '24 10:09 920773408

Is this issue still persisting?

TaWald avatar Nov 05 '24 18:11 TaWald

I still get the same error when trying to run on windows.

wasserth avatar Nov 12 '24 14:11 wasserth

Hey there, can you provide me with a simple example on how to reproduce this? Is this the case for all trained models, or just totalsegmentator?

FabianIsensee avatar Nov 25 '24 12:11 FabianIsensee

It happens for different nnunet models. I created a small reproducible example here:

Test code: https://github.com/wasserth/TotalSegmentator/blob/update_mr_model/tests/tests_nnunet.py

Github action to automatically run it on mac + ubuntu + windows: https://github.com/wasserth/TotalSegmentator/blob/update_mr_model/.github/workflows/run_tests_nnunet.yml

Result where it fails only on windows: https://github.com/wasserth/TotalSegmentator/actions/runs/12046446273

I would recommend adding a similar automated test to the nnunet repository. You can mostly reuse the code I posted here. Since the input image is very small, inference runs quite fast even on CPU (you can copy the small test image from the TotalSegmentator repository). This way you can easily see whether a new nnunet commit breaks something fundamental on any operating system.

wasserth avatar Nov 27 '24 08:11 wasserth

Thanks a lot for this work. I am happy to add this as a github action. To make sure people can see that you contributed the test please make a PR :-) Please also add a verification that the output, if produced, gives the correct result.
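
A verification of the kind requested here could compare the produced segmentation with a stored reference using the Dice score. The following is a minimal sketch; the function name `dice_score`, the `eps` smoothing term, and the binary foreground/background simplification are assumptions for illustration and are not taken from the actual test.

```python
import numpy as np


def dice_score(pred, ref, eps: float = 1e-8) -> float:
    """Dice overlap between two masks, treating every non-zero voxel as foreground.

    Hypothetical helper; eps avoids a division by zero when both masks are empty.
    """
    pred = np.asarray(pred) > 0
    ref = np.asarray(ref) > 0
    intersection = np.logical_and(pred, ref).sum()
    return float((2.0 * intersection + eps) / (pred.sum() + ref.sum() + eps))


# Toy masks standing in for a predicted and a reference segmentation.
reference = np.zeros((4, 4, 4), dtype=np.uint8)
reference[1:3, 1:3, 1:3] = 1
prediction = reference.copy()

dice = dice_score(prediction, reference)
assert dice > 0.99, f"The segmentation is not correct (dice: {dice:.5f})."
```

A multi-label segmentation would need this computed per label; the binary version only checks foreground overlap.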

Do you have any idea what could be causing the problem on Windows? I don't have a Windows machine on hand to test and fix this. Is this also happening on Windows when using the GPU?

FabianIsensee avatar Dec 03 '24 08:12 FabianIsensee

I took a look at the commit history. The only thing I changed that day (July 26th) was switching from torch.inference_mode back to torch.no_grad.
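
For readers unfamiliar with the two context managers, this small sketch shows one observable difference between them. It only illustrates general PyTorch behaviour and makes no claim about whether this change is related to the error.

```python
import torch

x = torch.ones(3, requires_grad=True)

# Under no_grad, results are ordinary tensors that simply do not track gradients.
with torch.no_grad():
    a = x * 2

# Under inference_mode, results are "inference tensors", which are more
# restricted: they cannot be used later in operations recorded by autograd.
with torch.inference_mode():
    b = x * 2

print(a.requires_grad, a.is_inference())
print(b.requires_grad, b.is_inference())
```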

FabianIsensee avatar Dec 03 '24 08:12 FabianIsensee

I also do not have a Windows machine available. I only see the Windows results from the GitHub action (and from issues in which users report problems). I will create a PR with this GitHub action test.

wasserth avatar Dec 03 '24 16:12 wasserth

Thanks! Christmas time is super busy and I am not getting anything done at the moment. I have a private PC that I can use for debugging but this will take some time because this requires me familiarizing myself with python on Windows :see_no_evil:

FabianIsensee avatar Dec 05 '24 06:12 FabianIsensee

I just submitted a pull request.

wasserth avatar Dec 17 '24 12:12 wasserth

Has anyone found the problem? I still do not have a Windows setup to check what's going on and why this fails

FabianIsensee avatar Jun 02 '25 09:06 FabianIsensee

I tried to run this on my Windows machine but cannot reproduce it (yet) with the following dependencies: requirements.txt

test output:

(.venv) D:\lloyd\dev\nnUNet-fork>python nnunetv2/tests/integration_tests/run_nnunet_inference.py
nnUNet_raw is not defined and nnU-Net can only be used on data for which preprocessed files are already present on your system. nnU-Net cannot be used for experiment planning and preprocessing like this. If this is not intended, please read documentation/setting_up_paths.md for information on how to set this up properly.
nnUNet_preprocessed is not defined and nnU-Net can not be used for preprocessing or training. If this is not intended, please read documentation/setting_up_paths.md for information on how to set this up.

#######################################################################
Please cite the following paper when using nnU-Net:
Isensee, F., Jaeger, P. F., Kohl, S. A., Petersen, J., & Maier-Hein, K. H. (2021). nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nature methods, 18(2), 203-211.
#######################################################################

perform_everything_on_device=True is only supported for cuda devices! Setting this to False
D:\lloyd\dev\nnUNet-fork\.venv\lib\site-packages\nnunetv2\utilities\plans_handling\plans_handler.py:37: UserWarning: Detected old nnU-Net plans format. Attempting to reconstruct network architecture parameters. If this fails, rerun nnUNetv2_plan_experiment for your dataset. If you use a custom architecture, please downgrade nnU-Net to the version you implemented this or update your implementation + plans.
  warnings.warn("Detected old nnU-Net plans format. Attempting to reconstruct network architecture "
nnUNet_raw is not defined and nnU-Net can only be used on data for which preprocessed files are already present on your system. nnU-Net cannot be used for experiment planning and preprocessing like this. If this is not intended, please read documentation/setting_up_paths.md for information on how to set this up properly.
nnUNet_preprocessed is not defined and nnU-Net can not be used for preprocessing or training. If this is not intended, please read documentation/setting_up_paths.md for information on how to set this up.
There are 1 cases in the source folder
I am processing 0 out of 1 (max process ID is 0, we start counting with 0!)
There are 1 cases that I would like to predict
nnUNet_raw is not defined and nnU-Net can only be used on data for which preprocessed files are already present on your system. nnU-Net cannot be used for experiment planning and preprocessing like this. If this is not intended, please read documentation/setting_up_paths.md for information on how to set this up properly.
nnUNet_preprocessed is not defined and nnU-Net can not be used for preprocessing or training. If this is not intended, please read documentation/setting_up_paths.md for information on how to set this up.

Predicting example_ct_sm:
perform_everything_on_device: False
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00,  1.85s/it]
sending off prediction to background worker for resampling and export
done with example_ct_sm
Test passed: nnUNet inference works correctly (dice=0.9999999999979113)

dyollb avatar Aug 07 '25 07:08 dyollb

I tried installing the requirements posted by @hari3100 with the current nnunetv2, but there are many dependency conflicts. After removing packages such as monai, which is not required, and nii2dcm, which has very strict dependencies, I can reproduce the issue with these requirements:

D:\lloyd\dev\nnUNet-fork\venv_bad\lib\site-packages\nnunetv2\utilities\plans_handling\plans_handler.py:37: UserWarning: Detected old nnU-Net plans format. Attempting to reconstruct network architecture parameters. If this fails, rerun nnUNetv2_plan_experiment for your dataset. If you use a custom architecture, please downgrade nnU-Net to the version you implemented this or update your implementation + plans.
  warnings.warn("Detected old nnU-Net plans format. Attempting to reconstruct network architecture "
nnUNet_raw is not defined and nnU-Net can only be used on data for which preprocessed files are already present on your system. nnU-Net cannot be used for experiment planning and preprocessing like this. If this is not intended, please read documentation/setting_up_paths.md for information on how to set this up properly.
nnUNet_preprocessed is not defined and nnU-Net can not be used for preprocessing or training. If this is not intended, please read documentation/setting_up_paths.md for information on how to set this up.
There are 1 cases in the source folder
I am processing 0 out of 1 (max process ID is 0, we start counting with 0!)
There are 1 cases that I would like to predict
nnUNet_raw is not defined and nnU-Net can only be used on data for which preprocessed files are already present on your system. nnU-Net cannot be used for experiment planning and preprocessing like this. If this is not intended, please read documentation/setting_up_paths.md for information on how to set this up properly.
nnUNet_preprocessed is not defined and nnU-Net can not be used for preprocessing or training. If this is not intended, please read documentation/setting_up_paths.md for information on how to set this up.

Predicting example_ct_sm:
perform_everything_on_device: False
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:15<00:00, 15.47s/it]
sending off prediction to background worker for resampling and export
done with example_ct_sm
D:\lloyd\dev\nnUNet-fork\venv_bad\lib\site-packages\skimage\transform\_warps.py:738: RuntimeWarning: All-NaN slice encountered
  min_val = min_func(input_image)
D:\lloyd\dev\nnUNet-fork\venv_bad\lib\site-packages\skimage\transform\_warps.py:742: RuntimeWarning: All-NaN slice encountered
  max_val = max_func(input_image)
Traceback (most recent call last):
  File "D:\lloyd\dev\nnUNet-fork\nnunetv2\tests\integration_tests\run_nnunet_inference.py", line 47, in <module>
    run_tests_and_exit_on_failure()
  File "D:\lloyd\dev\nnUNet-fork\nnunetv2\tests\integration_tests\run_nnunet_inference.py", line 38, in run_tests_and_exit_on_failure
    assert images_equal, f"The nnunet segmentation is not correct (dice: {dice:.5f})."
AssertionError: The nnunet segmentation is not correct (dice: 0.00000).

Note: Replacing the PyTorch version torch==2.4.0 by torch==2.4.1 seems to fix the issue!

I have not yet identified the reason why version 2.4.0 behaves differently. Initially, I thought it may be a numpy 2.0 issue, but I pinned numpy==1.26.4, so this seems unlikely.
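
Until the cause is understood, a guard along the following lines could at least warn users about the combination observed above (Windows with torch 2.4.0). This is a sketch based only on the observation in this comment; the function names are hypothetical and whether other versions are affected is unknown.

```python
import platform
from importlib import metadata
from typing import Optional


def is_affected_combination(torch_version: str, system: str) -> bool:
    """Return True for the one combination reported as failing in this thread.

    Assumption: only torch 2.4.0 on Windows is flagged; local build tags such
    as "+cpu" or "+cu121" are ignored.
    """
    base_version = torch_version.split("+")[0]
    return system == "Windows" and base_version == "2.4.0"


def installed_torch_version() -> Optional[str]:
    """Look up the installed torch version without importing torch."""
    try:
        return metadata.version("torch")
    except metadata.PackageNotFoundError:
        return None


version = installed_torch_version()
if version is not None and is_affected_combination(version, platform.system()):
    print("torch 2.4.0 on Windows was reported to produce inf; consider torch 2.4.1.")
```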

dyollb avatar Aug 07 '25 07:08 dyollb

Testing with torch==2.4.0 (using requirements as above):

The gaussian seems to produce a tensor with infinite values ('inf'):

Image

If you normalize 'gaussian_importance_map' using NumPy before converting it to a tensor, at least the Gaussian itself no longer contains 'inf'. For example, replace

    gaussian_importance_map = gaussian_filter(tmp, sigmas, 0, mode='constant', cval=0)
    gaussian_importance_map = torch.from_numpy(gaussian_importance_map)
    gaussian_importance_map /= (torch.max(gaussian_importance_map) / value_scaling_factor)

by

    gaussian_importance_map = gaussian_filter(tmp, sigmas, 0, mode='constant', cval=0)
    gaussian_importance_map /= (np.max(gaussian_importance_map) / value_scaling_factor)
    gaussian_importance_map = torch.from_numpy(gaussian_importance_map)

Unfortunately, the predicted logits still contain 'inf', i.e., you still get this exception (Encountered inf in predicted array. Aborting...).

Image
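
To narrow down where the non-finite values first appear, a small diagnostic helper can be applied to the Gaussian map and to the logits in turn. The sketch below uses NumPy only; `summarize_non_finite` is a hypothetical name, and the toy array and the scaling factor of 10.0 are assumed values, not the ones nnU-Net uses.

```python
import numpy as np


def summarize_non_finite(arr) -> dict:
    """Count inf/NaN entries and report the range of the finite values."""
    arr = np.asarray(arr)
    finite = arr[np.isfinite(arr)]
    return {
        "dtype": str(arr.dtype),
        "n_inf": int(np.isinf(arr).sum()),
        "n_nan": int(np.isnan(arr).sum()),
        "finite_min": float(finite.min()) if finite.size else None,
        "finite_max": float(finite.max()) if finite.size else None,
    }


# Toy stand-in for an importance map that is normalised in NumPy.
importance = np.array([0.25, 0.5, 1.0], dtype=np.float64)
importance /= importance.max() / 10.0  # 10.0 is an assumed scaling factor
print(summarize_non_finite(importance))
```

Calling the helper after each step (filtering, normalisation, dtype conversion, accumulation of logits) shows which step introduces the first 'inf'.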

dyollb avatar Aug 07 '25 09:08 dyollb