Failed to load image extension - Windows CUDA 11.7

Open atalman opened this issue 2 years ago • 9 comments

🐛 Describe the bug

I observe the following failures on Windows with CUDA 11.7 and Python 3.8-3.10: https://github.com/pytorch/builder/actions/runs/4104686412/attempts/3

RuntimeError: Module torchvision FAIL: 1 Output: C:\Jenkins\Miniconda3\envs\conda-env-4104686412\lib\site-packages\torchvision\io\image.py:13: UserWarning: Failed to load image Python extension: 'Could not find module 'C:\Jenkins\Miniconda3\envs\conda-env-4104686412\Lib\site-packages\torchvision\image.pyd' (or one of its dependencies). Try using the full path with constructor syntax.'If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
  warn(
torchvision: 0.15.0.dev20230206
Traceback (most recent call last):
  File "C:\Jenkins\Miniconda3\envs\conda-env-4104686412\lib\site-packages\torch\_ops.py", line 562, in __getattr__
    op, overload_names = torch._C._jit_get_operation(qualified_op_name)
RuntimeError: No such operator image::decode_png

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\actions-runner\_work\builder\builder\pytorch\builder\vision\test\smoke_test.py", line 65, in <module>
    main()
  File "C:\actions-runner\_work\builder\builder\pytorch\builder\vision\test\smoke_test.py", line 57, in main
    smoke_test_torchvision()
  File "C:\actions-runner\_work\builder\builder\pytorch\builder\vision\test\smoke_test.py", line 17, in smoke_test_torchvision
    all(x is not None for x in [torch.ops.image.decode_png, torch.ops.torchvision.roi_align]),
  File "C:\Jenkins\Miniconda3\envs\conda-env-4104686412\lib\site-packages\torch\_ops.py", line 566, in __getattr__
    raise AttributeError(
AttributeError: '_OpNamespace' 'image' object has no attribute 'decode_png'

It's the same failure as https://github.com/pytorch/vision/issues/7036, but now on Windows.

cc @pmeier @NicolasHug @malfet

Versions

nightly

atalman avatar Feb 06 '23 23:02 atalman

This issue seems to be mitigated by:

conda install libnvjpeg-dev -c nvidia     

atalman avatar Feb 06 '23 23:02 atalman

It's a good old https://github.com/pytorch/vision/issues/4894

malfet avatar Feb 07 '23 00:02 malfet

Another fun fact: the libnvjpeg conda package on Windows is 4 KB, compared to 1.2 MB for Linux: https://anaconda.org/nvidia/libnvjpeg/files?version=11.8.0.2

@ptrblck , is this expected?

malfet avatar Feb 07 '23 00:02 malfet

It seems it was available in CUDA 11.7: https://anaconda.org/nvidia/libnvjpeg/files?version=11.7.2.34, so I'll check if this change is expected for Windows.

ptrblck avatar Feb 07 '23 00:02 ptrblck

Yes, it seems to be expected based on a response from the nvJPEG team: The libnvjpeg-dev package should contain the .dll and headers while libnvjpeg contains the .lib. The same convention is used for other libraries in the CUDA toolkit (on Windows).

ptrblck avatar Feb 07 '23 00:02 ptrblck

Thank you @ptrblck. This means we need to include the dev packages for all libraries that we link against dynamically.
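To confirm whether the nvJPEG DLL is actually present in an environment, a small stdlib-only check along these lines may help. This is a minimal sketch, not part of the builder scripts: the file pattern `nvjpeg64_*.dll` and the `<prefix>\Library\bin` location are assumptions about how the conda packages lay out files on Windows, and `find_dlls` and `default_search_dirs` are hypothetical helper names.

```python
import glob
import os


def find_dlls(pattern, search_dirs):
    """Return every file matching `pattern` in the given directories, in order."""
    hits = []
    for directory in search_dirs:
        hits.extend(sorted(glob.glob(os.path.join(directory, pattern))))
    return hits


def default_search_dirs():
    """Directories on PATH, preceded by the active conda env's Library\\bin (assumed layout)."""
    dirs = [d for d in os.environ.get("PATH", "").split(os.pathsep) if d]
    prefix = os.environ.get("CONDA_PREFIX")
    if prefix:
        dirs.insert(0, os.path.join(prefix, "Library", "bin"))
    return dirs


if __name__ == "__main__":
    # Assumed DLL name pattern; adjust it if the package ships a different name.
    matches = find_dlls("nvjpeg64_*.dll", default_search_dirs())
    print(matches or "no nvjpeg DLL found")
```

An empty result after installing only libnvjpeg, and a non-empty one after adding libnvjpeg-dev, would be consistent with the packaging split described above.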

atalman avatar Feb 07 '23 13:02 atalman

I'm trying to build torchvision 0.15.2 with CUDA 11.7 and PyTorch 1.13.1 on Windows and hitting this issue.

My environment has the following versions and hits this warning on package import. I tried adding libnvjpeg-dev and libnvjpeg from the nvidia channel, but the warnings are still raised on import, along with Windows popup alerts. If they were just warnings it would be fine, but the Windows popups make this unusable in a CI system.

    cudatoolkit:        11.7.0
    cudnn:              8.5.0.96-0             
    jpeg:               9e-0                       
    libnvjpeg:          11.7.2.34-0                    
    libnvjpeg-dev:      11.7.2.34-0                    
    libpng:             1.6.39-h8cc25b3_0
    libtiff:            4.5.1-0                          
    pillow:             9.5.0-py39_1                     
    python:             3.9.18-0                         
    pytorch:            1.13.1-py3.9_cuda11.7_cudnn8.5_4 
    torchvision:        0.15.2-py39_torch1131_cuda117_2  
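
As a stopgap for CI, one possible workaround is to filter the warning and ask Windows not to show error dialogs for the process. This is a sketch under assumptions, not a fix: it only hides the symptoms, the image extension still fails to load, and whether `SetErrorMode` suppresses this particular popup has not been verified here. `quiet_extension_load_failures` is a hypothetical helper name, and it would need to run before `import torchvision`.

```python
import sys
import warnings

# Win32 SetErrorMode flags (values from the Windows SDK headers).
SEM_FAILCRITICALERRORS = 0x0001
SEM_NOOPENFILEERRORBOX = 0x8000


def quiet_extension_load_failures():
    """Hide the image-extension warning and, on Windows, disable error dialogs.

    Returns the previous error mode on Windows, or None on other platforms.
    """
    # `message` is a regex matched against the start of the warning text.
    warnings.filterwarnings(
        "ignore",
        message="Failed to load image Python extension",
        category=UserWarning,
    )
    if sys.platform != "win32":
        return None
    import ctypes  # imported lazily; windll only exists on Windows

    return ctypes.windll.kernel32.SetErrorMode(
        SEM_FAILCRITICALERRORS | SEM_NOOPENFILEERRORBOX
    )
```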

cleebp avatar Oct 26 '23 17:10 cleebp

@cleebp you'll need pytorch 2.0 if you're using torchvision 0.15 - you can refer to our compatibility table here
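
A mismatch like this can be caught early with a small check before building or installing. The sketch below is illustrative only: the `COMPATIBLE` mapping holds just two release pairings (torchvision 0.15 with PyTorch 2.0, and torchvision 0.14 with PyTorch 1.13) and is not the full compatibility table, and `check_pair` is a hypothetical helper name.

```python
# Partial mapping of torchvision release -> required PyTorch release.
# Only two pairings are listed; consult the official table for the rest.
COMPATIBLE = {"0.15": "2.0", "0.14": "1.13"}


def major_minor(version):
    """Reduce a version such as '1.13.1+cu117' to '1.13'."""
    return ".".join(version.split("+")[0].split(".")[:2])


def check_pair(torch_version, torchvision_version):
    """Return True/False for a known torchvision release, None if it is not listed."""
    expected = COMPATIBLE.get(major_minor(torchvision_version))
    if expected is None:
        return None
    return major_minor(torch_version) == expected


if __name__ == "__main__":
    # The combination from the environment listing above.
    print(check_pair("1.13.1", "0.15.2"))
```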

NicolasHug avatar Oct 27 '23 09:10 NicolasHug

Thanks for the quick attention @NicolasHug!

Unfortunately we are stuck on PyTorch 1.13.1 but are also moving to Python 3.11 this release, so I don't think there is a supported torchvision version we can use from the PyPI wheels. This is similar to this issue with using PyTorch LTS and torchvision: https://github.com/pytorch/pytorch.github.io/issues/828

I think we'd have to revert to torchvision 0.14.x for our version of PyTorch and try to build from source for Python 3.11, but that probably isn't supported.

cleebp avatar Oct 27 '23 15:10 cleebp