RTX 30xx cards

Open bmc84 opened this issue 3 years ago • 21 comments

Hi,

Does anyone have a solution for running this with RTX 30xx cards?

"d:\anaconda\envs\sber\lib\site-packages\torch\cuda\__init__.py:125: UserWarning: NVIDIA GeForce RTX 3070 with CUDA capability sm_86 is not compatible with the current PyTorch installation. The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_61 sm_70 sm_75 compute_37."

My understanding is that 30xx cards need CUDA 11.x, but after installing Torch with CUDA 11, mxnet fails to load, since mxnet doesn't support CUDA 11.

So... is it even possible to run this using RTX 30xx series?
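
The warning quoted above already names both sides of the mismatch. As a minimal sketch (the helper name parse_capability_warning is hypothetical, and only the standard library is used), the message can be split into the card's architecture and the architectures the installed PyTorch build was compiled for:

```python
import re

# Warning text copied from the post above
WARNING = (
    "NVIDIA GeForce RTX 3070 with CUDA capability sm_86 is not compatible "
    "with the current PyTorch installation. The current PyTorch install "
    "supports CUDA capabilities sm_37 sm_50 sm_60 sm_61 sm_70 sm_75 compute_37."
)


def parse_capability_warning(message):
    """Return (device_arch, supported_archs) extracted from the warning text."""
    device = re.search(r"CUDA capability (sm_\d+)", message)
    supported = re.search(r"supports CUDA capabilities ([\w ]+)", message)
    if device is None or supported is None:
        raise ValueError("message does not look like the capability warning")
    return device.group(1), supported.group(1).split()


device_arch, supported_archs = parse_capability_warning(WARNING)
# The card reports sm_86, which is absent from the list compiled into the build
print(device_arch, "supported by this build:", device_arch in supported_archs)
```

On a machine with PyTorch installed, torch.cuda.get_device_capability() and torch.cuda.get_arch_list() report the same two pieces of information directly.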

bmc84 avatar Jan 25 '22 01:01 bmc84

Hi, @bmc84 !

You can check our colab; it can help you set up the environment, since it is based on CUDA 11.

AlexanderGroshev avatar Jan 26 '22 14:01 AlexanderGroshev

I think it works on the colab because it uses Linux, and there is CUDA 11.x support for mxnet on Linux but not yet on Windows.

nonlin avatar Jan 27 '22 15:01 nonlin

Are there any instructions / tutorials available?

Orchoidizer avatar Jan 27 '22 17:01 Orchoidizer

The Colab CUDA version seems to be 11.1, but the notebook runs "pip install mxnet-cu101mkl".

I don't really understand why it installs the cu101 build (instead of, say, mxnet-cu110 or cu111, whatever the real one might be).

bmc84 avatar Jan 28 '22 00:01 bmc84

Unable to install and use locally on a Windows 11 machine with RTX 3070. Tried to install on Linux on the same machine, still unsuccessful.

syddharth avatar Apr 10 '22 05:04 syddharth

I was able to install it and make it work for a short 10-second video clip on an Ubuntu 20.04 Linux system with an AMD Ryzen 9 processor. The graphics card is an RTX 3070 Ti.

For video clips that are 30 seconds long I get a CUDA out-of-memory error.

To resolve the error "OSError: libnccl.so.2: cannot open shared object file: No such file", I followed the instructions at https://docs.nvidia.com/deeplearning/nccl/install-guide/index.html
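
A quick way to confirm whether the NCCL install took effect is to try loading the library the same way the failing import does. This is a minimal sketch using only ctypes from the standard library; the helper name can_load is hypothetical.

```python
import ctypes


def can_load(library_name):
    """Return True if the dynamic loader can open library_name."""
    try:
        ctypes.CDLL(library_name)
    except OSError as error:
        # Same exception type as in the post when the file is missing
        print(f"{library_name}: {error}")
        return False
    return True


print("libnccl.so.2 loadable:", can_load("libnccl.so.2"))
```

If this prints False after installing NCCL, the library is probably in a directory the loader does not search.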


Usage for video clips: python inference.py --batch_size 1 --source_paths examples/images/xxx.jpg or png --target_video examples/videos/xxx.mp4 --out_video_name examples/results/xxx.mp4

Usage for pictures: python inference.py --G_path weights/G_unet_3blocks.pth --num_blocks 3 --batch_size 40 --crop_size 224 --use_sr True --source_paths examples/images/xxx.jpg --target_image examples/images/xxx.png --out_image_name examples/results/xxx.png --image_to_image True
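
The video command above can also be assembled from Python, which makes it easier to vary --batch_size when experimenting with the out-of-memory error. This is only a sketch: build_video_command is a hypothetical helper, the flags are the ones quoted in the post, and the xxx paths are the post's own placeholders.

```python
import shlex


def build_video_command(source_image, target_video, out_video, batch_size=1):
    """Assemble the inference.py arguments quoted in the post as a list."""
    return [
        "python", "inference.py",
        "--batch_size", str(batch_size),
        "--source_paths", source_image,
        "--target_video", target_video,
        "--out_video_name", out_video,
    ]


command = build_video_command(
    "examples/images/xxx.jpg",
    "examples/videos/xxx.mp4",
    "examples/results/xxx.mp4",
)
# Print the command as a single shell-safe string
print(shlex.join(command))
```

Passing the list to subprocess.run(command, check=True) from the repository root would launch the same run.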


Installed cudatoolkit with conda install -c conda-forge cudatoolkit=11.2
Installed cudnn with conda install -c conda-forge cudnn=8.2
Installed torch with pip install torch==1.11.0+cu113 -f https://download.pytorch.org/whl/torch_stable.html
Installed torchvision with pip install torchvision==0.12.0+cu113 -f https://download.pytorch.org/whl/torch_stable.html
Installed mxnet-cu112 with pip install mxnet-cu112 (could not find mxnet-cu112 for windows)
Installed onnx with pip install onnx==1.8.0
Installed onnxruntime-gpu with pip install onnxruntime-gpu==1.8.0
Installed kornia with pip install kornia==0.5.4


Conda Environment: Python 3.8

Conda List: packages in environment at /anaconda3/envs/SberSwap38:

# Name Version Build Channel
_libgcc_mutex 0.1 main
_openmp_mutex 4.5 1_gnu
blas 1.0 mkl
ca-certificates 2022.3.29 h06a4308_0
certifi 2021.10.8 py38h06a4308_2
charset-normalizer 2.0.12 pypi_0 pypi
click 8.1.2 pypi_0 pypi
cudatoolkit 11.2.2 he111cf0_8 conda-forge
cudnn 8.2.1.32 h86fa8c9_0 conda-forge
cycler 0.11.0 pypi_0 pypi
dill 0.3.4 pypi_0 pypi
docker-pycreds 0.4.0 pypi_0 pypi
easydict 1.9 pypi_0 pypi
flatbuffers 2.0 pypi_0 pypi
fonttools 4.32.0 pypi_0 pypi
gitdb 4.0.9 pypi_0 pypi
gitpython 3.1.27 pypi_0 pypi
gputil 1.4.0 pypi_0 pypi
idna 3.3 pypi_0 pypi
imageio 2.16.1 pypi_0 pypi
insightface 0.2.1 pypi_0 pypi
intel-openmp 2021.4.0 h06a4308_3561
joblib 1.1.0 pypi_0 pypi
kiwisolver 1.4.2 pypi_0 pypi
kornia 0.5.4 pypi_0 pypi
ld_impl_linux-64 2.35.1 h7274673_9
libffi 3.3 he6710b0_2
libgcc-ng 9.3.0 h5101ec6_17
libgfortran-ng 7.5.0 ha8ba4b0_17
libgfortran4 7.5.0 ha8ba4b0_17
libgomp 9.3.0 h5101ec6_17
libstdcxx-ng 9.3.0 hd4cf53a_17
llvmlite 0.38.0 pypi_0 pypi
matplotlib 3.5.1 pypi_0 pypi
mkl 2021.4.0 h06a4308_640
mkl-service 2.4.0 py38h7f8727e_0
mkl_fft 1.3.1 py38hd3c417c_0
mkl_random 1.2.2 py38h51133e4_0
mxnet-cu112 1.9.0 pypi_0 pypi
ncurses 6.3 h7f8727e_2
networkx 2.8 pypi_0 pypi
numba 0.55.1 pypi_0 pypi
numpy 1.21.2 py38h20f2e39_0
numpy-base 1.21.2 py38h79a1101_0
onnx 1.8.0 pypi_0 pypi
onnxruntime 1.8.0 pypi_0 pypi
onnxruntime-gpu 1.8.0 pypi_0 pypi
opencv-python 4.5.5.64 pypi_0 pypi
openssl 1.1.1n h7f8727e_0
packaging 21.3 pypi_0 pypi
pathtools 0.1.2 pypi_0 pypi
pillow 9.1.0 pypi_0 pypi
pip 21.2.4 py38h06a4308_0
promise 2.3 pypi_0 pypi
protobuf 3.20.0 pypi_0 pypi
psutil 5.9.0 pypi_0 pypi
pyparsing 3.0.8 pypi_0 pypi
python 3.8.13 h12debd9_0
python-dateutil 2.8.2 pypi_0 pypi
python-graphviz 0.8.4 pypi_0 pypi
python_abi 3.8 2_cp38 conda-forge
pywavelets 1.3.0 pypi_0 pypi
pyyaml 6.0 pypi_0 pypi
readline 8.1.2 h7f8727e_1
requests 2.27.1 pypi_0 pypi
scikit-image 0.19.2 pypi_0 pypi
scikit-learn 1.0.2 pypi_0 pypi
scipy 1.7.3 py38hc147768_0
sentry-sdk 1.5.8 pypi_0 pypi
setproctitle 1.2.2 pypi_0 pypi
setuptools 58.0.4 py38h06a4308_0
shortuuid 1.0.8 pypi_0 pypi
six 1.16.0 pyhd3eb1b0_1
smmap 5.0.0 pypi_0 pypi
sqlite 3.38.2 hc218d9a_0
threadpoolctl 3.1.0 pypi_0 pypi
tifffile 2022.4.8 pypi_0 pypi
tk 8.6.11 h1ccaba5_0
torch 1.11.0+cu113 pypi_0 pypi
torchvision 0.12.0+cu113 pypi_0 pypi
tqdm 4.64.0 pypi_0 pypi
typing-extensions 4.1.1 pypi_0 pypi
urllib3 1.26.9 pypi_0 pypi
wandb 0.12.14 pypi_0 pypi
wheel 0.37.1 pyhd3eb1b0_0
xz 5.2.5 h7b6447c_0
zlib 1.2.11 h7f8727e_4
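
The package list above mixes several CUDA builds. As a rough sketch (cuda_tag is a hypothetical helper, and the rows are copied by hand from the list rather than parsed from conda), the CUDA version each package was built for can be read from its name or version suffix:

```python
import re

# A few rows copied from the conda list output above: name -> version
PACKAGES = {
    "cudatoolkit": "11.2.2",
    "mxnet-cu112": "1.9.0",
    "torch": "1.11.0+cu113",
    "torchvision": "0.12.0+cu113",
}


def cuda_tag(name, version):
    """Return the CUDA version encoded in a package name or version, or None."""
    match = re.search(r"cu(\d{2})(\d)", name + " " + version)
    if match:
        return f"{match.group(1)}.{match.group(2)}"
    if name == "cudatoolkit":
        # The toolkit package carries the CUDA version as its own version
        return ".".join(version.split(".")[:2])
    return None


tags = {name: cuda_tag(name, version) for name, version in PACKAGES.items()}
for name, tag in tags.items():
    print(f"{name}: CUDA {tag}")
```

For this environment the sketch reports 11.2 for cudatoolkit and mxnet and 11.3 for torch and torchvision, a combination the post reports as working on Ubuntu.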

FBAdmirer avatar Apr 11 '22 13:04 FBAdmirer

I was finally able to compile mxnet-1.9.0 with CUDA 11.4 (CUDNN v8.2.4.15), install it, and make it work on a Windows 11 system with an AMD Ryzen 9 processor. The graphics card is an RTX 3070 Ti.

Attached is the mxnet-1.9.0 Windows wheel as a zip file. After unzipping, install it with the command pip install mxnet-1.9.0-py3-none-any.whl

Conda Environment: Python 3.7

Installed cudatoolkit with conda install -c conda-forge cudatoolkit=11.4
Installed cudnn with conda install -c conda-forge cudnn=8.2
Installed torch with pip install torch==1.8.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html
Installed torchvision with pip install torchvision==0.9.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html
Installed onnx with pip install onnx==1.9.0
Installed onnxruntime-gpu with pip install onnxruntime-gpu==1.8.0
Installed kornia with pip install kornia==0.5.4

mxnet-1.9.0-py3-none-any.whl.zip

FBAdmirer avatar Apr 18 '22 18:04 FBAdmirer

@FBAdmirer I tried everything you mentioned. I still get this error on Windows:

OSError: exception: access violation writing 0x0000000000000000
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "C:\Users\Admin\anaconda3\envs\sberswap\lib\site-packages\mxnet\base.py", line 592, in _notify_shutdown
    check_call(_LIB.MXNotifyShutdown())
OSError: exception: access violation writing 0x0000000000000000

With the help of your previous post, I was able to run it on Ubuntu. It doesn't work in a WSL virtual environment, though.

syddharth avatar Apr 19 '22 17:04 syddharth

Good to know that you were able to run it on Ubuntu.

Check whether the libmxnet.dll file is present in both the C:\Users\Admin\anaconda3\envs\sberswap\mxnet and C:\Users\Admin\anaconda3\envs\sberswap\lib\site-packages\mxnet folders.
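
The folder check above can be scripted. This is a minimal sketch, assuming the two locations named in the post are relative to the conda environment root; find_libmxnet is a hypothetical helper, and the demonstration runs against a throwaway directory tree rather than a real environment.

```python
from pathlib import Path
import tempfile


def find_libmxnet(env_root, dll_name="libmxnet.dll"):
    """Return the candidate locations under env_root that contain dll_name."""
    root = Path(env_root)
    candidates = [
        root / "mxnet" / dll_name,
        root / "lib" / "site-packages" / "mxnet" / dll_name,
    ]
    return [path for path in candidates if path.is_file()]


# Demonstration: only the site-packages copy exists in this fake environment
with tempfile.TemporaryDirectory() as tmp:
    package_dir = Path(tmp) / "lib" / "site-packages" / "mxnet"
    package_dir.mkdir(parents=True)
    (package_dir / "libmxnet.dll").touch()
    found = find_libmxnet(tmp)
    print([str(path.relative_to(tmp)) for path in found])
```

For the environment in the post, the call would be find_libmxnet(r"C:\Users\Admin\anaconda3\envs\sberswap"); two results mean the file is present in both folders.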

You can also run the attached Python script (sanity_test.zip) to see whether mxnet installed properly: python sanity_test.py

You can also uninstall mxnet with pip uninstall mxnet and check conda list mxnet to see if it shows any mxnet version. If so, uninstall again with pip uninstall mxnet. Afterwards install mxnet-1.9.0 wheel.

FBAdmirer avatar Apr 19 '22 17:04 FBAdmirer

@FBAdmirer The libmxnet.dll file is there in both of the folders you mentioned, but it fails the sanity test. I tried pip uninstall mxnet and reinstalled the wheel you shared, but I get the same error.

Should I try to build mxnet on my machine? If so, how do I go about it?

syddharth avatar Apr 20 '22 06:04 syddharth

@FBAdmirer I am finally able to run it and convert images in a WSL environment on Windows too. It still won't work on native Windows. Thank you for all the help! 👍

syddharth avatar Apr 20 '22 10:04 syddharth

@syddharth You are welcome. I am glad that you were able to make sberswap work in a Windows WSL environment. To build the mxnet libmxnet.dll file, I followed the instructions at https://mxnet.apache.org/versions/1.9.0/get_started/windows_setup.html#build-from-source and https://mxnet.apache.org/versions/1.9.0/get_started/build_from_source#obtaining-the-source-code. Instead of Microsoft Visual Studio 2017 Community edition I installed Microsoft Visual Studio 2019 Community edition, encountered different issues, and resolved them with Google searches.

FBAdmirer avatar Apr 20 '22 10:04 FBAdmirer

Re "Still wont work on native windows": @syddharth It looks like the mxnet 1.9.0 wheel did not include all the libraries that are in the common.zip file at https://www.dropbox.com/s/9vd6i5z7jwsebni/common.zip?dl=0

Unzip the file and add the folder containing the DLLs to the Path in the Windows system environment variables; hopefully it then works for you on native Windows. In my case, I added the E:\common folder to the Path.
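
One caveat with the Path approach: since Python 3.8 on Windows, the Path variable is no longer searched when resolving the dependencies of DLLs loaded by extension modules or ctypes, and folders have to be registered with os.add_dll_directory instead. The sketch below shows that call under stated assumptions: register_dll_directory is a hypothetical helper, E:\common is the folder from the post, and whether this has any effect on the access violation reported in this thread is not known.

```python
import os


def register_dll_directory(folder):
    """Add folder to the DLL search path where Python supports it.

    Returns True if the folder was registered, False otherwise.
    """
    add_dll_directory = getattr(os, "add_dll_directory", None)
    if add_dll_directory is None:  # not Windows, or Python older than 3.8
        return False
    if not os.path.isdir(folder):
        return False
    add_dll_directory(os.path.abspath(folder))
    return True


# E:\common is the folder named in the post; adjust to your own location.
# This must run before "import mxnet" in the same Python process.
print(register_dll_directory(r"E:\common"))
```

On Python 3.7, which the Windows setup in this thread uses, the function returns False and the Path variable is still consulted as before.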

FBAdmirer avatar Apr 20 '22 11:04 FBAdmirer

Hmm, I'm getting the same access violation error as syddharth even after adding these DLLs to my Path variable. Are there any other files that might be missing?

ThereforeGames avatar Apr 26 '22 01:04 ThereforeGames

@WhiteSigility I am not sure whether the wheel package is missing any other files from the source build folder. I built the libmxnet.dll file on my E drive. The next option is for me to upload the entire build folder so that you can install it by following these instructions:

Uninstall mxnet with pip uninstall mxnet and check conda list mxnet to see if it shows any mxnet version. If so, uninstall again with pip uninstall mxnet.

Install MXNet Package for Python: these steps are required after building from source.

Activate your conda environment and go to the folder where the sberswap files are.

  1. download incubator-mxnet.7z from https://www.dropbox.com/s/9xeptb4btmyrgoo/incubator-mxnet.7z?dl=0 and move the file to sberswap files folder
  2. unzip using 7-zip (https://www.7-zip.org/a/7z2107-x64.exe) the incubator-mxnet.7z
  3. unzip common.zip file at https://www.dropbox.com/s/9vd6i5z7jwsebni/common.zip?dl=0
  4. Add common folder to the path in the windows system environment variables
  5. cd to python folder which is in incubator-mxnet root folder
  6. then type the command python setup.py install
  7. check if mxnet is installed by typing the command conda list mxnet and you should see something like this:
     # Name Version Build Channel
     mxnet 1.9.0 pypi_0 pypi

Hope with the above steps you will be able to make sberswap work in your native windows.

FBAdmirer avatar Apr 26 '22 17:04 FBAdmirer

@FBAdmirer Thank you for taking the time to write detailed instructions!

Unfortunately, I encountered a similar access violation error while running the setup.py script under incubator-mxnet:

Using t:\programs\anaconda3\envs\sber\lib\site-packages
Finished processing dependencies for mxnet==1.9.0
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "T:\programs\sber-swap\incubator-mxnet\python\mxnet\base.py", line 592, in _notify_shutdown
    check_call(_LIB.MXNotifyShutdown())
OSError: exception: access violation writing 0x0000000000000000

I'm wondering if I made a mistake in loading the common DLLs somehow. I've added it to the Path environment variable as shown here:

image

I also tried 1) placing them on another drive and 2) restarting the computer. No luck.

Could it be that I'm missing something simple like a trailing slash? 🤔

Thanks again. I'll keep trying to debug on my end.

ThereforeGames avatar Apr 29 '22 00:04 ThereforeGames

@WhiteSigility

You are welcome. Is the Python version in your conda environment 3.7, 3.8, 3.9, or 3.10? I also see python310 in your system environment Path; I would remove it. I have Python 3.7 in my conda environment.

Try creating a new conda environment with the command conda create -n SberSwap37 python=3.7 in the same drive or conda create --prefix T:\SberSwap37 python=3.7 in your T drive.

Activate your conda environment with command conda activate SberSwap37

Go to your T drive or any other drive and

  1. git clone https://github.com/sberbank-ai/sber-swap.git
  2. cd sber-swap
  3. git submodule init
  4. git submodule update
  5. Install cudatoolkit with conda install -c conda-forge cudatoolkit=11.4
  6. Install cudnn with conda install -c conda-forge cudnn=8.2
  7. Install torch with pip install torch==1.8.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html
  8. Install torchvision with pip install torchvision==0.9.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html
  9. Install onnx with pip install onnx==1.9.0
  10. Install onnxruntime-gpu with pip install onnxruntime-gpu==1.8.0
  11. Install opencv-python with pip install opencv-python
  12. Install scikit-image with pip install scikit-image
  13. Install insightface with pip install insightface
  14. Install requests with pip install requests
  15. Install kornia with pip install kornia
  16. Install dill with pip install dill
  17. Install wandb with pip install wandb
  18. Install kornia with pip install kornia==0.5.4
  19. sh download_models.sh (hope you have sh command for windows)

After all the above steps follow the below steps:

  1. download incubator-mxnet.7z from https://www.dropbox.com/s/9xeptb4btmyrgoo/incubator-mxnet.7z?dl=0 and move the file to sber-swap files folder
  2. unzip using 7-zip (https://www.7-zip.org/a/7z2107-x64.exe) the incubator-mxnet.7z
  3. unzip common.zip file at https://www.dropbox.com/s/9vd6i5z7jwsebni/common.zip?dl=0
  4. Add common folder to the path in the windows system environment variables
  5. From incubator-mxnet root folder type python -m pip install --user -e ./python (if this step does not work then go to steps 6 & 7)
  6. cd to python folder which is in incubator-mxnet root folder
  7. then type the command python setup.py install

check if mxnet is installed by typing the command conda list mxnet and you should see something like this:
# Name Version Build Channel
mxnet 1.9.0 pypi_0 pypi

Hope I did not miss any steps and you will be able to install mxnet. If not, you might have to build mxnet library in Windows 10 Pro. I have built it in Windows 11 Pro.

FBAdmirer avatar Apr 30 '22 14:04 FBAdmirer

Anyone else had any luck with this? I am struggling at the same point as @syddharth and @ThereforeGames.

mustangchavez avatar Dec 23 '22 18:12 mustangchavez

@mustangchavez @syddharth @WhiteSigility @ThereforeGames

Try creating a new conda environment with the command conda create -n SberSwap38 python=3.8 in the C drive or conda create --prefix X:\SberSwap38 python=3.8 in your X drive. X could be D or E or F....

Go to your C drive or X drive where conda SberSwap environment is created. I usually create SberSwap in C drive and git clone in E or F drive.

  1. git clone https://github.com/sberbank-ai/sber-swap.git SberSwap_PY38
  2. cd SberSwap_PY38
  3. git submodule init
  4. git submodule update
  5. Install cudatoolkit with conda install -c conda-forge cudatoolkit=11.4
  6. Install cudnn with conda install -c conda-forge cudnn=8.2
  7. Install torch with pip install torch==1.8.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html
  8. Install torchvision with pip install torchvision==0.9.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html
  9. Install onnx with pip install onnx==1.9.0
  10. Install onnxruntime-gpu with pip install onnxruntime-gpu==1.8.0
  11. Install opencv-python with pip install opencv-python
  12. Install scikit-image with pip install scikit-image
  13. Install insightface with pip install insightface
  14. Install requests with pip install requests
  15. Install kornia with pip install kornia
  16. Install dill with pip install dill
  17. Install wandb with pip install wandb
  18. Install kornia with pip install kornia==0.5.4
  19. sh download_models.sh (hope you have sh command for windows)
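
Once the packages are installed, the version pins from the list above can be compared against what actually ended up in the environment. This is a minimal sketch: find_mismatches and installed_version are hypothetical helpers, and it relies on importlib.metadata, which is in the standard library from Python 3.8 onward (matching the SberSwap38 environment).

```python
from importlib import metadata

# Pinned versions taken from the install steps above
PINS = {
    "torch": "1.8.0+cu111",
    "torchvision": "0.9.0+cu111",
    "onnx": "1.9.0",
    "onnxruntime-gpu": "1.8.0",
    "kornia": "0.5.4",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pins, lookup=installed_version):
    """Map each package whose installed version differs to (wanted, found)."""
    report = {}
    for name, wanted in pins.items():
        found = lookup(name)
        if found != wanted:
            report[name] = (wanted, found)
    return report


for name, (wanted, found) in find_mismatches(PINS).items():
    print(f"{name}: wanted {wanted}, found {found}")
```

No output means every pinned package is present at the expected version. Since kornia is installed twice in the list, first unpinned and then as 0.5.4, this check shows which version is in effect.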

After all the above steps follow the below steps:

  1. Remove common folder (if you have downloaded common.zip from https://www.dropbox.com/s/9vd6i5z7jwsebni/common.zip?dl=0) from path in the windows system environment variables
  2. download mxnet-related-dll-and-wheel-files.zip from https://www.dropbox.com/s/jp1q53bs9jx75rr/mxnet-related-dll-and-wheel-files.zip?dl=0
  3. Unzip mxnet-related-dll-and-wheel-files.zip to mxnet-related-dll-and-wheel-files folder
  4. cd to mxnet-related-dll-and-wheel-files folder
  5. pip install mxnet-1.9.0-py3-none-any.whl
  6. copy all the dlls (should be 10) in mxnet-related-dll-and-wheel-files folder to C:\Users\your_user_name\MiniConda3\envs\SberSwap38\Lib\site-packages\mxnet or wherever SberSwap38 conda environment is created
  7. check if mxnet is installed by typing the command conda list mxnet and you should see something like this:
     # Name Version Build Channel
     mxnet 1.9.0 pypi_0 pypi
  8. Test mxnet by typing the following:

import mxnet.numpy as nd
import mxnet as mx

a = nd.array([1, 2, 3], ctx=mx.gpu())
print(a)

and you should see [1. 2. 3.] @gpu(0)

Hope there won't be any access violation error now by following all the above steps.
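
The test above can be wrapped so that a failure is reported instead of ending the session. This is a sketch rather than a definitive check: check_mxnet is a hypothetical helper, it looks up the distribution name "mxnet" (the name of the wheel in this post; a PyPI build such as mxnet-cu112 registers under its own name), and it needs Python 3.8+ for importlib.metadata.

```python
import importlib
from importlib import metadata


def check_mxnet():
    """Report whether mxnet is installed and can create an array on the GPU."""
    try:
        version = metadata.version("mxnet")
    except metadata.PackageNotFoundError:
        return "mxnet is not installed in this environment"
    try:
        mx = importlib.import_module("mxnet")
        nd = importlib.import_module("mxnet.numpy")
        # Same call as the manual test in the post
        array = nd.array([1, 2, 3], ctx=mx.gpu())
        print(array)
    except Exception as error:
        # The access violation in this thread surfaces as an OSError
        return f"mxnet {version} is installed but the GPU test failed: {error!r}"
    return f"mxnet {version} passed the GPU test"


print(check_mxnet())
```

Only errors raised as Python exceptions are caught here; a hard crash of the interpreter would still end the process.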

FBAdmirer avatar Dec 24 '22 21:12 FBAdmirer

@FBAdmirer I still get an access violation on the last step when I try import mxnet.numpy as nd. I have Linux dual boot set up and got it running on Linux now so I think I will go that route. Thank you for your help regardless!!

mustangchavez avatar Jan 06 '23 04:01 mustangchavez

I get the same error on Windows 11: "OSError: exception: access violation writing 0x0000000000000000"

syddharth avatar Mar 03 '23 06:03 syddharth