RuntimeError: apex.optimizers.FusedAdam requires cuda extensions

Open life97 opened this issue 3 years ago • 17 comments

My environment is Windows Server 2016 with torch 1.8.1, torchvision 0.9.1, and CUDA 10.2. apex installed successfully, but running the project code (NVlabs/imaginaire) reports an error:

Initialize net_G and net_D weights using type: orthogonal gain: 1
net_G parameter count: 30,258,966
net_D parameter count: 32,322,498
Traceback (most recent call last):
  File "H:\19xyy\project\imaginaire-master\train.py", line 100, in <module>
    main()
  File "H:\19xyy\project\imaginaire-master\train.py", line 60, in main
    get_model_optimizer_and_scheduler(cfg, seed=args.seed)
  File "H:\19xyy\project\imaginaire-master\imaginaire\utils\trainer.py", line 115, in get_model_optimizer_and_scheduler
    opt_G = get_optimizer(cfg.gen_opt, net_G)
  File "H:\19xyy\project\imaginaire-master\imaginaire\utils\trainer.py", line 257, in get_optimizer
    return get_optimizer_for_params(cfg_opt, params)
  File "H:\19xyy\project\imaginaire-master\imaginaire\utils\trainer.py", line 274, in get_optimizer_for_params
    opt = FusedAdam(params,
  File "G:\Anaconda3\envs\xyy_imagenaire\lib\site-packages\apex\optimizers\fused_adam.py", line 80, in __init__
    raise RuntimeError('apex.optimizers.FusedAdam requires cuda extensions')
RuntimeError: apex.optimizers.FusedAdam requires cuda extensions

The versions reported by nvcc -V and print(torch.version.cuda) are the same, so I don't know why this error is raised. Are there any suggestions for getting the code to run correctly? Looking forward to your reply, thank you very much!
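The version comparison mentioned above can be scripted. This is only a sketch: parse_nvcc_release and report_cuda_versions are hypothetical helper names, and the regular expression assumes the usual "release X.Y" wording in nvcc -V output. Matching versions do not by themselves prove that apex was built with its CUDA extensions.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(nvcc_output: str):
    """Return the CUDA release (e.g. '10.2') found in `nvcc -V` output, or None."""
    match = re.search(r"release\s+(\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def report_cuda_versions() -> None:
    """Print the toolkit version seen by nvcc next to the one torch was built with."""
    import torch  # imported lazily so the parser above also works without torch

    toolkit = None
    nvcc = shutil.which("nvcc")  # None when nvcc is not on PATH
    if nvcc is not None:
        out = subprocess.run([nvcc, "-V"], capture_output=True, text=True).stdout
        toolkit = parse_nvcc_release(out)
    print("nvcc release:      ", toolkit)
    print("torch.version.cuda:", torch.version.cuda)
    print("CUDA available:    ", torch.cuda.is_available())
```

Calling report_cuda_versions() inside the affected environment prints the three values side by side.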

life97 avatar Oct 16 '21 08:10 life97

Hi, I've run into the same problem. Has this happened before? It's my first time using this optimizer. My environment is Ubuntu 18.0.4, torch 1.8.0, CUDA 11.1.

Dawn-bin avatar Oct 24 '21 08:10 Dawn-bin

@Dawn-bin Sorry, I haven't solved this problem yet. I use imaginaire mainly because I want to run the MUNIT model, and I was able to run MUNIT with its official code before, so I didn't keep working on this problem.

life97 avatar Oct 26 '21 01:10 life97

@Dawn-bin Hi, I've run into the same problem. Have you solved it?

suxin1412 avatar Dec 28 '21 15:12 suxin1412

@suxin1412 Yes. It seems that apex was installed without its CUDA extensions (a CPU-only build). You can try to solve this by reinstalling apex with the CUDA extensions, following the README. Hope it works.

Dawn-bin avatar Dec 28 '21 15:12 Dawn-bin

Hi, I have run into the same problem. Have you solved it?

kongyuzhuo avatar Oct 26 '22 05:10 kongyuzhuo

This happens because apex cannot import amp_C. You can check the file "G:\Anaconda3\envs\xyy_imagenaire\lib\site-packages\apex\optimizers\fused_adam.py" to see where the error is raised, and you can verify it in a Python shell:

import torch
import amp_C  # must import torch before import amp_C

You may get an error like: libstdc++.so.6: version 'GLIBCXX_3.4.20' not found. If so, you can try the following commands:

conda install libgcc
export LD_LIBRARY_PATH=/path/to/anaconda/envs/myenv/lib:$LD_LIBRARY_PATH
cd /path/to/anaconda/envs/myenv/lib
ln -s libstdc++.so.6.0.30 libstdc++.so.6

You can also add export LD_LIBRARY_PATH=/path/to/anaconda/envs/myenv/lib:$LD_LIBRARY_PATH to your ~/.bashrc file so that the setting persists across shells.
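The import check above can be wrapped so that it reports the underlying error message (for example a missing GLIBCXX version) instead of stopping at the first failure. This is a minimal sketch; check_extension and diagnose are hypothetical helper names, not part of apex.

```python
import importlib


def check_extension(name: str):
    """Try to import a module; return (ok, error message or None)."""
    try:
        importlib.import_module(name)
    except Exception as exc:  # e.g. ImportError from a failed shared-library load
        return False, f"{type(exc).__name__}: {exc}"
    return True, None


def diagnose(extensions=("amp_C",)):
    """Import torch first, then report whether each extension module loads."""
    results = {"torch": check_extension("torch")}
    for name in extensions:
        results[name] = check_extension(name)
    return results


def print_report(results) -> None:
    """Print one line per module with its status and any error text."""
    for name, (ok, err) in results.items():
        print(f"{name}: {'OK' if ok else err}")
```

Running print_report(diagnose()) in the environment that raises the RuntimeError shows whether amp_C is missing entirely or present but failing to load.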

Chiang97912 avatar Nov 28 '22 07:11 Chiang97912

Same error, not solved yet. Environment: Ubuntu 20.04 (WSL2), Python 3.9, CUDA 11.6, cuDNN 8.5.0, torch 1.12.1, installed following the README. By the way, import torch followed by import amp_C also failed. I hope someone can fix it or provide a solution.

huang-zeyu avatar Feb 01 '23 09:02 huang-zeyu

I have also experienced this error: I had successfully installed Apex in a certain environment before, but when I switched to a different environment and tried to reinstall Apex, it appeared to install successfully, but when running the code, it always gave the error "RuntimeError: apex.optimizers.FusedAdam requires cuda extensions". Later, I deleted the Apex folder downloaded from GitHub, downloaded it again, and reinstalled Apex. In the end, it was successfully executed.
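When switching between environments, it can help to confirm which copy of apex the active interpreter actually resolves, and whether a compiled amp_C module sits next to it. The sketch below uses only the standard library; module_location and report are hypothetical helper names, and whether a stale copy was the cause in this case is an assumption.

```python
import importlib.util


def module_location(name: str):
    """Return the file a top-level module would be loaded from, or None if not found."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


def report(names=("apex", "amp_C")) -> None:
    """Print where each module would be imported from in this environment."""
    for name in names:
        print(f"{name}: {module_location(name)}")
```

If report() prints None for amp_C, the extension was not installed into the current environment, whatever the install log said.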

GuangmingChan avatar Mar 12 '23 13:03 GuangmingChan

I solved this problem by building with

pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" ./

rather than

pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --global-option="--cpp_ext" --global-option="--cuda_ext" ./

My pip version is 22.3.1.

ShoufaChen avatar Jun 25 '23 13:06 ShoufaChen

I installed apex with the command below, but I still get the error RuntimeError: apex.optimizers.FusedAdam requires cuda extensions. Environment: Linux 5.15.120.2, CUDA 11.8, pip 23, torch cu118.

pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" ./

nshah-sfoundation avatar Aug 25 '23 16:08 nshah-sfoundation

I get the same issue using pip 22.0.4 and the command given in the README:

# otherwise
pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --global-option="--cpp_ext" --global-option="--cuda_ext" ./

I noticed that even with the command above, --cpp_ext and --cuda_ext are not in sys.argv that reaches setup.py (and that is what seems to be checked):

['(...)/python3.10/site-packages/pip/_vendor/pep517/in_process/_in_process.py', 'dist_info', '--egg-base', '/tmp/pip-modern-metadata-pvaz06q9']
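To illustrate why this matters, here is a small sketch that assumes, as suggested above, that setup.py decides what to build by testing whether the flags appear in sys.argv. requested_extensions is a hypothetical helper, not apex code, and the second argument list is an illustrative assumption of a direct setup.py invocation.

```python
def requested_extensions(argv):
    """Return which of the apex build flags are present in an argument list."""
    flags = ("--cpp_ext", "--cuda_ext")
    return [flag for flag in flags if flag in argv]


# Argument list as reported above (path elided as in the original report).
seen_by_setup = [
    "(...)/python3.10/site-packages/pip/_vendor/pep517/in_process/_in_process.py",
    "dist_info",
    "--egg-base",
    "/tmp/pip-modern-metadata-pvaz06q9",
]

print(requested_extensions(seen_by_setup))  # → []
print(requested_extensions(["setup.py", "install", "--cpp_ext", "--cuda_ext"]))
# → ['--cpp_ext', '--cuda_ext']
```

With an empty result, a check of this kind would skip the extensions, so the install can succeed while FusedAdam still fails at runtime.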

filipesmg avatar Sep 05 '23 12:09 filipesmg

Has anyone solved this issue?

Tolga-Karahan avatar Oct 12 '23 07:10 Tolga-Karahan

This solution works in my case: https://github.com/NVIDIA/apex/issues/1204#issuecomment-1659884672

frankielp avatar Oct 16 '23 11:10 frankielp

@frankielp thanks. I tried but got another error: ninja: error: '/app/csrc/amp_C_frontend.cpp', needed by '/app/build/temp.linux-x86_64-cpython-310/csrc/amp_C_frontend.o', missing and no known rule to make it. I'll create an issue for that.

Tolga-Karahan avatar Oct 17 '23 11:10 Tolga-Karahan

@Chiang97912 Your fix above (verifying import amp_C and repairing the libstdc++.so.6 link) finally solved my problem. You are so fucking brilliant, bro! Really awesome.

WuHongyuQXWX avatar Oct 23 '23 02:10 WuHongyuQXWX

@ShoufaChen Thank you very much, the --config-settings build command above was helpful.

Flame-circle avatar Mar 18 '24 07:03 Flame-circle

@ShoufaChen The --config-settings solution above didn't work for me (on pip 24.0); instead I had to use https://github.com/NVIDIA/apex/issues/1204#issuecomment-1659884672

mkerin avatar May 20 '24 11:05 mkerin