CCNet

RuntimeError: CUDA Error encountered in <function Lib.bn_mean_var_cuda at 0x7f96e5e9d2f0>

Open hustcc19860606 opened this issue 4 years ago • 2 comments

Hello, I ran the following command:

python3 train.py --data-dir ./dataset/cityscapes/ --random-mirror --random-scale --restore-from ./dataset/resnet101-imagenet.pth --gpu 4,5,6,7 --learning-rate 1e-2 --input-size 769,769 --weight-decay 1e-4 --batch-size 8 --num-steps 60000 --recurrence 2

and got the following error:

/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py:52: UserWarning: size_average and reduce args will be deprecated, please use reduction='elementwise_mean' instead.
  warnings.warn(warning.format(ret))
481950 images are loaded!
Traceback (most recent call last):
  File "train.py", line 245, in <module>
    main()
  File "train.py", line 209, in main
    preds = model(images, args.recurrence)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/data_parallel.py", line 123, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/data_parallel.py", line 133, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/parallel_apply.py", line 77, in parallel_apply
    raise output
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/parallel_apply.py", line 53, in _worker
    output = module(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/cc/networks/ccnet.py", line 196, in forward
    x = self.relu1(self.bn1(self.conv1(x)))
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/cc/libs/bn.py", line 184, in forward
    self.activation, self.slope)
  File "/home/cc/libs/functions.py", line 183, in forward
    _check(_ext.bn_mean_var_cuda, x, mean, var)
  File "/home/cc/libs/functions.py", line 16, in _check
    raise RuntimeError("CUDA Error encountered in {}".format(fn))
RuntimeError: CUDA Error encountered in <function Lib.bn_mean_var_cuda at 0x7f96e5e9d2f0>

Can you give some suggestions? @speedinghzl @honghuis
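The last two frames of the traceback show that the exception is raised by a small Python wrapper, `_check` in `libs/functions.py`, and not by the CUDA runtime directly. The body of `_check` is not shown in the traceback, so the following is only a minimal sketch of how such a wrapper could behave, assuming the compiled function returns a falsy status on failure. `fake_bn_mean_var` and `demo` are hypothetical stand-ins so the sketch runs without a GPU.

```python
def _check(fn, *args, **kwargs):
    # Assumption: the extension function returns a truthy value on success
    # and a falsy value on failure; the real body is not shown in the issue.
    success = fn(*args, **kwargs)
    if not success:
        raise RuntimeError("CUDA Error encountered in {}".format(fn))


def fake_bn_mean_var(x, mean, var):
    # Hypothetical stand-in for _ext.bn_mean_var_cuda; always reports failure.
    return 0


def demo():
    # Returns the error message produced when the wrapped call fails.
    try:
        _check(fake_bn_mean_var, [1.0], [0.0], [0.0])
    except RuntimeError as err:
        return str(err)
    return None


if __name__ == "__main__":
    print(demo())
```

If the wrapper works this way, it only sees a pass/fail status, which would explain why the message names the failing function but not the underlying CUDA error.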

hustcc19860606, Feb 01 '21 12:02

I also followed your README to compile Inplace-ABN and criss-cross attention:

root@a55bfbbee40c:/home/cc# cd libs
root@a55bfbbee40c:/home/cc/libs# sh build.sh
root@a55bfbbee40c:/home/cc/libs# python3 build.py
generating /tmp/tmpcqj20vvw/__ext.c
setting the current directory to '/tmp/tmpcqj20vvw'
running build_ext
building '__ext' extension
creating home
creating home/cc
creating home/cc/libs
creating home/cc/libs/src
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/local/lib/python3.6/dist-packages/torch/utils/ffi/../../lib/include -I/usr/local/lib/python3.6/dist-packages/torch/utils/ffi/../../lib/include/TH -I/usr/local/lib/python3.6/dist-packages/torch/utils/ffi/../../lib/include/THC -I/usr/local/cuda/include -I/usr/include/python3.6m -c __ext.c -o ./__ext.o -std=c99 -std=c++11
cc1: warning: command line option '-std=c++11' is valid for C++/ObjC++ but not for C
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/local/lib/python3.6/dist-packages/torch/utils/ffi/../../lib/include -I/usr/local/lib/python3.6/dist-packages/torch/utils/ffi/../../lib/include/TH -I/usr/local/lib/python3.6/dist-packages/torch/utils/ffi/../../lib/include/THC -I/usr/local/cuda/include -I/usr/include/python3.6m -c /home/cc/libs/src/lib_cffi.cpp -o ./home/cc/libs/src/lib_cffi.o -std=c99 -std=c++11
cc1plus: warning: command line option '-Wstrict-prototypes' is valid for C/ObjC but not for C++
cc1plus: warning: command line option '-std=c99' is valid for C/ObjC but not for C++
x86_64-linux-gnu-g++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 ./__ext.o ./home/cc/libs/src/lib_cffi.o /home/cc/libs/src/bn.o -o ./__ext.so
root@a55bfbbee40c:/home/cc/libs# cd ../cc_attention
root@a55bfbbee40c:/home/cc/cc_attention# sh build.sh
root@a55bfbbee40c:/home/cc/cc_attention# python3 build.py
generating /tmp/tmpbc6m099s/__ext.c
setting the current directory to '/tmp/tmpbc6m099s'
running build_ext
building '__ext' extension
creating home
creating home/cc
creating home/cc/cc_attention
creating home/cc/cc_attention/src
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/local/lib/python3.6/dist-packages/torch/utils/ffi/../../lib/include -I/usr/local/lib/python3.6/dist-packages/torch/utils/ffi/../../lib/include/TH -I/usr/local/lib/python3.6/dist-packages/torch/utils/ffi/../../lib/include/THC -I/usr/local/cuda/include -I/usr/include/python3.6m -c __ext.c -o ./__ext.o -std=c99 -std=c++11
cc1: warning: command line option '-std=c++11' is valid for C++/ObjC++ but not for C
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/local/lib/python3.6/dist-packages/torch/utils/ffi/../../lib/include -I/usr/local/lib/python3.6/dist-packages/torch/utils/ffi/../../lib/include/TH -I/usr/local/lib/python3.6/dist-packages/torch/utils/ffi/../../lib/include/THC -I/usr/local/cuda/include -I/usr/include/python3.6m -c /home/cc/cc_attention/src/lib_cffi.cpp -o ./home/cc/cc_attention/src/lib_cffi.o -std=c99 -std=c++11
cc1plus: warning: command line option '-Wstrict-prototypes' is valid for C/ObjC but not for C++
cc1plus: warning: command line option '-std=c99' is valid for C/ObjC but not for C++
x86_64-linux-gnu-g++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 ./__ext.o ./home/cc/cc_attention/src/lib_cffi.o /home/cc/cc_attention/src/ca.o -o ./__ext.so

Is this build output correct?

hustcc19860606, Feb 01 '21 12:02

Hi @hustcc19860606, maybe switching to the pure-python implementation or to PyTorch > 1.5 could solve your problem.
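To see which side of that suggestion an environment falls on, a short script can compare the installed PyTorch version against 1.5. This is a sketch only: `parse_version`, `meets_minimum`, and `report` are hypothetical helpers, and reading the threshold as "at least 1.5" is an assumption about what the reply means.

```python
import re


def parse_version(version):
    # Keep only the leading numeric components, e.g. "1.7.1+cu110" -> (1, 7, 1).
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError("unrecognized version string: {!r}".format(version))
    return tuple(int(part) for part in match.groups() if part is not None)


def meets_minimum(version, minimum=(1, 5)):
    # Assumption: the suggested threshold is read as "1.5 or newer".
    return parse_version(version) >= minimum


def report():
    # Summarize the local environment; works even if PyTorch is absent.
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed"
    return "torch {} (>= 1.5: {}), CUDA available: {}, visible GPUs: {}".format(
        torch.__version__,
        meets_minimum(torch.__version__),
        torch.cuda.is_available(),
        torch.cuda.device_count(),
    )


if __name__ == "__main__":
    print(report())
```

Posting the output of `report()` alongside the traceback would make it easier to tell whether the compiled extensions or the pure-python path is the better fit.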

speedinghzl, Feb 07 '21 02:02