
get error when run demo _dcn_v2.so: undefined symbol: __cudaRegisterFatBinaryEnd

Open guanlinting opened this issue 6 years ago • 24 comments

Hi, I am trying to run the demo. I followed the steps in INSTALL.md exactly and hit no errors during installation, but running the demo fails with "_dcn_v2.so: undefined symbol: __cudaRegisterFatBinaryEnd". It seems to come from the DCNv2 module. My CUDA version is 9.2. I also tried switching to PyTorch 1.0 together with the PyTorch 1.0 version of DCNv2, but that ends in a segmentation fault (core dumped).

====== Detailed error output ======
Traceback (most recent call last):
  File "demo.py", line 11, in <module>
    from detectors.detector_factory import detector_factory
  File "/home/glt/oneshot/CenterNet/src/lib/detectors/detector_factory.py", line 5, in <module>
    from .exdet import ExdetDetector
  File "/home/glt/oneshot/CenterNet/src/lib/detectors/exdet.py", line 22, in <module>
    from .base_detector import BaseDetector
  File "/home/glt/oneshot/CenterNet/src/lib/detectors/base_detector.py", line 11, in <module>
    from models.model import create_model, load_model
  File "/home/glt/oneshot/CenterNet/src/lib/models/model.py", line 12, in <module>
    from .networks.pose_dla_dcn import get_pose_net as get_dla_dcn
  File "/home/glt/oneshot/CenterNet/src/lib/models/networks/pose_dla_dcn.py", line 16, in <module>
    from .DCNv2.dcn_v2 import DCN
  File "/home/glt/oneshot/CenterNet/src/lib/models/networks/DCNv2/dcn_v2.py", line 11, in <module>
    from .dcn_v2_func import DCNv2Function
  File "/home/glt/oneshot/CenterNet/src/lib/models/networks/DCNv2/dcn_v2_func.py", line 9, in <module>
    from ._ext import dcn_v2 as _backend
  File "/home/glt/oneshot/CenterNet/src/lib/models/networks/DCNv2/_ext/dcn_v2/__init__.py", line 3, in <module>
    from ._dcn_v2 import lib as _lib, ffi as _ffi
ImportError: /home/glt/oneshot/CenterNet/src/lib/models/networks/DCNv2/_ext/dcn_v2/_dcn_v2.so: undefined symbol: __cudaRegisterFatBinaryEnd

====== PyTorch 1.0 version ======
$ python demo.py ctdet --demo ../images --load_model ../models/ctdet_coco_dla_2x.pth
Fix size testing.
training chunk_sizes: [32]
The output will be saved to /home/glt/CenterNet/src/lib/../../exp/ctdet/default
heads {'hm': 80, 'wh': 2, 'reg': 2}
Creating model...
loaded ../models/ctdet_coco_dla_2x.pth, epoch 230
Segmentation fault (core dumped)

guanlinting avatar May 11 '19 03:05 guanlinting

I have no idea about this, but found this for you if it can help.

xingyizhou avatar May 13 '19 14:05 xingyizhou

I hit the same problem with PyTorch 0.4.1 and CUDA 10.0 when I run:
python demo.py ctdet --demo ../images --load_model ../models/ctdet_coco_dla_2x.pth

Traceback (most recent call last):
  File "demo.py", line 11, in <module>
    from detectors.detector_factory import detector_factory
  File "/home/CenterNet/src/lib/detectors/detector_factory.py", line 5, in <module>
    from .exdet import ExdetDetector
  File "/home/CenterNet/src/lib/detectors/exdet.py", line 22, in <module>
    from .base_detector import BaseDetector
  File "/home/CenterNet/src/lib/detectors/base_detector.py", line 11, in <module>
    from models.model import create_model, load_model
  File "/home/CenterNet/src/lib/models/model.py", line 12, in <module>
    from .networks.pose_dla_dcn import get_pose_net as get_dla_dcn
  File "/home/CenterNet/src/lib/models/networks/pose_dla_dcn.py", line 16, in <module>
    from .DCNv2.dcn_v2 import DCN
  File "/home/CenterNet/src/lib/models/networks/DCNv2/dcn_v2.py", line 11, in <module>
    from .dcn_v2_func import DCNv2Function
  File "/home/CenterNet/src/lib/models/networks/DCNv2/dcn_v2_func.py", line 9, in <module>
    from ._ext import dcn_v2 as _backend
  File "/home/CenterNet/src/lib/models/networks/DCNv2/_ext/dcn_v2/__init__.py", line 3, in <module>
    from ._dcn_v2 import lib as _lib, ffi as _ffi
ImportError: /home/CenterNet/src/lib/models/networks/DCNv2/_ext/dcn_v2/_dcn_v2.so: undefined symbol: __cudaPopCallConfiguration

PumayHui avatar May 27 '19 03:05 PumayHui

I have the same problem as you, @PumayHui. How did you solve it?

wwlbytedance avatar Jun 03 '19 03:06 wwlbytedance

I have hit the same problem as you, @PumayHui. Did you solve it?

SeeeeShiwei avatar Jun 03 '19 03:06 SeeeeShiwei

@wwlbytedance @ShiSenSen1234 Sorry, I have not solved it yet...

PumayHui avatar Jun 03 '19 03:06 PumayHui

@xingyizhou Can you help us~~~?

SeeeeShiwei avatar Jun 03 '19 05:06 SeeeeShiwei

undefined symbol: __cudaPopCallConfiguration: Ensure that your PyTorch CUDA version and system CUDA version match (see Issue#19):

$ python -c "import torch; print(torch.version.cuda)"
$ nvcc --version

I got 9.0 and 9.2, so I reinstalled PyTorch to match the system toolkit: conda install pytorch=0.4.1 cuda92 -c pytorch
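For anyone who wants to automate the comparison of those two commands, here is a minimal sketch. It assumes `nvcc --version` prints a line containing `release X.Y` (as current toolkits do); the helper names are made up for illustration and are not part of CenterNet or PyTorch.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(nvcc_output):
    """Return the 'major.minor' release found in `nvcc --version` text, or None."""
    match = re.search(r"release\s+(\d+)\.(\d+)", nvcc_output)
    return "{}.{}".format(match.group(1), match.group(2)) if match else None


def major_minor(version):
    """Reduce a version such as '9.0.176' to 'major.minor' ('9.0')."""
    if version is None:
        return None
    return ".".join(version.split(".")[:2])


def cuda_versions_match(torch_cuda, nvcc_output):
    """True when PyTorch's CUDA version equals the release nvcc reports."""
    toolkit = parse_nvcc_release(nvcc_output)
    return toolkit is not None and major_minor(torch_cuda) == toolkit


if __name__ == "__main__":
    try:
        import torch  # optional: the helpers above work without PyTorch
        torch_cuda = torch.version.cuda
    except ImportError:
        torch_cuda = None

    nvcc = shutil.which("nvcc")
    nvcc_output = ""
    if nvcc:
        nvcc_output = subprocess.run(
            [nvcc, "--version"], stdout=subprocess.PIPE, universal_newlines=True
        ).stdout

    print("PyTorch CUDA:", torch_cuda)
    print("nvcc CUDA   :", parse_nvcc_release(nvcc_output))
    print("match       :", cuda_versions_match(torch_cuda, nvcc_output))
```

With the values from this comment (9.0 from PyTorch, 9.2 from nvcc) the check reports a mismatch, which is the situation the reinstall above fixes.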

uniquezhengjie avatar Jun 13 '19 10:06 uniquezhengjie

It's a CUDA version problem. Using CUDA 9 with PyTorch 0.4.1 will fix it.

guanxiongsun avatar Jun 14 '19 18:06 guanxiongsun

I encountered the same problem when using CUDA 10.1 and PyTorch 0.4.1. Yes, it's a CUDA version problem. Switching from CUDA 10.1 to CUDA 8.0 solved it.

hktxt avatar Jun 27 '19 08:06 hktxt

Go to https://pytorch.org/ to get the right PyTorch build for your CUDA version.

For example, for CUDA 10: conda install pytorch torchvision cudatoolkit==10.0 -c pytorch
That will fix the problem.

clemente0731 avatar Aug 03 '19 15:08 clemente0731

I still can't fix the problem with CUDA 9 and torch 0.4.1, which is strange. There is no cuDNN on my server; is that a problem?

Stephenfang51 avatar Oct 01 '19 14:10 Stephenfang51

https://github.com/CharlesShang/DCNv2 Using the PyTorch 1.0 version of the deformable convnets worked for me. @xingyizhou, could you verify this?

earlfernando avatar Oct 09 '19 08:10 earlfernando

CUDA 10.0, PyTorch 0.4.1: has anyone solved this?

zjp99 avatar Oct 24 '19 14:10 zjp99

@clemente0731 Did you solve it?

zjp99 avatar Oct 25 '19 00:10 zjp99

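My environment was CUDA 10.0, Python 3.6, and PyTorch 0.4, and I hit this problem.

So I cloned the environment and tried to fix it there: I first switched to PyTorch 1.1 and torchvision 0.3, then replaced DCNv2, and that finally solved the problem.

For details, see sections 4.2 and 6 of https://blog.csdn.net/weixin_38705903/article/details/102598339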

zjp99 avatar Oct 25 '19 01:10 zjp99

I'm using CUDA 10.1. I solved the problem by installing PyTorch 1.2.0, replacing the DCNv2 in this repo with the original repo, and compiling it again.

Now it works perfectly.

kwea123 avatar Nov 20 '19 12:11 kwea123

For CUDA 10.1 + pytorch1 use tag pytorch_1.0

For CUDA 9.0+pytorch0.4 use tag pytorch_0.4
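As a small illustration of that rule, a sketch that picks one of the two tag names from an installed PyTorch version string. The tag names are the ones quoted above; which repository they belong to is not stated in the comment, and treating every 1.x release as `pytorch_1.0` is an assumption on my part.

```python
def dcnv2_tag_for(torch_version):
    """Pick the tag named in this thread for a PyTorch version string.

    Assumption: any 0.x release maps to 'pytorch_0.4' and anything newer
    maps to 'pytorch_1.0'. Only those two tags are mentioned in the thread.
    """
    major = int(torch_version.split(".")[0])
    return "pytorch_0.4" if major == 0 else "pytorch_1.0"


if __name__ == "__main__":
    for version in ("0.4.1", "1.0.0", "1.2.0"):
        print(version, "->", "git checkout " + dcnv2_tag_for(version))
```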

bazinga012 avatar Dec 11 '19 20:12 bazinga012

As @uniquezhengjie said, this error is raised when the PyTorch CUDA version and the system CUDA version do not match. The deformable convolution (DCN) in this repository requires CUDA <= 10.0, so the repository uses PyTorch 0.4 (which only supports CUDA <= 10.0); the error is raised when your system CUDA version is >= 10.0.

If you do not want to downgrade your system CUDA version, it seems you need to adopt another DCN implementation that supports CUDA >= 10.0:

  • conda install pytorch=1.0 torchvision -c pytorch

  • Replace your DCN, following @zjp99:

    cd ~/Code/CenterNet/src/lib/models/networks
    rm -r DCNv2
    git clone https://github.com/CharlesShang/DCNv2.git
    cd DCNv2
    sh make.sh
    python test.py

  • python demo.py ctdet --demo /path/to/image/or/folder/or/video --load_model ../models/ctdet_coco_dla_2x.pth

In my experiments, this solved the problem.
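If you would rather script the DCNv2 replacement than type it by hand, the steps can be sketched as below. This only assembles and prints the commands by default (a dry run); the function names and the dry-run flag are my own additions, and the path is the example path from this comment, so adjust it to your checkout.

```python
import shlex
import subprocess
from pathlib import Path

DCNV2_URL = "https://github.com/CharlesShang/DCNv2.git"


def rebuild_steps(networks_dir):
    """Return (working_directory, argv) pairs for replacing and rebuilding DCNv2."""
    networks = Path(networks_dir).expanduser()
    dcn = networks / "DCNv2"
    return [
        (networks, ["rm", "-r", "DCNv2"]),        # drop the bundled DCNv2
        (networks, ["git", "clone", DCNV2_URL]),  # fetch the upstream version
        (dcn, ["sh", "make.sh"]),                 # compile the extension
        (dcn, ["python", "test.py"]),             # run DCNv2's own test script
    ]


def run_steps(steps, dry_run=True):
    """Print each step; execute it only when dry_run is False."""
    for cwd, argv in steps:
        print("(cd {} && {})".format(cwd, " ".join(shlex.quote(a) for a in argv)))
        if not dry_run:
            subprocess.run(argv, cwd=str(cwd), check=True)


if __name__ == "__main__":
    # Hypothetical location taken from the comment above.
    run_steps(rebuild_steps("~/Code/CenterNet/src/lib/models/networks"))
```

Calling `run_steps(..., dry_run=False)` actually deletes the existing DCNv2 directory, so check the printed commands first.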

deepalchemist avatar Feb 17 '20 03:02 deepalchemist

@kwea123 I'm doing exactly what you suggest: CUDA 10.1, PyTorch 1.2.0, with the replaced DCNv2. However, when I execute:
python demo.py ctdet --demo ../images/ --load_model ../models/ctdet_coco_dla_2x.pth
I get the following error: AssertionError: Torch not compiled with CUDA enabled
Am I missing something?

Here is the full stack trace:
Fix size testing.
training chunk_sizes: [32]
The output will be saved to /home/shihkuan/workFiles/centernet/PythonAPI/CenterNet/src/lib/../../exp/ctdet/default
heads {'hm': 80, 'wh': 2, 'reg': 2}
Creating model...
loaded ../models/ctdet_coco_dla_2x.pth, epoch 230
Traceback (most recent call last):
  File "demo.py", line 56, in <module>
    demo(opt)
  File "demo.py", line 21, in demo
    detector = Detector(opt)
  File "/home/shihkuan/workFiles/centernet/PythonAPI/CenterNet/src/lib/detectors/ctdet.py", line 26, in __init__
    super(CtdetDetector, self).__init__(opt)
  File "/home/shihkuan/workFiles/centernet/PythonAPI/CenterNet/src/lib/detectors/base_detector.py", line 26, in __init__
    self.model = self.model.to(opt.device)
  File "/home/shihkuan/anaconda3/envs/CenterNet/lib/python3.6/site-packages/torch/nn/modules/module.py", line 432, in to
    return self._apply(convert)
  File "/home/shihkuan/anaconda3/envs/CenterNet/lib/python3.6/site-packages/torch/nn/modules/module.py", line 208, in _apply
    module._apply(fn)
  File "/home/shihkuan/anaconda3/envs/CenterNet/lib/python3.6/site-packages/torch/nn/modules/module.py", line 208, in _apply
    module._apply(fn)
  File "/home/shihkuan/anaconda3/envs/CenterNet/lib/python3.6/site-packages/torch/nn/modules/module.py", line 208, in _apply
    module._apply(fn)
  File "/home/shihkuan/anaconda3/envs/CenterNet/lib/python3.6/site-packages/torch/nn/modules/module.py", line 230, in _apply
    param_applied = fn(param)
  File "/home/shihkuan/anaconda3/envs/CenterNet/lib/python3.6/site-packages/torch/nn/modules/module.py", line 430, in convert
    return t.to(device, dtype if t.is_floating_point() else None, non_blocking)
  File "/home/shihkuan/anaconda3/envs/CenterNet/lib/python3.6/site-packages/torch/cuda/__init__.py", line 178, in _lazy_init
    _check_driver()
  File "/home/shihkuan/anaconda3/envs/CenterNet/lib/python3.6/site-packages/torch/cuda/__init__.py", line 92, in _check_driver
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
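That assertion usually points at a CPU-only PyTorch package rather than at DCNv2. A quick way to tell the cases apart is sketched below; it relies on `torch.version.cuda` being `None` for CPU-only builds and on `torch.cuda.is_available()`, and the helper name and messages are just for illustration.

```python
def diagnose_torch_build(cuda_version, cuda_available):
    """Classify a PyTorch install from torch.version.cuda and torch.cuda.is_available()."""
    if cuda_version is None:
        # CPU-only packages report no CUDA version at all.
        return "CPU-only build: install a CUDA-enabled PyTorch package"
    if not cuda_available:
        return "CUDA build ({}), but no usable GPU or driver was detected".format(cuda_version)
    return "CUDA build ({}) with a usable GPU".format(cuda_version)


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        print(diagnose_torch_build(torch.version.cuda, torch.cuda.is_available()))
```

If it reports a CPU-only build, reinstalling PyTorch with a CUDA variant from https://pytorch.org/ is the likely next step.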

alexrider1105 avatar Jun 19 '20 07:06 alexrider1105

@deepalchemist When I use PyTorch 1.0 with CUDA 10.1, as you suggested, and run: python demo.py ctdet --demo ../images/ --load_model ../models/ctdet_coco_dla_2x.pth I get the following error:

ImportError: /home/shihkuan/workFiles/centernet/PythonAPI/CenterNet/src/lib/models/networks/DCNv2/_ext.cpython-36m-x86_64-linux-gnu.so: undefined symbol: _ZN6caffe26detail37_typeMetaDataInstance_preallocated_32E

do you happen to know what is causing this?
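One thing that may help with reading this kind of error: the symbol is a mangled C++ name, and decoding it shows which library it belongs to. Below is a minimal sketch of a demangler that handles only plain nested names of the form `_ZN<len><name>...E`; it is not a full Itanium ABI demangler (`c++filt` is the real tool), and the function name is made up.

```python
def demangle_nested_name(symbol):
    """Demangle a plain nested C++ name of the form _ZN<len><id>...E.

    Templates, function signatures and substitutions are not supported.
    """
    if not (symbol.startswith("_ZN") and symbol.endswith("E")):
        raise ValueError("not a simple nested name: " + symbol)
    body = symbol[3:-1]
    parts = []
    pos = 0
    while pos < len(body):
        start = pos
        while pos < len(body) and body[pos].isdigit():
            pos += 1  # consume the decimal length prefix
        if pos == start:
            raise ValueError("unsupported mangling near: " + body[pos:])
        length = int(body[start:pos])
        if pos + length > len(body):
            raise ValueError("truncated name in: " + symbol)
        parts.append(body[pos:pos + length])
        pos += length
    return "::".join(parts)


if __name__ == "__main__":
    print(demangle_nested_name(
        "_ZN6caffe26detail37_typeMetaDataInstance_preallocated_32E"))
```

The decoded name lives in the `caffe2` namespace, so it comes from PyTorch's own libraries rather than from CUDA. My guess (not confirmed in this thread) is that the extension was compiled against a different PyTorch version than the one being imported, in which case rebuilding DCNv2 inside the active environment would be the thing to try.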

alexrider1105 avatar Jun 19 '20 07:06 alexrider1105

@deepalchemist Do I have to downgrade my system CUDA version? I created a new conda environment and installed PyTorch 0.4.1 and CUDA 9.0 in it, but the error still occurs when I run demo.py. So I wonder how the system CUDA relates to the CUDA inside conda. Can you give me any suggestions?
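On the relation between the two: as far as I understand, the conda PyTorch package ships its own CUDA runtime libraries, while DCNv2 is compiled locally with whatever `nvcc` the build finds on the machine, so the two can disagree even inside a fresh conda environment. A rough sketch for checking which toolkit a source build would likely pick up follows; the lookup order (CUDA_HOME, CUDA_PATH, then `nvcc` on PATH) is an assumption modelled on common build scripts, not the exact logic of any particular PyTorch version.

```python
import os
import shutil


def locate_cuda_home(environ=None, which=shutil.which):
    """Guess the CUDA toolkit root a source build would use, or None.

    Assumed order: CUDA_HOME, CUDA_PATH, then two directories above `nvcc`.
    """
    environ = os.environ if environ is None else environ
    for name in ("CUDA_HOME", "CUDA_PATH"):
        if environ.get(name):
            return environ[name]
    nvcc = which("nvcc")
    if nvcc:
        # .../cuda-X.Y/bin/nvcc -> .../cuda-X.Y
        return os.path.dirname(os.path.dirname(nvcc))
    return None


if __name__ == "__main__":
    print("Toolkit a build would likely use:", locate_cuda_home())
```

If this prints a toolkit whose version differs from `torch.version.cuda`, pointing CUDA_HOME at a matching toolkit before recompiling is one option that avoids changing the system default.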

lihuining avatar Feb 20 '23 08:02 lihuining

@lihuining Have you solved this problem? I have the same problem.

CedrusLNZ avatar Jun 07 '23 23:06 CedrusLNZ

@DeepAlchemist

Does this project have two branches? When I use PyTorch 0.4.1 and cuda90, compilation fails:

running build_ext
building '_ext' extension
g++ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -DWITH_CUDA -I/home1/linazhan/CenterTrack/src/lib/model/networks/DCNv2/src -I/home1/linazhan/.conda/envs/CenterTrack3/lib/python3.6/site-packages/torch/lib/include -I/home1/linazhan/.conda/envs/CenterTrack3/lib/python3.6/site-packages/torch/lib/include/TH -I/home1/linazhan/.conda/envs/CenterTrack3/lib/python3.6/site-packages/torch/lib/include/THC -I/spack/apps/linux-centos7-x86_64/gcc-4.9.4/cuda-9.2.88-nak6j4dtwls6r42eaqmpx5krncqhwrnh/include -I/home1/linazhan/.conda/envs/CenterTrack3/include/python3.6m -c /home1/linazhan/CenterTrack/src/lib/model/networks/DCNv2/src/vision.cpp -o build/temp.linux-x86_64-3.6/home1/linazhan/CenterTrack/src/lib/model/networks/DCNv2/src/vision.o -DTORCH_EXTENSION_NAME=_ext -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++11
cc1plus: warning: command line option '-Wstrict-prototypes' is valid for C/ObjC but not for C++
In file included from /home1/linazhan/CenterTrack/src/lib/model/networks/DCNv2/src/dcn_v2.h:3:0,
                 from /home1/linazhan/CenterTrack/src/lib/model/networks/DCNv2/src/vision.cpp:2:
/home1/linazhan/CenterTrack/src/lib/model/networks/DCNv2/src/cpu/vision.h:2:29: fatal error: torch/extension.h: No such file or directory
 #include <torch/extension.h>

A Google search suggests the PyTorch version is too low.
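To check that directly, you can look for the header in the include directories of the installed PyTorch. The sketch below uses `torch.utils.cpp_extension.include_paths()` to get those directories; the helper name is made up, and my understanding (please verify) is that `torch/extension.h` only ships with PyTorch 1.0 and later, which would explain the failure on 0.4.1.

```python
import os


def find_header(include_dirs, header=os.path.join("torch", "extension.h")):
    """Return the first path where `header` exists under include_dirs, else None."""
    for directory in include_dirs:
        candidate = os.path.join(directory, header)
        if os.path.isfile(candidate):
            return candidate
    return None


if __name__ == "__main__":
    try:
        from torch.utils.cpp_extension import include_paths
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        found = find_header(include_paths())
        print(found or "torch/extension.h not found: this DCNv2 source needs a newer PyTorch")
```

If the header is missing, either upgrade PyTorch or build a DCNv2 version written for PyTorch 0.4.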

CedrusLNZ avatar Jun 08 '23 00:06 CedrusLNZ

@guanxiongsun When I use CUDA 9 and PyTorch 0.4.1 to compile DCNv2, it reports: /home1/linazhan/CenterTrack/src/lib/model/networks/DCNv2/src/cpu/vision.h:2:29: fatal error: torch/extension.h: No such file or directory #include <torch/extension.h>

CedrusLNZ avatar Jun 08 '23 00:06 CedrusLNZ