GroundingDINO
Unable to run on GPU
Hi, thanks for releasing the code.
I have followed the instructions to set the CUDA_HOME variable and successfully installed groundingdino. However, I still get the following warning and error when I run the demo script.
/gpfs/u/home/DFLM/DFLMshcg/yujian/rl_scheduler/detector/groundingdino/models/GroundingDINO/ms_deform_attn.py:31: UserWarning: Failed to load custom C++ ops. Running on CPU mode Only!
Traceback (most recent call last):
File "/gpfs/u/home/DFLM/DFLMshcg/yujian/rl_scheduler/detector/inference.py", line 160, in <module>
boxes_filt, pred_phrases = get_grounding_output(
File "/gpfs/u/home/DFLM/DFLMshcg/yujian/rl_scheduler/detector/inference.py", line 91, in get_grounding_output
outputs = model(image[None], captions=[caption])
File "/gpfs/u/home/DFLM/DFLMshcg/scratch/miniconda3-x86/envs/cleanrl/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/gpfs/u/home/DFLM/DFLMshcg/yujian/rl_scheduler/detector/groundingdino/models/GroundingDINO/groundingdino.py", line 313, in forward
hs, reference, hs_enc, ref_enc, init_box_proposal = self.transformer(
File "/gpfs/u/home/DFLM/DFLMshcg/scratch/miniconda3-x86/envs/cleanrl/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/gpfs/u/home/DFLM/DFLMshcg/yujian/rl_scheduler/detector/groundingdino/models/GroundingDINO/transformer.py", line 258, in forward
memory, memory_text = self.encoder(
File "/gpfs/u/home/DFLM/DFLMshcg/scratch/miniconda3-x86/envs/cleanrl/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/gpfs/u/home/DFLM/DFLMshcg/yujian/rl_scheduler/detector/groundingdino/models/GroundingDINO/transformer.py", line 576, in forward
output = checkpoint.checkpoint(
File "/gpfs/u/home/DFLM/DFLMshcg/scratch/miniconda3-x86/envs/cleanrl/lib/python3.9/site-packages/torch/utils/checkpoint.py", line 249, in checkpoint
return CheckpointFunction.apply(function, preserve, *args)
File "/gpfs/u/home/DFLM/DFLMshcg/scratch/miniconda3-x86/envs/cleanrl/lib/python3.9/site-packages/torch/utils/checkpoint.py", line 107, in forward
outputs = run_function(*args)
File "/gpfs/u/home/DFLM/DFLMshcg/scratch/miniconda3-x86/envs/cleanrl/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/gpfs/u/home/DFLM/DFLMshcg/yujian/rl_scheduler/detector/groundingdino/models/GroundingDINO/transformer.py", line 785, in forward
src2 = self.self_attn(
File "/gpfs/u/home/DFLM/DFLMshcg/scratch/miniconda3-x86/envs/cleanrl/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/gpfs/u/home/DFLM/DFLMshcg/yujian/rl_scheduler/detector/groundingdino/models/GroundingDINO/ms_deform_attn.py", line 338, in forward
output = MultiScaleDeformableAttnFunction.apply(
File "/gpfs/u/home/DFLM/DFLMshcg/yujian/rl_scheduler/detector/groundingdino/models/GroundingDINO/ms_deform_attn.py", line 53, in forward
output = _C.ms_deform_attn_forward(
NameError: name '_C' is not defined
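The warning and the NameError come from the same root cause. The following is a hypothetical reconstruction of the guarded-import pattern in ms_deform_attn.py (the module and attribute names are taken from the traceback; the exact except clause is an assumption): when the compiled C++/CUDA extension fails to import, only a warning is emitted, and the name _C is left unbound until the CUDA forward path actually calls it.

```python
import warnings

try:
    # Compiled C++/CUDA extension; the import fails if the ops were
    # never built for the current environment.
    from groundingdino import _C
except Exception:
    # Only a warning here -- _C stays undefined.
    warnings.warn("Failed to load custom C++ ops. Running on CPU mode Only!")


def ms_deform_attn_forward_stub(*args):
    # On the CUDA path this dispatches into the compiled op. If the
    # import above failed, Python raises:
    #   NameError: name '_C' is not defined
    return _C.ms_deform_attn_forward(*args)
```

So the error means the extension was never compiled (or was compiled against a different environment), not that CUDA itself is unavailable.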
Here is my environment info:
python -m torch.utils.collect_env
Collecting environment information...
PyTorch version: 1.13.0+cu117
Is debug build: False
CUDA used to build PyTorch: 11.7
ROCM used to build PyTorch: N/A
OS: CentOS Linux release 7.9.2009 (Core) (x86_64)
GCC version: (Anaconda gcc) 11.2.0
Clang version: Could not collect
CMake version: version 3.26.3
Libc version: glibc-2.17
Python version: 3.9.16 (main, Mar 8 2023, 14:00:05) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-3.10.0-1160.59.1.el7.x86_64-x86_64-with-glibc2.17
Is CUDA available: True
CUDA runtime version: 11.7.64
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration:
GPU 0: Tesla V100-SXM2-32GB
GPU 1: Tesla V100-SXM2-32GB
GPU 2: Tesla V100-SXM2-32GB
GPU 3: Tesla V100-SXM2-32GB
GPU 4: Tesla V100-SXM2-32GB
Nvidia driver version: 470.57.02
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Versions of relevant libraries:
[pip3] numpy==1.21.6
[pip3] torch==1.13.0+cu117
[pip3] torchaudio==0.13.0+cu117
[pip3] torchvision==0.14.0+cu117
[conda] numpy 1.21.6 pypi_0 pypi
[conda] torch 1.13.0+cu117 pypi_0 pypi
[conda] torchaudio 0.13.0+cu117 pypi_0 pypi
[conda] torchvision 0.14.0+cu117 pypi_0 pypi
I wonder what could be causing this error. Many thanks in advance!
I solved it by running the following:
python setup.py build develop --user
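After rebuilding, a quick sanity check like the sketch below can confirm the extension now imports cleanly (this assumes the extension is exposed as groundingdino._C, the name shown in the traceback above):

```python
def cuda_ops_available() -> bool:
    """Return True if the compiled extension imports cleanly.

    Minimal post-install check; groundingdino._C is the attribute
    name referenced in the traceback.
    """
    try:
        from groundingdino import _C  # noqa: F401
    except Exception:
        return False
    return True


print(cuda_ops_available())
```

If this prints False, the build still did not produce a usable extension for the active environment.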
@delima87 Thanks! This solves the error.
However, I find that when I run the code on GPU, the model detects nothing. The same code works fine on CPU, but on GPU the output logits become very small and the model can't detect anything.
Did you encounter the same problem?
Following up on this thread. I found it is a V100-specific problem, as I cannot reproduce the error on other GPU types. Is there a fix that would also let it run on GPU?
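One thing worth trying for a V100 (compute capability 7.0): if the extension was compiled for a different GPU architecture, the kernels can silently produce wrong results. The snippet below is a sketch, not a confirmed fix; whether the default build omits sm_70 is an assumption, and the CUDA path is an example to adjust for your system.

```shell
# Force the V100's compute capability at build time, then rebuild in place.
export CUDA_HOME=/usr/local/cuda-11.7   # example path; adjust to your install
export TORCH_CUDA_ARCH_LIST="7.0"       # sm_70 = Tesla V100
python setup.py build develop --user
```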
Same problem here. Have you solved it yet? @yujianll