neural-motifs
RuntimeError: cuda runtime error (8) in nms
Hi, I have encountered the following error when I run the code on two Titan Xp GPUs:
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1518238409320/work/torch/lib/THC/generic/THCTensorMathPairwise.cu line=21 error=8 : invalid device function
Traceback (most recent call last):
File "/home/wtliao/.pycharm_helpers/pydev/pydevd.py", line 1668, in
After debugging line by line, I found that this error arises in the operation keep.append(keep_im + s), line 24 in nms.py.
Any idea how to solve it? Thanks!
Even when I try to use a single GPU, I have the same issue.
I'm not entirely sure what's going on here, but it seems like you're using Python 2, which I don't support with this repo. Have you tried using Python 3?
@rowanz when using Python 3.6, the error is:
RuntimeError: cuda runtime error (48) : no kernel image is available for execution on the device at /opt/conda/conda-bld/pytorch_1512387374934/work/torch/lib/THC/generic/THCTensorMathPairwise.cu:21
I solved it in a strange way:
try:
    keep.append(keep_im + s)
except BaseException:
    keep.append(keep_im + s)
which means the same operation is performed twice, and then it works... I have no idea why.
Now I have solved this problem by recompiling the nms files using:
#!/usr/bin/env bash
# CUDA_PATH=/usr/local/cuda/
cd src/cuda
echo "Compiling stnn kernels by nvcc..."
nvcc -c -o nms.cu.o nms_kernel.cu -x cu -Xcompiler -fPIC -arch=sm_52
cd ../../
python build.py
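For reference, the right -arch value depends on the GPU's compute capability (e.g. sm_61 for a Titan Xp, sm_37 for a K80). A minimal sketch to look it up, assuming PyTorch is importable on the machine:

import torch

# Print each visible GPU's compute capability and the matching nvcc flag,
# e.g. (6, 1) -> -arch=sm_61 for a Titan Xp.
for i in range(torch.cuda.device_count()):
    major, minor = torch.cuda.get_device_capability(i)
    print("GPU %d (%s): -arch=sm_%d%d" % (i, torch.cuda.get_device_name(i), major, minor))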
But a new problem arises in
feature_pool = RoIAlignFunction(self.pooling_size, self.pooling_size, spatial_scale=1 / 16)(
    self.compress(features) if self.use_resnet else features, rois)
with the error message:
cudaCheckError() failed : no kernel image is available for execution on the device
Process finished with exit code 255
I can't figure out where the problem is. Do you have any idea about it? I fixed it by replacing the roi_align module with this roi_align: https://github.com/jwyang/faster-rcnn.pytorch/tree/master/lib/model/roi_align. Now the code runs through. If you can fix the original roi_align, that would be much better. Thanks for sharing your impressive work again!
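As a side note, a small smoke test like the sketch below can tell whether the compiled roi_align extension was built for your device at all; the import path is an assumption about the repo layout and the ROI row uses the usual (batch_index, x1, y1, x2, y2) format, so adjust both to your checkout:

import torch
from torch.autograd import Variable
# Assumed import path; change it to wherever RoIAlignFunction lives in your checkout.
from lib.fpn.roi_align.functions.roi_align import RoIAlignFunction

features = Variable(torch.randn(1, 8, 16, 16).cuda())
# One ROI covering the whole 16x16 feature map, in (batch_index, x1, y1, x2, y2) format.
rois = Variable(torch.Tensor([[0, 0, 0, 15, 15]]).cuda())
out = RoIAlignFunction(7, 7, spatial_scale=1.0)(features, rois)
print(out.size())  # (1, 8, 7, 7) if a kernel exists for this GPU; otherwise the same "no kernel image" error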
My guess is that you have a newer version of CUDA than I did last year. Possibly you'd need to compile it with -arch=sm_61? Sorry for the difficulty anyway; I really wish PyTorch had native ROI pooling (which they're working on for v1).
@rowanz I'm facing the exact same issue. But for me, doing the same steps as @wtliao suggested didn't get rid of the error. I'm using CUDA 8.0 and Tesla K80s, so I even tried compiling with sm_37 in nms, roi_align, and highway_lstms. What would you advise me to do?
@wtliao did you find a fix for the error?
Hi, I solved the issues as I described above. I have tried the code on CUDA 9.0 + K40, CUDA 9.0 + P100, and CUDA 8.0 + Titan Xp, and they all work now. So I guess you can try updating to CUDA 9.0. I can't fix the roi_align issue in the author's code, so I replaced it with mine. BTW, I didn't compile the code using the Makefile provided by the author; I compiled each part of the code one by one using my make.sh under the corresponding directory.
Thanks a lot @wtliao, I got it running now. And thanks a lot @rowanz for sharing your amazing work. One small doubt: can you please tell me the interpretation of the pred_rel_inds part of the output? I believe it's a [NUM_PRED_RELS, 51] array with, for each pair, scores for the 50 types of relationships. The 0th index is "no relationship", right? (Because the total number of relationships is 50.)
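If that reading is correct (it is only the interpretation given in the question above, not confirmed here), turning such a [NUM_PRED_RELS, 51] score array into per-pair predicate predictions would look roughly like this sketch:

import numpy as np

# Stand-in for the output described above: one row of 51 scores per
# subject-object pair, with column 0 treated as "no relationship".
rel_scores = np.random.rand(6, 51)

# Highest-scoring real predicate (classes 1..50) per pair, skipping column 0.
pred_predicate = rel_scores[:, 1:].argmax(axis=1) + 1
pred_score = rel_scores[np.arange(len(rel_scores)), pred_predicate]
print(pred_predicate)  # predicate class indices in 1..50
print(pred_score)      # the corresponding scores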
@wtliao, I always get an error that bbox_overlaps can't be found when running the code, but I have generated the .so file. Do you have any suggestions to help me? Thank you!
You should run the command "export PYTHONPATH=where/is/your/project/folder" before running.
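Equivalently, as a quick sanity check from inside Python (the path and the package name below are placeholders, not the repo's actual values):

import sys
import importlib.util

# Use the same folder you would put in PYTHONPATH.
sys.path.insert(0, "/where/is/your/project/folder")

# "lib" is only a guess at the top-level package holding the compiled bbox_overlaps .so;
# if find_spec returns None, Python cannot see the project from the current environment.
print(importlib.util.find_spec("lib"))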
Hello, thank you very much for your reply. I have already set this, but the same problem still occurs. Is the problem that the .so file cannot be found? In addition, I always get: RuntimeError: cuda runtime error (2): out of memory. I have changed batch_size to 1, but the same problem still occurs. Do you have any good suggestions?