rf-detr

CUDA error: device-side assert triggered

Open Wangwang99999 opened this issue 6 months ago • 8 comments

File "…/lib/python3.11/site-packages/torch/functional.py", line 1335, in cdist
    return _VF.cdist(x1, x2, p, None)  # type: ignore[attr-defined]
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Wangwang99999 avatar May 19 '25 07:05 Wangwang99999

@Wangwang99999 could you share the annotations file you're using? Just the JSON.

SkalskiP avatar May 19 '25 07:05 SkalskiP

_annotations.coco.json I modified line 47 in detr.py as: class_names = [c["name"] for c in anns["categories"]]

Wangwang99999 avatar May 19 '25 07:05 Wangwang99999

Can you check that your categories are 0-indexed? I used 1-indexing and that was messing things up for me; switching to 0-indexing made it work. Btw, to debug this type of thing, run your code on CPU, as the errors will be a lot clearer.
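For anyone who wants to check this quickly, here is a minimal sketch that only inspects the annotation file itself. It assumes a standard COCO-style JSON with `categories` and `annotations` lists; the function name `summarize_category_ids` and the file path are hypothetical, and nothing here calls RF-DETR.

```python
import json
from pathlib import Path


def summarize_category_ids(coco: dict) -> dict:
    """Report how category ids are laid out in a COCO-style annotation dict."""
    declared = sorted(c["id"] for c in coco.get("categories", []))
    used = sorted({a["category_id"] for a in coco.get("annotations", [])})
    if declared:
        expected = list(range(declared[0], declared[0] + len(declared)))
        contiguous = declared == expected
    else:
        contiguous = True
    return {
        "declared_ids": declared,          # ids listed under "categories"
        "used_ids": used,                  # ids actually referenced by annotations
        "starts_at": declared[0] if declared else None,
        "contiguous": contiguous,          # False if there are gaps or duplicates
        "undeclared_used_ids": sorted(set(used) - set(declared)),
    }


if __name__ == "__main__":
    # Hypothetical path; point it at your own annotation file.
    path = Path("train/_annotations.coco.json")
    if path.exists():
        print(summarize_category_ids(json.loads(path.read_text())))
```

If `starts_at` is not what you expect, or `undeclared_used_ids` is non-empty, the label layout is a likely suspect.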

ThierryDeruyttere avatar May 19 '25 09:05 ThierryDeruyttere

_annotations.coco.json I modified line 47 in detr.py as: class_names = [c["name"] for c in anns["categories"]]

The IDs under "categories" in my JSON start from 0, and I have also modified the line in detr.py to class_names = [c["name"] for c in anns["categories"]]. However, when I run train.py, I still get an error: return _VF.cdist(x1, x2, p, None) # type: ignore[attr-defined] RuntimeError: CUDA error: device-side assert triggered CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1. Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

How did you solve it?

JackGUO-boy avatar May 20 '25 03:05 JackGUO-boy

Please run it on CPU; you will have a better idea of what's going on. And my comment was actually not 100% correct in the end. I had to add a dummy class at index 0 and then my other classes from 1 onward. Only then did everything work, including inference.
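A rough sketch of that remapping, in case it helps. It assumes a COCO-style dict that does not already contain a placeholder class; the function name `add_dummy_class_zero` and the placeholder name "background" are my own choices, not anything taken from RF-DETR.

```python
import copy


def add_dummy_class_zero(coco: dict, dummy_name: str = "background") -> dict:
    """Return a copy with a placeholder category at id 0 and real classes at 1..N."""
    out = copy.deepcopy(coco)  # leave the caller's dict untouched
    real = sorted(out.get("categories", []), key=lambda c: c["id"])
    # Map each original id to a new id starting at 1, preserving the original order.
    id_map = {c["id"]: new_id for new_id, c in enumerate(real, start=1)}
    for c in real:
        c["id"] = id_map[c["id"]]
    out["categories"] = [{"id": 0, "name": dummy_name}] + real
    for ann in out.get("annotations", []):
        # Raises KeyError if an annotation points at an undeclared category.
        ann["category_id"] = id_map[ann["category_id"]]
    return out
```

Write the result to a new file rather than overwriting the original, and apply it to every split (train/valid/test) so the ids stay consistent.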

ThierryDeruyttere avatar May 21 '25 14:05 ThierryDeruyttere

@ThierryDeruyttere Thanks, that fixed the issue for me.

J7779 avatar Jul 06 '25 02:07 J7779

I have the same problem. Please help me fix it.

tuanba-ht avatar Oct 03 '25 13:10 tuanba-ht

It's likely a problem with the annotation file having the wrong number of classes. Roboflow datasets assume a background class at index 0 and 1-indexed real classes.
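A device-side assert like this is often an index falling outside the range a kernel expects, so one cheap check is whether every label fits inside the class count. The sketch below is plain Python over a COCO-style dict; `find_out_of_range_labels` is a hypothetical helper, and `num_classes` is whatever count your training setup ends up with (how RF-DETR derives it is not stated in this thread).

```python
def find_out_of_range_labels(coco: dict, num_classes: int) -> list:
    """Return the annotations whose category_id is not in [0, num_classes)."""
    return [
        ann
        for ann in coco.get("annotations", [])
        if not 0 <= ann["category_id"] < num_classes
    ]
```

An empty list does not prove the file is correct, but a non-empty one points to the annotations worth looking at first.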

isaacrob-roboflow avatar Oct 03 '25 15:10 isaacrob-roboflow