rf-detr
CUDA error: device-side assert triggered
  File "…/lib/python3.11/site-packages/torch/functional.py", line 1335, in cdist
    return _VF.cdist(x1, x2, p, None)  # type: ignore[attr-defined]
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
@Wangwang99999 could you share the annotations file you're using? Just the JSON.
_annotations.coco.json
I modified line 47 in detr.py to: class_names = [c["name"] for c in anns["categories"]]
Can you check that your categories are 0-indexed? I used 1-indexing and that was messing things up for me; switching to 0-indexing worked. By the way, to debug this kind of thing, run your code on CPU, as the errors will be a lot clearer.
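A quick way to check how the categories are indexed is to inspect the annotation JSON directly. The following is a minimal sketch, assuming a standard COCO layout with `categories` and `annotations` keys; `summarize_category_ids` and `summarize_file` are hypothetical helper names, not part of rf-detr.

```python
import json
from pathlib import Path


def summarize_category_ids(coco: dict) -> dict:
    """Report how category ids are laid out in a COCO-style dict."""
    category_ids = sorted(c["id"] for c in coco["categories"])
    used_ids = {a["category_id"] for a in coco.get("annotations", [])}
    first = category_ids[0] if category_ids else None
    # Contiguous means ids form an unbroken run: first, first + 1, ...
    contiguous = (
        category_ids == list(range(first, first + len(category_ids)))
        if category_ids
        else True
    )
    return {
        "category_ids": category_ids,
        "starts_at": first,
        "contiguous": contiguous,
        # Ids referenced by annotations but missing from "categories".
        "unknown_ids_in_annotations": sorted(used_ids - set(category_ids)),
    }


def summarize_file(path: str) -> dict:
    """Convenience wrapper: load the JSON file and summarize it."""
    return summarize_category_ids(json.loads(Path(path).read_text()))
```

Calling `summarize_file("_annotations.coco.json")` shows whether the ids start at 0 or 1, whether there are gaps, and whether any annotation points at a category that is not declared.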
The category IDs in my JSON start from 0, and I have also modified the line in detr.py to class_names = [c["name"] for c in anns["categories"]]. However, when I run train.py, I still get an error:
    return _VF.cdist(x1, x2, p, None)  # type: ignore[attr-defined]
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
How did you solve it?
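For anyone following the hint in the error message: `CUDA_LAUNCH_BLOCKING` has to be set before CUDA is initialized, so that the failing kernel is reported at the call that actually launched it. A minimal sketch, assuming it is placed at the very top of the training entry point (that placement is an assumption, not something shown in this thread):

```python
import os

# Set this before the first CUDA call; doing it before importing torch
# is the safest option.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# Import torch and the training code only after this point, e.g.:
# import torch
```

The same effect can be had by exporting the variable in the shell before launching the script.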
Please run it on CPU; you will get a much better idea of what's going on. Also, my earlier comment turned out not to be 100% correct. I had to add a dummy class at index 0 and then number my other classes from 1 onward. Only then did everything work, including inference.
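The dummy-class workaround described above can be sketched as a small remapping step on the annotation file. This assumes a standard COCO layout and an input file that does not already contain a dummy class; `add_dummy_class_zero` and the `"background"` name are hypothetical choices for illustration, not part of rf-detr.

```python
import copy


def add_dummy_class_zero(coco: dict, dummy_name: str = "background") -> dict:
    """Return a copy with a dummy category at id 0 and real classes at 1..N."""
    out = copy.deepcopy(coco)
    ordered = sorted(out["categories"], key=lambda c: c["id"])

    # Map each original id to its new 1-based id, preserving the id order.
    id_map = {c["id"]: new_id for new_id, c in enumerate(ordered, start=1)}

    for category in ordered:
        category["id"] = id_map[category["id"]]
    out["categories"] = [{"id": 0, "name": dummy_name}] + ordered

    # Annotations must follow the same remapping as the categories.
    for annotation in out.get("annotations", []):
        annotation["category_id"] = id_map[annotation["category_id"]]
    return out
```

The input dict is left untouched, so the result can be written to a new file with `json.dump` and compared against the original before retraining.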
@ThierryDeruyttere Thanks, that fixed the issue for me.
I have the same problem. Please help me fix it.
It's likely a problem with the annotation file having the wrong number of classes. Roboflow datasets assume a background class at index 0 and 1-indexed real classes.
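A class-count mismatch of this kind can be spotted without a GPU by checking that every label fits the range the model expects. A minimal sketch, assuming the model indexes classes from 0 to `num_classes - 1`; how rf-detr counts the background class in `num_classes` is not confirmed in this thread, and `out_of_range_annotations` is a hypothetical helper name.

```python
def out_of_range_annotations(coco: dict, num_classes: int) -> list:
    """Return ids of annotations whose category_id is outside [0, num_classes)."""
    return [
        annotation.get("id")
        for annotation in coco.get("annotations", [])
        if not 0 <= annotation["category_id"] < num_classes
    ]
```

An empty list means every label is in range; any returned id points at an annotation that would index past the end of the class dimension, which is a typical cause of a device-side assert.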