
train error

Open kkkkkk123-ops opened this issue 11 months ago • 4 comments

When I run bash scripts/DINO_train.sh /path/to/your/COCODIR to train the model, I get the following error:

Traceback (most recent call last):
  File "main.py", line 395, in <module>
    main(args)
  File "main.py", line 280, in main
    train_stats = train_one_epoch(
  File "/root/onethingai-tmp/plaque_detection/DINO-main/engine.py", line 52, in train_one_epoch
    loss_dict = criterion(outputs, targets)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/onethingai-tmp/plaque_detection/DINO-main/models/dino/dino.py", line 569, in forward
    indices = self.matcher(outputs_without_aux, targets)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/root/onethingai-tmp/plaque_detection/DINO-main/models/dino/matcher.py", line 84, in forward
    cost_bbox = torch.cdist(out_bbox, tgt_bbox, p=1)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/functional.py", line 1222, in cdist
    return _VF.cdist(x1, x2, p, None)  # type: ignore[attr-defined]
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
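Because the message says the stack trace may be inaccurate, the failing operation is not necessarily torch.cdist itself. One common cause of a device-side assert in a matcher is annotation data that indexes out of range, for example a class id that is not smaller than the configured number of classes. That is an assumption about this case, not a confirmed diagnosis. The sketch below is a hypothetical helper (find_bad_targets and its num_classes argument are illustrative names, not part of DINO) that checks targets on the CPU. It assumes each target is a dict with "labels" and "boxes" in normalized (cx, cy, w, h) form, as in DETR-style datasets.

```python
import math


def _to_list(x):
    # Accept tensors/arrays (anything with .tolist()) as well as plain lists.
    return x.tolist() if hasattr(x, "tolist") else list(x)


def find_bad_targets(targets, num_classes):
    """Return (image_index, reason) pairs for suspicious annotations.

    Hypothetical helper: assumes each target is a dict with "labels"
    (class ids) and "boxes" (cx, cy, w, h), which is an assumption
    about the dataset layout rather than something confirmed here.
    """
    problems = []
    for i, tgt in enumerate(targets):
        # Class ids must be valid indices into the class dimension.
        for label in _to_list(tgt["labels"]):
            if not 0 <= label < num_classes:
                problems.append((i, f"label {label} outside [0, {num_classes})"))
        # Boxes must have four finite values and a positive width and height.
        for box in _to_list(tgt["boxes"]):
            if len(box) != 4 or not all(math.isfinite(v) for v in box):
                problems.append((i, f"non-finite or malformed box {box}"))
            elif box[2] <= 0 or box[3] <= 0:
                problems.append((i, f"box with non-positive size {box}"))
    return problems


if __name__ == "__main__":
    sample = [
        {"labels": [0, 3], "boxes": [[0.5, 0.5, 0.2, 0.2], [0.1, 0.1, 0.0, 0.3]]},
        {"labels": [1], "boxes": [[0.5, 0.5, float("nan"), 0.2]]},
    ]
    for index, reason in find_bad_targets(sample, num_classes=3):
        print(index, reason)
```

A check like this could be called on the targets just before criterion(outputs, targets) in engine.py. Rerunning with CUDA_LAUNCH_BLOCKING=1, as the error message suggests, should also make the traceback point at the operation that actually failed.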

kkkkkk123-ops, Mar 20 '24 07:03