RelTR RuntimeError: CUDA error: device-side assert triggered CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1. Compile with `TORCH_USE_CUDA

I get this error when I try to train RelTR on Open Images V6 with a single GPU: `python main.py --dataset oi --img_folder /home/ybz/RelTR/data/oi/images/ --ann_path /home/ybz/RelTR/data/ --batch_size 1 --output_dir ckpt1`

Dec 06 '23 14:12 YubeiZheng

Hi,

If you haven't changed any of the code, you might be using an incompatible version of CUDA. Please make sure your CUDA version matches your GPU. For example, an RTX 4090 with CUDA > 11.1.
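One way to check this is to print what the installed PyTorch build reports about CUDA and the GPU. The following is only a minimal sketch; the helper name `describe_cuda_setup` is made up for illustration, and it falls back gracefully when PyTorch or a GPU is not present.

```python
def describe_cuda_setup():
    """Collect basic facts about the local PyTorch/CUDA installation.

    Hypothetical helper for illustration; it only calls public torch APIs.
    """
    try:
        import torch
    except ImportError:
        return {"torch_installed": False}

    info = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA version PyTorch was built against (None for CPU-only builds)
        "built_with_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }
    if info["cuda_available"]:
        info["device_name"] = torch.cuda.get_device_name(0)
        # (major, minor) compute capability of GPU 0
        info["compute_capability"] = torch.cuda.get_device_capability(0)
        # GPU architectures this PyTorch build was compiled for
        info["compiled_arch_list"] = torch.cuda.get_arch_list()
    return info


report = describe_cuda_setup()
for key, value in report.items():
    print(f"{key}: {value}")
```

If the GPU's compute capability does not appear in the compiled architecture list, the PyTorch build probably does not support that device.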

Dec 08 '23 18:12 yrcong

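Same problem here. The COCO data loading filters out boxes whose top-left corner is greater than or equal to the bottom-right corner, so some boxes get deleted. The box indices in the relations are left unchanged, though, which causes the CUDA out-of-bounds assert. This happens with the annotations the author currently provides.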

May 21 '24 05:05 Dreamer312

Wow, I wasn't aware of this. Is it caused by a change in the pycocotools code?

May 21 '24 14:05 yrcong

`keep = (boxes[:, 3] > boxes[:, 1]) & (boxes[:, 2] > boxes[:, 0])` deletes some boxes here, which is strange. At least right now, if I clone your code and change nothing, Open Images does not run.

```python
import torch


class ConvertCocoPolysToMask(object):
    def __init__(self, return_masks=False):
        self.return_masks = return_masks

    def __call__(self, image, target):
        w, h = image.size

        image_id = target["image_id"]
        image_id = torch.tensor([image_id])

        anno = target["annotations"]

        anno = [obj for obj in anno if 'iscrowd' not in obj or obj['iscrowd'] == 0]

        boxes = [obj["bbox"] for obj in anno]
        # guard against no boxes via resizing
        boxes = torch.as_tensor(boxes, dtype=torch.float32).reshape(-1, 4)
        boxes[:, 2:] += boxes[:, :2]
        boxes[:, 0::2].clamp_(min=0, max=w)
        boxes[:, 1::2].clamp_(min=0, max=h)

        classes = [obj["category_id"] for obj in anno]
        classes = torch.tensor(classes, dtype=torch.int64)

        if self.return_masks:
            segmentations = [obj["segmentation"] for obj in anno]
            # convert_coco_poly_to_mask is defined elsewhere in the same module
            masks = convert_coco_poly_to_mask(segmentations, h, w)

        keypoints = None
        if anno and "keypoints" in anno[0]:
            keypoints = [obj["keypoints"] for obj in anno]
            keypoints = torch.as_tensor(keypoints, dtype=torch.float32)
            num_keypoints = keypoints.shape[0]
            if num_keypoints:
                keypoints = keypoints.view(num_keypoints, -1, 3)

        # Drops degenerate boxes (zero or negative width/height).
        # rel_annotations below is not updated to match this filter.
        keep = (boxes[:, 3] > boxes[:, 1]) & (boxes[:, 2] > boxes[:, 0])
        boxes = boxes[keep]
        classes = classes[keep]
        if self.return_masks:
            masks = masks[keep]
        if keypoints is not None:
            keypoints = keypoints[keep]

        # TODO add relation gt in the target
        rel_annotations = target['rel_annotations']

        target = {}
        target["boxes"] = boxes
        target["labels"] = classes
        if self.return_masks:
            target["masks"] = masks
        target["image_id"] = image_id
        if keypoints is not None:
            target["keypoints"] = keypoints

        # for conversion to coco api
        area = torch.tensor([obj["area"] for obj in anno])
        iscrowd = torch.tensor([obj["iscrowd"] if "iscrowd" in obj else 0 for obj in anno])
        target["area"] = area[keep]
        target["iscrowd"] = iscrowd[keep]

        target["orig_size"] = torch.as_tensor([int(h), int(w)])
        target["size"] = torch.as_tensor([int(h), int(w)])
        # TODO add relation gt in the target
        target['rel_annotations'] = torch.tensor(rel_annotations)

        return image, target
```
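The index mismatch can be reproduced without a GPU. The sketch below uses plain Python lists instead of tensors and assumes each relation is stored as `[subject_index, object_index, predicate]`; that layout and the helper name `filter_boxes_and_remap` are assumptions for illustration, not the repository's code. It applies the same `keep` condition and then remaps the relation indices, dropping relations that point at a removed box. Whether dropping them or repairing the annotations is the better fix is a separate question.

```python
def filter_boxes_and_remap(boxes, rel_annotations):
    """Filter degenerate boxes and keep relation indices consistent.

    boxes: list of [x0, y0, x1, y1]
    rel_annotations: list of [subject_index, object_index, predicate]
                     (assumed layout, for illustration only)
    """
    # Same condition as the `keep` mask: positive width and height
    keep = [x1 > x0 and y1 > y0 for x0, y0, x1, y1 in boxes]

    # Map each surviving old index to its position after filtering
    old_to_new = {}
    for old_idx, kept in enumerate(keep):
        if kept:
            old_to_new[old_idx] = len(old_to_new)

    kept_boxes = [box for box, kept in zip(boxes, keep) if kept]

    # Drop relations that reference a removed box, remap the rest
    remapped = [
        [old_to_new[subj], old_to_new[obj], pred]
        for subj, obj, pred in rel_annotations
        if subj in old_to_new and obj in old_to_new
    ]
    return kept_boxes, remapped


# Box 1 has zero width, so it is filtered out.
boxes = [[0, 0, 10, 10], [5, 5, 5, 9], [2, 2, 8, 8]]
rels = [[0, 2, 7], [1, 0, 3]]

kept_boxes, remapped = filter_boxes_and_remap(boxes, rels)
print(kept_boxes)  # two boxes remain
print(remapped)    # indices now refer to the filtered list
```

Without the remapping step, the first relation would still point at index 2 while only two boxes remain, which is the kind of out-of-range index that surfaces on the GPU as a device-side assert.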

May 22 '24 01:05 Dreamer312

RuntimeError: CUDA error: device-side assert triggered CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1. Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
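As the message suggests, setting `CUDA_LAUNCH_BLOCKING=1` makes kernel launches synchronous, so the stack trace points at the call that actually failed. A minimal sketch of doing this from Python follows; it assumes the variable is set at the very top of the entry script, before any CUDA work starts.

```python
import os

# Must be set before the first CUDA call, so set it before importing torch.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

print("CUDA_LAUNCH_BLOCKING =", os.environ["CUDA_LAUNCH_BLOCKING"])
```

Running the same batch on the CPU is another option, since an out-of-range index there usually raises an ordinary Python exception with a readable message.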