
about "Uncertainty-minimal Query Selection"

Open 1999zsh opened this issue 1 year ago • 5 comments

Star RTDETR: please star RT-DETR on its homepage to support this project and help more people discover it.



Dear author, thank you for your brilliant work! I noticed that the paper mentions a method called "Uncertainty-minimal Query Selection," but I couldn't find it in the code provided. Could you please clarify where this method is implemented, or whether it might be missing?

1999zsh avatar Aug 29 '24 02:08 1999zsh

I had the same question after going through the codebase. I figured out that it is implemented as the vfl_loss in the criterion.

r4hul77 avatar Aug 30 '24 06:08 r4hul77

I had the same question after going through the codebase. I figured out that it is implemented as the vfl_loss in the criterion.

Could you please show me the related code in vfl_loss? Thank you!

1999zsh avatar Aug 30 '24 06:08 1999zsh

I don't know how to link to it directly, so I'm going to copy-paste the code.

def loss_labels_vfl(self, outputs, targets, indices, num_boxes, log=True):
    assert 'pred_boxes' in outputs
    idx = self._get_src_permutation_idx(indices)

    src_boxes = outputs['pred_boxes'][idx]
    target_boxes = torch.cat([t['boxes'][i] for t, (_, i) in zip(targets, indices)], dim=0)
    ious, _ = box_iou(box_cxcywh_to_xyxy(src_boxes), box_cxcywh_to_xyxy(target_boxes))
    ious = torch.diag(ious).detach()

    src_logits = outputs['pred_logits']
    target_classes_o = torch.cat([t["labels"][J] for t, (_, J) in zip(targets, indices)])
    target_classes = torch.full(src_logits.shape[:2], self.num_classes,
                                dtype=torch.int64, device=src_logits.device)
    target_classes[idx] = target_classes_o
    target = F.one_hot(target_classes, num_classes=self.num_classes + 1)[..., :-1]

    target_score_o = torch.zeros_like(target_classes, dtype=src_logits.dtype)
    target_score_o[idx] = ious.to(target_score_o.dtype)
    target_score = target_score_o.unsqueeze(-1) * target

    pred_score = F.sigmoid(src_logits).detach()
    weight = self.alpha * pred_score.pow(self.gamma) * (1 - target) + target_score
    
    loss = F.binary_cross_entropy_with_logits(src_logits, target_score, weight=weight, reduction='none')
    loss = loss.mean(1).sum() * src_logits.shape[1] / num_boxes
    return {'loss_vfl': loss}

Search for 'rtdetr_criterion.py'

r4hul77 avatar Aug 30 '24 06:08 r4hul77


Sorry to bother you again. I don't see any code related to equation (2) in the paper. As I understand it, equation (2) is an L2 distance between localization and classification.
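To make my reading of equation (2) concrete: the uncertainty of a query would be the discrepancy between its predicted localization confidence P and its classification confidence C, e.g. an L2 distance, with low-uncertainty queries preferred at selection time. This is a purely hypothetical sketch on my part; the function name and tensor shapes are my assumptions, not code from the repo:

```python
import torch

def query_uncertainty(p_loc: torch.Tensor, p_cls: torch.Tensor) -> torch.Tensor:
    """Hypothetical reading of equation (2): per-query uncertainty as the
    L2 discrepancy between localization confidence P and classification
    confidence C. Lower values mean the two heads agree on the query."""
    return torch.linalg.vector_norm(p_loc - p_cls, ord=2, dim=-1)

p_loc = torch.tensor([[0.9], [0.4], [0.7]])  # (num_queries, 1), assumed shape
p_cls = torch.tensor([[0.9], [0.1], [0.2]])
u = query_uncertainty(p_loc, p_cls)  # the first query is the most consistent
```

Under this reading, minimizing u during training (or sorting by it at query selection) would prefer queries whose classification score matches their localization quality.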

1999zsh avatar Aug 30 '24 08:08 1999zsh

I have the same question about "Uncertainty-minimal Query Selection". I think the code you quoted may come from varifocal loss (VFL), which is a focal loss mixed with IoU. The original paper I'm referring to is "VarifocalNet: An IoU-aware Dense Object Detector", which defines VFL as $VFL(p,q)=-q[q\log p+(1-q)\log(1-p)]$ when $q > 0$, and $VFL(p,q)=-\alpha p^\gamma\log(1-p)$ when $q = 0$, where $q$ is the IoU and $p$ is the predicted probability.
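For what it's worth, the piecewise definition above can be checked numerically against the weighted-BCE form used in loss_labels_vfl: with the BCE target set to q, and the weight set to alpha * p^gamma on negatives and q on positives, the weighted BCE reduces to the same piecewise expression. A standalone sketch (not code from the RT-DETR repo):

```python
import torch
import torch.nn.functional as F

def vfl_piecewise(p, q, alpha=0.75, gamma=2.0):
    # VFL(p, q) as defined in the VarifocalNet paper:
    # q-weighted BCE for positives (q > 0, q = IoU),
    # alpha * p^gamma focal down-weighting for negatives (q = 0).
    pos = -q * (q * torch.log(p) + (1 - q) * torch.log(1 - p))
    neg = -alpha * p.pow(gamma) * torch.log(1 - p)
    return torch.where(q > 0, pos, neg)

def vfl_weighted_bce(logits, q, alpha=0.75, gamma=2.0):
    # The form used in loss_labels_vfl: BCE-with-logits against the IoU
    # target q, weighted by alpha * p^gamma on negatives and q on positives.
    p = torch.sigmoid(logits).detach()
    target = (q > 0).float()
    weight = alpha * p.pow(gamma) * (1 - target) + q * target
    return F.binary_cross_entropy_with_logits(logits, q, weight=weight, reduction='none')

logits = torch.tensor([1.2, -0.4, 0.3])
q = torch.tensor([0.8, 0.0, 0.5])  # IoU targets; 0 marks a negative
same = torch.allclose(vfl_piecewise(torch.sigmoid(logits), q),
                      vfl_weighted_bce(logits, q), atol=1e-6)  # True
```

So the quoted loss is indeed varifocal loss; using the IoU as the classification target is what ties classification confidence to localization quality, which is related to (but not the same as) the uncertainty term in equation (2).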


BugBubbles avatar Mar 31 '25 13:03 BugBubbles