Collaborative-Learning-for-Weakly-Supervised-Object-Detection

Implementation difference with the published paper

Open liubx07 opened this issue 5 years ago • 1 comment

We have read the source code and found some details that are not the same as in the published paper:

  1. In the code, we find that for each positive class the top 50 scoring proposals, after NMS, are used as the pseudo ground truths for Faster-RCNN:
import torch
# nms: the repo's NMS wrapper; it takes boxes concatenated with their scores
# plus a threshold, and returns the indices of the boxes to keep.

def generate_pseudo_gtbox(boxes, cls_prob, im_labels):
    """Get proposals from fuse_matrix.
    Inputs are all Variables."""
    pre_nms_topN = 50
    nms_Thresh = 0.1

    num_images, num_classes = im_labels.size()
    boxes = boxes[:, 1:]
    assert num_images == 1, 'batch size should be equal to 1'
    im_labels_tmp = im_labels[0, :]
    labelList = im_labels_tmp.data.nonzero().view(-1)

    gt_boxes = []
    gt_classes = []
    gt_scores = []

    for i in labelList:
        # Rank all proposals by their score for positive class i.
        scores, order = cls_prob[:, i].contiguous().view(-1).sort(descending=True)
        if pre_nms_topN > 0:
            # Keep only the top-50 scoring proposals before NMS.
            order = order[:pre_nms_topN]
            scores = scores[:pre_nms_topN].view(-1, 1)
        proposals = boxes[order.data, :]

        # Suppress overlapping candidates (IoU > 0.1) among the survivors.
        keep = nms(torch.cat((proposals, scores), 1).data, nms_Thresh)
        proposals = proposals[keep, :]
        scores = scores[keep, ]
        gt_boxes.append(proposals)
        gt_classes.append(torch.ones(keep.size(0), 1) * (i + 1))  # shift by 1: index 0 is the background class
        gt_scores.append(scores.view(-1, 1))

    gt_boxes = torch.cat(gt_boxes)
    gt_classes = torch.cat(gt_classes)
    gt_scores = torch.cat(gt_scores)
    proposals = {'gt_boxes': gt_boxes,
                 'gt_classes': gt_classes,
                 'gt_scores': gt_scores}
    return proposals

while the paper uses only the top-scoring box (the Max-out strategy):

Max-out Strategy: The predictions of DS and DW could be inaccurate, especially in the initial rounds of training. For measuring the prediction consistency, it is important to select only the most confident predictions. We thus apply a Max-out strategy to filter out most predictions. For each positive category, only the region with the highest prediction score by DW is chosen.
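For comparison, here is a minimal sketch of the Max-out selection as we read it from the paper (our illustration, not the authors' code; the names maxout_pseudo_gtbox and the boxes/cls_prob/im_labels tensors follow generate_pseudo_gtbox above):

def maxout_pseudo_gtbox(boxes, cls_prob, im_labels):
    """Sketch of the paper's Max-out selection: for each positive
    image-level class, keep only the single highest-scoring proposal
    as the pseudo ground truth."""
    boxes = boxes[:, 1:]
    labelList = im_labels[0, :].data.nonzero().view(-1)

    gt_boxes, gt_classes, gt_scores = [], [], []
    for i in labelList:
        # Take the single top-scoring proposal for class i; NMS is unnecessary.
        score, idx = cls_prob[:, i].contiguous().view(-1).max(0)
        gt_boxes.append(boxes[idx.data, :].view(1, -1))
        gt_classes.append(torch.ones(1, 1) * (i + 1))  # index 0 is background
        gt_scores.append(score.view(1, 1))

    return {'gt_boxes': torch.cat(gt_boxes),
            'gt_classes': torch.cat(gt_classes),
            'gt_scores': torch.cat(gt_scores)}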

  2. Also, the bootstrap_cross_entropy method differs from the one in the paper when cfg.TRAIN.ISHARD is set to True, which is the case when we run the code without any modification:
if ishard:
    # Hard bootstrapping: blend the target with the one-hot argmax of the prediction.
    _, idx = input_prob.max(1)
    target_onehot = target_onehot * beta + \
        Variable(input.data.new(input.data.size()).zero_()).scatter_(1, idx.view(-1, 1), 1) * (1 - beta)
else:
    # Soft bootstrapping: blend the target with the predicted distribution itself.
    target_onehot = target_onehot * beta + input_prob * (1 - beta)
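For context, here is a self-contained sketch of a full bootstrapped cross-entropy loss in the spirit of Reed et al.'s label bootstrapping. Only the function name and the ishard/beta semantics come from the snippet above; the signature, the modern-PyTorch API, and the reduction are our assumptions:

import torch
import torch.nn.functional as F

def bootstrap_cross_entropy(input, target, beta=0.8, ishard=True):
    """input: raw logits of shape (N, C); target: class indices of shape (N,).
    Blends the given labels with the model's own predictions before
    computing cross-entropy against the blended soft target."""
    input_prob = F.softmax(input, dim=1)
    # One-hot encode the given targets.
    target_onehot = torch.zeros_like(input_prob).scatter_(1, target.view(-1, 1), 1)

    if ishard:
        # Hard: blend with the one-hot argmax prediction.
        _, idx = input_prob.max(1)
        pred_onehot = torch.zeros_like(input_prob).scatter_(1, idx.view(-1, 1), 1)
        target_onehot = target_onehot * beta + pred_onehot * (1 - beta)
    else:
        # Soft: blend with the full predicted distribution.
        target_onehot = target_onehot * beta + input_prob * (1 - beta)

    log_prob = F.log_softmax(input, dim=1)
    return -(target_onehot * log_prob).sum(1).mean()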

We ran the code and got an mAP of 47.0 on the VOC 2007 test set, inferior to the mAP of 48.2 reported in the paper. We wonder whether these two differences are the reason. Could you please explain the two differences and tell us how to reproduce results similar to those in the paper? Thanks!

liubx07 avatar Mar 10 '19 03:03 liubx07

Hi, thanks for your issue. In our paper, we found that the Max-out strategy gives the best performance, but we also experimented with keeping the top 5~50 proposals. We also tried both the hard and soft variants of the bootstrap_cross_entropy method, as seen in the code, and the two make little difference at test time. To reproduce our results as attached here, you can try adjusting pre_nms_topN to 1; the inferior results may also come from machine-specific issues.
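That is, a one-line change in generate_pseudo_gtbox (shown here purely for illustration):

# In generate_pseudo_gtbox: keep a single candidate per positive class,
# which makes the subsequent NMS a no-op and matches the Max-out selection.
pre_nms_topN = 1  # was 50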

ww1024cc avatar Mar 17 '19 07:03 ww1024cc