Collaborative-Learning-for-Weakly-Supervised-Object-Detection
Implementation differences from the published paper
We have read the source code and found some details that do not match the published paper:
- In the code, we find that the proposals surviving NMS among the top-50 scoring boxes are used as the pseudo ground truths for Faster-RCNN:
```python
# `nms` is the repository's NMS wrapper; `boxes`, `cls_prob`, `im_labels` are Variables.
def generate_pseudo_gtbox(boxes, cls_prob, im_labels):
    """Get proposals from the fused score matrix; inputs are all Variables."""
    pre_nms_topN = 50
    nms_Thresh = 0.1
    num_images, num_classes = im_labels.size()
    boxes = boxes[:, 1:]  # drop the batch-index column
    assert num_images == 1, 'batch size should be equal to 1'
    im_labels_tmp = im_labels[0, :]
    labelList = im_labels_tmp.data.nonzero().view(-1)
    gt_boxes = []
    gt_classes = []
    gt_scores = []
    for i in labelList:
        # sort proposals by their score for the positive class i
        scores, order = cls_prob[:, i].contiguous().view(-1).sort(descending=True)
        if pre_nms_topN > 0:
            order = order[:pre_nms_topN]
            scores = scores[:pre_nms_topN].view(-1, 1)
        proposals = boxes[order.data, :]
        # keep the top-50 proposals that survive NMS as pseudo ground truths
        keep = nms(torch.cat((proposals, scores), 1).data, nms_Thresh)
        proposals = proposals[keep, :]
        scores = scores[keep, ]
        gt_boxes.append(proposals)
        gt_classes.append(torch.ones(keep.size(0), 1) * (i + 1))  # index = class + 1 to account for the background class
        gt_scores.append(scores.view(-1, 1))
    gt_boxes = torch.cat(gt_boxes)
    gt_classes = torch.cat(gt_classes)
    gt_scores = torch.cat(gt_scores)
    proposals = {'gt_boxes': gt_boxes,
                 'gt_classes': gt_classes,
                 'gt_scores': gt_scores}
```
whereas the paper uses only the top-scoring box:
> **Max-out Strategy.** The predictions of DS and DW could be inaccurate, especially in the initial rounds of training. For measuring the prediction consistency, it is important to select only the most confident predictions. We thus apply a Max-out strategy to filter out most predictions. For each positive category, only the region with highest prediction score by DW is chosen.
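For comparison, here is a minimal sketch of what a Max-out style (top-1) pseudo-ground-truth selection could look like. The function name `generate_pseudo_gtbox_maxout` and the exact tensor handling are our assumptions based on the paper's description, not code taken from the repository:

```python
import torch

def generate_pseudo_gtbox_maxout(boxes, cls_prob, im_labels):
    """Keep only the single highest-scoring proposal per positive class (Max-out sketch)."""
    im_labels_tmp = im_labels[0, :]
    boxes = boxes[:, 1:]                        # drop the batch-index column, as above
    gt_boxes, gt_classes, gt_scores = [], [], []
    for i in im_labels_tmp.nonzero().view(-1):
        scores = cls_prob[:, i].contiguous().view(-1)
        top_score, top_idx = scores.max(0)      # Max-out: a single box per positive class
        gt_boxes.append(boxes[top_idx].view(1, -1))
        gt_classes.append((i + 1).view(1, 1).float())   # class + 1 to account for the background class
        gt_scores.append(top_score.view(1, 1))
    return {'gt_boxes': torch.cat(gt_boxes),
            'gt_classes': torch.cat(gt_classes),
            'gt_scores': torch.cat(gt_scores)}
```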
- Also, the bootstrap_cross_entropy method differs from the one in the paper when cfg.TRAIN.ISHARD is set to True, which is the case when we run the code without any modification:
```python
if ishard:
    # hard bootstrapping: mix the one-hot target with the one-hot argmax prediction
    _, idx = input_prob.max(1)
    target_onehot = target_onehot * beta + \
        Variable(input.data.new(input.data.size()).zero_()).scatter_(1, idx.view(-1, 1), 1) * (1 - beta)
else:
    # soft bootstrapping: mix the one-hot target with the predicted probabilities
    target_onehot = target_onehot * beta + input_prob * (1 - beta)
```
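For context, this is a sketch of how such a bootstrapped cross-entropy (in the spirit of Reed et al.'s bootstrapping loss) can be written as a self-contained function in current PyTorch; the function signature and the `beta` default are assumptions, not the repository's exact implementation:

```python
import torch
import torch.nn.functional as F

def bootstrap_cross_entropy(input, target, beta=0.95, ishard=False):
    """Bootstrapped cross-entropy (sketch).

    input:  raw class scores, shape (N, C)
    target: integer class labels, shape (N,)
    beta:   weight of the given label in the mixed target
    """
    input_prob = F.softmax(input, dim=1)
    # one-hot encode the given labels
    target_onehot = torch.zeros_like(input_prob).scatter_(1, target.view(-1, 1), 1)
    if ishard:
        # hard bootstrapping: mix with the one-hot argmax prediction
        _, idx = input_prob.max(1)
        pred_onehot = torch.zeros_like(input_prob).scatter_(1, idx.view(-1, 1), 1)
        mixed = target_onehot * beta + pred_onehot * (1 - beta)
    else:
        # soft bootstrapping: mix with the predicted probabilities
        mixed = target_onehot * beta + input_prob * (1 - beta)
    # cross-entropy against the mixed (soft) target
    log_prob = F.log_softmax(input, dim=1)
    return -(mixed * log_prob).sum(dim=1).mean()
```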
We have run the code and got an mAP of 47.0 on the VOC 2007 test set, lower than the 48.2 mAP reported in the paper. We wonder whether these two differences are the reason. Could you please explain the two differences and tell us how to reproduce results similar to those in the paper? Thanks!
Hi, thanks for your issue. In our paper, we found that the Max-out strategy gave the best performance, but we also experimented with keeping the top 5~50 proposals. We also tried the bootstrap_cross_entropy method in both its hard and soft variants, as seen in the code, but the two variants make little difference at model testing time.
To reproduce the results attached here, you can try adjusting pre_nms_topN to 1; the remaining gap may come from differences between machines or environments.
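If we read the reply correctly, that adjustment would amount to a one-line change inside generate_pseudo_gtbox (our interpretation, not an official patch); with a single proposal per class, the subsequent NMS step effectively becomes a no-op:

```python
pre_nms_topN = 1   # keep only the top-scoring proposal per positive class (Max-out)
```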