simple-faster-rcnn-pytorch
About RPN
Thank you for your contribution. Can I extract the RPN module separately and remove the Faster RCNN classification steps?
I think it would be OK if you only want to use it in a typical scenario (e.g. binary classification).
OK, thanks.
I want to detect only one category. What code needs to be changed?
I think the easiest way is to rewrite your dataset. Keep RoIHead and RPN unmodified.
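For the single-category case, a minimal sketch of such a dataset rewrite might look like the following. The wrapper class is hypothetical, assuming a base dataset that returns (img, bbox, label, ...) tuples:

```python
import numpy as np
from torch.utils.data import Dataset

class SingleClassDataset(Dataset):
    """Hypothetical wrapper: maps every ground-truth label to class 0,
    so RPN and RoIHead stay unmodified and only ever see one category."""

    def __init__(self, base_dataset):
        self.base = base_dataset  # any dataset returning (img, bbox, label, ...)

    def __len__(self):
        return len(self.base)

    def __getitem__(self, i):
        img, bbox, label, *rest = self.base[i]
        label = np.zeros_like(label)  # collapse all categories into class 0
        return (img, bbox, label, *rest)
```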
OK, thanks. I recently bought your book 《深度学习框架PyTorch入门与实践》 and have been studying it, so this project is a good chance to practice. I'll have to ask you more questions when I run into problems.
:smile:
I want to split out the RPN and only do binary classification, but execution fails at the lines below. Could you help me analyze this? Thanks.
mean = t.Tensor(self.loc_normalize_mean).cuda().repeat(self.n_class)[None]
std = t.Tensor(self.loc_normalize_std).cuda().repeat(self.n_class)[None]
roi_cls_loc = (roi_cls_loc * std + mean)
mean and std are both 1x8, while roi_cls_loc is 1x16650x4.
mean and std should be 1x4.
But after I changed the code as follows, the line roi = roi.view(-1, 1, 4).expand_as(roi_cls_loc)
reports this error: RuntimeError: The expanded size of the tensor (8325) must match the existing size (300) at non-singleton dimension 0.
mean = t.Tensor(self.loc_normalize_mean).cuda().repeat(self.n_class-1)[None]
std = t.Tensor(self.loc_normalize_std).cuda().repeat(self.n_class-1)[None]
roi_cls_loc = (roi_cls_loc * std + mean)
roi_cls_loc = roi_cls_loc.view(-1, self.n_class, 4)
roi = roi.view(-1, 1, 4).expand_as(roi_cls_loc)
You need to print the shapes of roi_cls_loc and roi and analyze them.
roi is 300x4 and roi_cls_loc is 8325x2x4.
roi should be 300x1x4 and roi_cls_loc should be 300x2x4.
Is something wrong somewhere else?
OK, I'll take another look. Thanks.
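For reference, here is a standalone shape check that reproduces the intended broadcasting. It is a sketch with made-up tensors (n_class=2, 300 RoIs, CPU only):

```python
import torch as t

n_class = 2                                  # background + 1 foreground class
loc_normalize_mean = (0., 0., 0., 0.)
loc_normalize_std = (0.1, 0.1, 0.2, 0.2)

roi_cls_loc = t.randn(300, n_class * 4)      # head output: one loc per class per RoI
roi = t.randn(300, 4)                        # RoIs produced by the RPN

mean = t.Tensor(loc_normalize_mean).repeat(n_class)[None]  # (1, 8)
std = t.Tensor(loc_normalize_std).repeat(n_class)[None]    # (1, 8)

roi_cls_loc = roi_cls_loc * std + mean                     # broadcasts over 300 rows
roi_cls_loc = roi_cls_loc.view(-1, n_class, 4)             # (300, 2, 4)
roi = roi.view(-1, 1, 4).expand_as(roi_cls_loc)            # (300, 2, 4)
print(roi_cls_loc.shape, roi.shape)
```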
@chenyuntc Do you mean that if we have binary classification, we can keep the RPN but chop off the RoIHead (since, conditional on a box containing an object, it will always be class 0)? To improve speed.
If it's only binary classification, you could use the RPN without the RoIHead. But it's better to keep the RoIHead, since RoIPooling crops the corresponding area from the input image. An anchor calculated in the RPN has only one corresponding pixel in the feature map, which means it may have too narrow a receptive field.
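To illustrate why the RoIHead sees more context, here is a minimal sketch using torchvision.ops.roi_pool (not the repo's own RoI layer; the feature-map shape and box are made up):

```python
import torch
from torchvision.ops import roi_pool

feat = torch.randn(1, 512, 50, 50)  # feature map at 1/16 input resolution
# one RoI in input-image coordinates: (batch_index, x1, y1, x2, y2)
rois = torch.tensor([[0., 64., 64., 320., 320.]])
# roi_pool crops the matching feature-map area and pools it to a fixed 7x7 grid,
# so the head sees the whole region, not a single feature-map location
pooled = roi_pool(feat, rois, output_size=(7, 7), spatial_scale=1.0 / 16)
print(pooled.shape)  # torch.Size([1, 512, 7, 7])
```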
@chenyuntc Interesting, thank you. Just to experiment, I have removed the head, and the loss is now just:
losses = [rpn_loc_loss, rpn_cls_loss]
However, an issue arises with the scores from the RPN:
def try_rpn_prediction(img_id):
    # Put the model in eval mode
    trainer.faster_rcnn.eval()
    # Forward pass through the RPN only
    _rpn_locs, rpn_scores, rois, _roi_indices, _anchors, roi_order = trainer.faster_rcnn(img_id)
    rpn_scores = at.tonumpy(F.softmax(at.tovariable(np.squeeze(rpn_scores.data)), dim=1))
    rpn_scores = np.squeeze(rpn_scores)[:, 1]  # foreground class
    # Return the RPN scores only for relevant ROIs (i.e. bounding boxes)
    return rois, rpn_scores[roi_order]
The roi_order variable is one I added to the ProposalCreator class to keep track of which scores apply to which RoI:
def __call__(self, loc, score, anchor, img_size, scale=1.):
    if self.parent_model.training:
        n_pre_nms = self.n_train_pre_nms
        n_post_nms = self.n_train_post_nms
    else:
        n_pre_nms = self.n_test_pre_nms
        n_post_nms = self.n_test_post_nms

    # Convert anchors into proposals via bbox transformations.
    roi = loc2bbox(anchor, loc)

    # Clip predicted boxes to the image.
    roi[:, slice(0, 4, 2)] = np.clip(
        roi[:, slice(0, 4, 2)], 0, img_size[0])
    roi[:, slice(1, 4, 2)] = np.clip(
        roi[:, slice(1, 4, 2)], 0, img_size[1])

    # Remove predicted boxes with either height or width < threshold.
    min_size = self.min_size * scale
    hs = roi[:, 2] - roi[:, 0]
    ws = roi[:, 3] - roi[:, 1]
    keep = np.where((hs >= min_size) & (ws >= min_size))[0]
    roi = roi[keep, :]
    score = score[keep]
    roi_index = keep

    # Sort all (proposal, score) pairs by score from highest to lowest.
    # Take top pre_nms_topN (e.g. 6000).
    order = score.ravel().argsort()[::-1]
    if n_pre_nms > 0:
        order = order[:n_pre_nms]
    roi = roi[order, :]
    # Edit: apply the same ordering to roi_index
    roi_index = roi_index[order]

    # Apply NMS (e.g. threshold = 0.7).
    # Take after_nms_topN (e.g. 300).
    # unNOTE: something is wrong here!
    # TODO: remove cuda.to_gpu
    keep = non_maximum_suppression(
        cp.ascontiguousarray(cp.asarray(roi)),
        thresh=self.nms_thresh)
    if n_post_nms > 0:
        keep = keep[:n_post_nms]
    roi = roi[keep]
    # Edit: apply the same selection to roi_index
    roi_index = roi_index[keep]
    return roi, roi_index
However, this still doesn't quite match up. The rois are supposed to be sorted from most likely to least likely, yet when I index by this roi_order variable I still notice some entries further down with higher probability (although very few).
Related to your last point: do you mean the classification accuracy for one class would be improved for big objects if the RoIHead is included?
Thanks very much!
Sorry, just to ask further about your comment:
But it's better to keep the RoIHead, since RoIPooling crops the corresponding area from the input image. An anchor calculated in the RPN has only one corresponding pixel in the feature map, which means it may have too narrow a receptive field.
I don't quite understand that. I am using ResNet50 with your code, which has a stride of 224/7 = 32, so one pixel on the feature map covers 32 pixels in the image. If I use just the RPN, can I compensate by adding bigger/smaller anchor sizes?
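For what it's worth, anchor sizes are controlled where the anchor base is generated. Here is a sketch in the style of the repo's generate_anchor_base, with an extra small scale added; the exact values are illustrative, not a recommendation:

```python
import numpy as np

def generate_anchor_base(base_size=32, ratios=(0.5, 1, 2),
                         anchor_scales=(2, 4, 8, 16)):
    # base_size = feature stride (32 for the ResNet50 setup above);
    # the added scale 2 yields 64-pixel anchors for smaller objects.
    anchor_base = np.zeros((len(ratios) * len(anchor_scales), 4),
                           dtype=np.float32)
    for i, ratio in enumerate(ratios):
        for j, scale in enumerate(anchor_scales):
            h = base_size * scale * np.sqrt(ratio)
            w = base_size * scale * np.sqrt(1. / ratio)
            idx = i * len(anchor_scales) + j
            anchor_base[idx] = [-h / 2., -w / 2., h / 2., w / 2.]
    return anchor_base
```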
@chenyuntc Sorry to triple-post, but I just wanted to bring up this quote from Kaiming:
More importantly, RPN as a stand-alone pedestrian detector achieves an MR of 14.9% (Table 1). This result is competitive and is better than all but two state-of-the-art competitors on the Caltech dataset (Fig. 4). We note that unlike RoI pooling that may suffer from small regions, RPN is essentially based on fixed-size sliding windows (in a fully convolutional fashion) and thus avoids collapsing bins. RPN predicts small objects by using small anchors
It seems that in their case, for one class, RPN-only was even better?
The author is missing F.softmax for the RPN scores, so what the author actually uses for ranking in the RPN is raw logits, not the expected probabilities.
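If so, normalizing before ranking would look roughly like this. This is a sketch assuming rpn_scores are raw logits of shape (N, H*W*A, 2) with the foreground class at index 1:

```python
import torch.nn.functional as F

def foreground_probs(rpn_scores):
    # rpn_scores: (N, H*W*A, 2) raw logits from the RPN classifier.
    # Softmax over the (bg, fg) pair turns them into probabilities,
    # so ranking proposals by score matches ranking by P(foreground).
    probs = F.softmax(rpn_scores, dim=2)
    return probs[:, :, 1]  # (N, H*W*A) foreground probabilities
```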
As a two-stage model, Faster RCNN can train the RPN and the RoI head individually using roi_cls_locs, roi_scores, rpn_locs, and rpn_scores. But in your code, you only use roi_cls_locs and roi_scores to train the RoI head, and don't train the RPN. Am I right?
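For reference, jointly training both stages would sum all four terms, roughly like this. This is a sketch; the variable names follow this thread, not necessarily the repo's trainer:

```python
# Hypothetical joint objective: both RPN and RoI head terms contribute,
# so one backward pass updates the extractor, the RPN, and the head together.
losses = [rpn_loc_loss, rpn_cls_loss, roi_loc_loss, roi_cls_loss]
total_loss = sum(losses)
total_loss.backward()
```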