SparseR-CNN

Question regarding proposal feature

abeyang00 opened this issue 3 years ago • 4 comments

I have a question regarding the proposal features.

In the DETR paper, the reshaped feature map (HW x C) is given as input to the transformer encoder to learn the correlations between pixels. However, in your paper, you use a C-dimensional vector (named 'prop_feats') instead of the reshaped feature map.

How does this C-dimensional vector learn the correlations between pixels? As I understand it, it does not contain feature information for each pixel position.

I saw your reply in one of the previous issues, where you said 'don't understand dynamic head as Q,K,V'. How should I understand this concept, then?

Thank you in advance!

abeyang00 • Mar 10 '21 05:03
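To make the shape difference in the question concrete, here is a minimal NumPy sketch. It is an illustration only, not code from DETR or SparseR-CNN: single-head self-attention over flattened pixels with no learned Q/K/V projections, positional encodings or multi-head split, and with arbitrary sizes. The helper name `detr_encoder_attention` is hypothetical. The encoder's attention matrix is HW x HW (pixel to pixel), whereas the proposal features are just an N x C array with no pixel axis at all.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def detr_encoder_attention(feature_map):
    """Simplified single-head self-attention over pixels (no learned projections)."""
    C, H, W = feature_map.shape
    tokens = feature_map.reshape(C, H * W).T      # (HW, C): one token per pixel
    scores = tokens @ tokens.T / np.sqrt(C)       # (HW, HW): pixel-to-pixel scores
    weights = softmax(scores, axis=-1)
    return tokens + weights @ tokens, weights     # residual update, (HW, C)

rng = np.random.default_rng(0)
C, H, W, N = 8, 4, 5, 100
feature_map = rng.standard_normal((C, H, W))
encoded, weights = detr_encoder_attention(feature_map)
print(encoded.shape, weights.shape)   # (20, 8) (20, 20)

# Proposal features in the Sparse R-CNN sense: one C-dim vector per proposal, no pixel axis
prop_feats = rng.standard_normal((N, C))
print(prop_feats.shape)               # (100, 8)
```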

Hi~ Each proposal feature contains information about its corresponding object. The proposal feature updates itself by interacting with its RoI feature. We don't need feature information for each pixel position.

PeizeSun • Mar 10 '21 15:03
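The "interacting with its RoI feature" step can be sketched roughly as follows, based on my reading of the dynamic instance interactive head described in the Sparse R-CNN paper: each proposal feature generates the parameters of two 1x1 convolutions (written here as batched matrix products), which are applied only to that proposal's own 7x7 RoI feature; the result is then flattened and projected back to one C-dimensional vector. This is a simplified sketch under stated assumptions: `dynamic_interaction`, `W_params`, `W_out` and all sizes are hypothetical, the weights are random rather than learned, and the normalization layers and the self-attention among proposals that precede this step are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d, S, d_dyn = 100, 64, 7, 16   # proposals, hidden dim, RoI size, dynamic dim (illustrative)

# Hypothetical weights; in a trained model these would be learned
W_params = rng.standard_normal((d, 2 * d * d_dyn)) * 0.01  # proposal feature -> kernel params
W_out = rng.standard_normal((S * S * d, d)) * 0.01         # flattened RoI output -> d

def relu(x):
    return np.maximum(x, 0.0)

def dynamic_interaction(prop_feats, roi_feats):
    """prop_feats: (N, d); roi_feats: (N, S*S, d). Returns updated features (N, d)."""
    params = prop_feats @ W_params                          # (N, 2*d*d_dyn)
    k1 = params[:, : d * d_dyn].reshape(-1, d, d_dyn)       # per-proposal kernel 1: (N, d, d_dyn)
    k2 = params[:, d * d_dyn :].reshape(-1, d_dyn, d)       # per-proposal kernel 2: (N, d_dyn, d)
    x = relu(np.matmul(roi_feats, k1))                      # (N, S*S, d_dyn)
    x = relu(np.matmul(x, k2))                              # (N, S*S, d)
    return relu(x.reshape(x.shape[0], -1) @ W_out)          # flatten 7x7 and project: (N, d)

prop_feats = rng.standard_normal((N, d))
roi_feats = rng.standard_normal((N, S * S, d))
out = dynamic_interaction(prop_feats, roi_feats)
print(out.shape)  # (100, 64)
```

In this sketch each proposal only ever touches its own RoI, and the spatial detail comes from the 7x7 RoI grid rather than from per-pixel features of the whole image.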

So the RoI feature can be regarded as the Query and the proposal features as the Key?

abeyang00 • Mar 11 '21 01:03

I guess the Query is the proposal features [100 x C] and the Key is the RoI features [100 x (7 x 7 x C)]. Think about DETR: the Query is the object queries [100 x C], and the Key is the reshaped image feature map repeated 100 times [100 x (HW x C)], where each (HW x C) copy is the same.

PeizeSun • Mar 11 '21 02:03
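The DETR shapes in the reply above can be checked numerically. In this simplified single-head sketch (an illustration only: no learned projections, so the flattened map serves as both keys and values, and all sizes are arbitrary), standard cross-attention with K of shape [HW x C] gives the same result as handing each of the 100 queries its own copy of the map, i.e. the [100 x (HW x C)] view.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
N, HW, C = 100, 20, 8
queries = rng.standard_normal((N, C))   # object queries, (N, C)
memory = rng.standard_normal((HW, C))   # flattened encoder output, (HW, C)

# Standard cross-attention: every query attends over the same HW keys
attn = softmax(queries @ memory.T / np.sqrt(C))          # (N, HW)
out_shared = attn @ memory                               # (N, C)

# "100 x (HW x C)" view: each query gets its own (identical) copy of the map
memory_per_query = np.broadcast_to(memory, (N, HW, C))   # (N, HW, C), read-only view
scores = np.einsum("nc,nmc->nm", queries, memory_per_query) / np.sqrt(C)
out_per_query = np.einsum("nm,nmc->nc", softmax(scores), memory_per_query)

print(np.allclose(out_shared, out_per_query))  # True
```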

@PeizeSun Don't Q and K need the same hidden dimension for the matrix multiplication? For example, in DETR Q is [100 x C] and K is [HW x C], not [100 x (HWC)].

HYUNJS • Mar 11 '21 21:03
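One way to reconcile the shapes in this question is to read [100 x (7 x 7 x C)] as [100 x 49 x C]: each proposal has its own set of 49 keys of width C, so the inner dimension of each product is still C, just computed per proposal rather than against one shared key matrix. The sketch below illustrates only this attention-style reading of the shapes (sizes arbitrary, no learned projections); it is an analogy, not the Sparse R-CNN dynamic head, which the authors say should not be understood as Q, K, V.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
N, S, C = 100, 7, 8
prop_feats = rng.standard_normal((N, C))         # one query per proposal, (N, C)
roi_feats = rng.standard_normal((N, S * S, C))   # each proposal's own 7x7 RoI, (N, 49, C)

# Batched per-proposal attention: q_i (C,) against K_i (49, C); the contracted axis is C
scores = np.einsum("nc,nmc->nm", prop_feats, roi_feats) / np.sqrt(C)  # (N, 49)
out = np.einsum("nm,nmc->nc", softmax(scores), roi_feats)             # (N, C)

# The same computation for a single proposal, as an ordinary matrix product
i = 3
q_i, K_i = prop_feats[i], roi_feats[i]                                 # (C,), (49, C)
out_i = softmax(q_i @ K_i.T / np.sqrt(C)) @ K_i                        # (C,)
print(np.allclose(out[i], out_i))  # True
```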