MultiNet

I do not understand the numbers in the paper (convolution and concatenation)

Open rachel1994 opened this issue 7 years ago • 2 comments

There are a few things in the paper that I do not understand.

First, for classification, the original 2016 paper used a 1x1 convolution, but the 2018 paper (Figure 2) shows a 3x3 convolution. Is there a special reason for the change? In addition, the top of page 4 says 'we first apply a 1x1 convolution with 30 channels', which makes it even more confusing. Which number is correct?

Second, I would like to know why the concatenated features in the Detection Decoder in Figure 2 of the 2018 paper are given as 39x12x1526. By my calculation, ROI Align on 128 channels is concatenated to 128*8=1024 channels (I also wonder why the factor is 8 rather than 9, apart from the existing result in the middle), the Bottleneck block contributes 500 channels, and the Prediction contributes 6 channels, so the final result should be 1024+500+6=1530. I would be very grateful if you could point out where I went wrong. I have been thinking about this number for a long time, but I cannot reach any other conclusion.
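To make the calculation above easy to check, here is a minimal Python sketch of it. It only restates the channel counts quoted in this question; the variable names are made up for illustration and are not taken from the MultiNet code.

```python
# Channel counts as quoted in the question (not read from the MultiNet code).
roi_align_channels = 128 * 8   # ROI Align features, assumed to be 8 x 128
bottleneck_channels = 500      # Bottleneck block
prediction_channels = 6        # Prediction output

# Total expected after concatenation along the channel axis.
expected_total = roi_align_channels + bottleneck_channels + prediction_channels

# Channel count shown in Figure 2 of the 2018 paper (39x12x1526).
figure_total = 1526

difference = expected_total - figure_total
print("expected:", expected_total)
print("figure:  ", figure_total)
print("gap:     ", difference)
```

So the figure is 4 channels short of my calculation, and I cannot tell which of the three terms accounts for that.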

I look forward to your reply. Thank you.

rachel1994 avatar Jul 16 '18 01:07 rachel1994

I have the same doubt. Have you solved this problem?

zhoupan9109 avatar Oct 12 '21 09:10 zhoupan9109

Great job! But I still have a question: I can't understand the numbers either. According to the paper, ROI Align transforms the features from (156×48×128) to (39×12×1020), and this step confuses me. If you could expand on it, I would appreciate it. Thanks a lot in advance.
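For what it is worth, the arithmetic can be checked with a short Python sketch. It only uses numbers quoted in this thread: the shapes (156×48×128) and (39×12×1020) from this comment, and the 500 Bottleneck and 6 Prediction channels from the original question. The variable names are illustrative, not taken from the MultiNet code, and the sketch does not explain how ROI Align arrives at 1020 channels.

```python
# Shapes as quoted in this thread (illustrative names, not from the MultiNet code).
in_shape = (156, 48, 128)    # features before ROI Align
out_shape = (39, 12, 1020)   # features after ROI Align, as quoted from the paper

# Channel counts taken from the original question in this issue.
bottleneck_channels = 500
prediction_channels = 6

# Spatial downsampling factor along each of the two spatial axes.
scale_0 = in_shape[0] // out_shape[0]
scale_1 = in_shape[1] // out_shape[1]

# Channels after concatenating ROI Align output, Bottleneck, and Prediction.
concat_channels = out_shape[2] + bottleneck_channels + prediction_channels

# How many times the input channel count fits into the ROI Align output.
channel_ratio = out_shape[2] / in_shape[2]

print("spatial scale:", scale_0, scale_1)
print("concatenated channels:", concat_channels)
print("channel ratio:", channel_ratio)
```

If the 1020 figure is right, then 1020 + 500 + 6 matches the 1526 shown in Figure 2, and the 4-channel gap to 1530 lies in the ROI Align output (1020 versus 128*8=1024). Note that 1020 is not an integer multiple of 128, which is the part I cannot explain.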

HerrYu123 avatar Jan 29 '22 03:01 HerrYu123