MaskDINO: Question about the segmentation branch

Hi,

Nice work. I see that you use the highest-resolution backbone and encoder feature maps to generate the pixel embedding map. Did you try also including lower-resolution feature maps (from the backbone or the encoder), and did they improve performance?

Thanks, Owen

https://github.com/IDEA-Research/MaskDINO/blob/76c8e4536ad8f01ed97f71fe47dd05518b5dbdaf/maskdino/modeling/pixel_decoder/maskdino_encoder.py#L415-L428
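For readers less familiar with this design, the kind of fusion described in the question can be sketched as an FPN-style merge: project the 1/4 backbone map with a lateral 1x1 layer, add the encoder's highest-resolution (1/8) map after upsampling it to 1/4, project to the mask dimension, and take a dot product with each query embedding to get mask logits. This is a minimal NumPy sketch under stated assumptions, not MaskDINO's code: the function names, channel sizes, and random weights are hypothetical, upsampling is nearest-neighbour, and only 1x1 projections are used; the actual layers in the linked file may differ.

```python
import numpy as np

def conv1x1(x, weight):
    """1x1 convolution as a channel-mixing matmul: (C_in, H, W) -> (C_out, H, W)."""
    return np.einsum("oc,chw->ohw", weight, x)

def upsample_nearest(x, factor):
    """Nearest-neighbour upsampling of a (C, H, W) map by an integer factor."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

def pixel_embedding(backbone_1_4, encoder_1_8, w_lateral, w_mask):
    """Hypothetical FPN-style fusion of a 1/4 backbone map with a 1/8 encoder map."""
    lateral = conv1x1(backbone_1_4, w_lateral)          # project backbone channels to d
    fused = lateral + upsample_nearest(encoder_1_8, 2)  # bring the encoder map to 1/4 and add
    return conv1x1(fused, w_mask)                       # final projection to the mask dim

def mask_logits(queries, pixel_emb):
    """Dot product of each query with every pixel embedding: (Q, D) x (D, H, W) -> (Q, H, W)."""
    return np.einsum("qd,dhw->qhw", queries, pixel_emb)

rng = np.random.default_rng(0)
H, W = 64, 64                # toy input image size
c_backbone, d = 32, 16       # hypothetical channel sizes
backbone_1_4 = rng.standard_normal((c_backbone, H // 4, W // 4))
encoder_1_8 = rng.standard_normal((d, H // 8, W // 8))
w_lateral = rng.standard_normal((d, c_backbone))
w_mask = rng.standard_normal((d, d))
queries = rng.standard_normal((10, d))  # 10 hypothetical query embeddings

emb = pixel_embedding(backbone_1_4, encoder_1_8, w_lateral, w_mask)
logits = mask_logits(queries, emb)
print(emb.shape, logits.shape)  # (16, 16, 16) (10, 16, 16)
```

The pixel embedding stays at 1/4 resolution, so every query's mask is predicted at that resolution regardless of how many coarser maps feed into it.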

Jan 12 '23 15:01 owen24819

Yes. By default, the highest-resolution map in the encoder is 1/8. The largest map we have used is a 1/4 map in the encoder (see our 5-scale model), which improves performance by around 0.5 AP.

Jan 15 '23 02:01 FengLi-ust

Hi, thanks for the response. I was specifically wondering whether you fed multiple encoder feature maps to the segmentation head, e.g., the 1/4, 1/8, and 1/16 encoder maps. The code I highlighted above looks as if it was written with this in mind, so I wondered whether you had tried it.
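The multi-map variant asked about here could be sketched as follows: upsample each encoder map (1/4, 1/8, 1/16) to the resolution of the largest one, concatenate along channels, and project back to the mask dimension with a 1x1 layer. This is a hypothetical NumPy sketch of the proposal, not something the thread confirms MaskDINO implements; the function names, channel sizes, nearest-neighbour upsampling, and concatenate-then-project choice are all assumptions.

```python
import numpy as np

def upsample_to(x, size):
    """Nearest-neighbour upsampling of a (C, h, w) map to (C, H, W); H, W must be integer multiples of h, w."""
    fh, fw = size[0] // x.shape[1], size[1] // x.shape[2]
    return x.repeat(fh, axis=1).repeat(fw, axis=2)

def fuse_encoder_maps(maps, weight):
    """Hypothetical multi-scale fusion: upsample every encoder map to the largest one,
    concatenate along channels, and project back to d channels with a 1x1 layer."""
    target = maps[0].shape[1:]  # maps[0] is assumed to be the 1/4 map
    stacked = np.concatenate([upsample_to(m, target) for m in maps], axis=0)
    return np.einsum("oc,chw->ohw", weight, stacked)

rng = np.random.default_rng(0)
d, H, W = 16, 64, 64  # hypothetical channel size and toy image size
enc = [rng.standard_normal((d, H // s, W // s)) for s in (4, 8, 16)]  # 1/4, 1/8, 1/16 maps
weight = rng.standard_normal((d, 3 * d))
pixel_emb = fuse_encoder_maps(enc, weight)
print(pixel_emb.shape)  # (16, 16, 16)
```

Concatenating before the projection lets the layer learn how much each scale contributes; simply summing the upsampled maps would be cheaper but weights every scale equally.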

Jan 16 '23 19:01 owen24819