MaskDINO
Questions about Mask DINO for semantic segmentation
I have seen Issue #1 and gone through the codebases of MaskFormer and Mask2Former. I'm interested in Mask DINO for semantic segmentation too, and I have some questions.
- Is the masked attention from Mask2Former kept in Mask DINO?
- Is the semantic map converted into instance masks and bboxes during preprocessing, with the loss computed on the converted annotations? For example, given a semantic map F with values in {0,1,2,3,4} (5 classes), the map would be converted into 5 binary masks Fi with values in {0,1}, for i in 0~4, and bboxes Bi. loss_cls, loss_bbox, and loss_mask would then be calculated from these three converted annotations. Is this right?
- I think the biggest difference between the Mask DINO model and the Mask2Former model is that the DINO decoder takes anchor boxes generated by the encoder, together with content query embeddings, as input, while the Mask2Former decoder is like that of the original DETR and uses learnable embeddings for both content and spatial queries. Although semantic segmentation has no clearly defined bboxes, the DINO encoder still generates a set of anchor boxes for the decoder, and the decoder iteratively refines them. Hence there are bboxes in the output even though the task is semantic segmentation. Is that right?
- I'm curious about the query selection and query denoising schemes for masks. Once the category, bbox, and binary mask converted from a semantic map are obtained, the cls and bbox are noised as in DINO. How is the mask noised so that the denoising and matching parts behave consistently? How is the mask fed into the decoder? And what about the matching part?
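To make the second question concrete, the conversion it describes could be sketched roughly as below. This is only an illustration of the hypothesis in the question, written with NumPy. The helper name `semantic_map_to_targets`, the `ignore_label` parameter, and the `(x_min, y_min, x_max, y_max)` box convention are assumptions made for this example and are not taken from the Mask DINO codebase.

```python
import numpy as np


def semantic_map_to_targets(sem_map, ignore_label=255):
    """Split an (H, W) semantic label map into per-class targets.

    Hypothetical helper: returns the class ids present in the map, one
    binary mask per class with shape (K, H, W), and one tight box per
    class with shape (K, 4) as (x_min, y_min, x_max, y_max) in pixels.
    """
    # Every distinct label except the ignore label becomes one "instance".
    class_ids = [int(c) for c in np.unique(sem_map) if c != ignore_label]

    masks, boxes = [], []
    for c in class_ids:
        mask = sem_map == c              # binary mask Fi for class i
        ys, xs = np.nonzero(mask)        # pixel coordinates covered by the class
        masks.append(mask)
        # Tight box Bi around all pixels of the class (exclusive max edge).
        boxes.append([xs.min(), ys.min(), xs.max() + 1, ys.max() + 1])

    h, w = sem_map.shape
    masks_arr = np.stack(masks) if masks else np.zeros((0, h, w), dtype=bool)
    boxes_arr = np.array(boxes, dtype=np.float32).reshape(-1, 4)
    return np.array(class_ids, dtype=np.int64), masks_arr, boxes_arr
```

Under this sketch, a class that appears in several disconnected regions still yields a single mask and a single box spanning all of them, which is where semantic targets differ from instance targets.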
Sorry for the long list of questions, and thanks to the authors.
- No. We use DINO, which is totally different from Mask2Former.
- No. We use the original masks in semantic segmentation.
- We agree that is one difference.
- In our framework, we do not think there is any particular design for semantic segmentation. The only difference is that semantic segmentation is a category-level prediction.
OK, thanks for your reply! I understand now.