The implementation of Hybrid Matching
Thanks for the great work.
I have read the paper, but I could not find enough detail about the implementation of hybrid matching, or if it is there, I could not understand it. Could you elaborate on it? Will it be analyzed in more detail in the second version of the paper?
Thanks in advance, Esat
Hi, hybrid matching means matching with the classification loss, mask loss, and box loss (the three are added together to form the matching cost). It is simple, so we do not elaborate on it in the paper. We also provide an experimental analysis of different matching strategies; please refer to the ablation study for more details. Thank you.
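To make the description above concrete, here is a minimal sketch of a hybrid matching cost, not the actual Mask DINO code. It assumes simplified cost terms (negative class probability, L1 box distance, and a Dice-style mask cost), and the names `hybrid_cost_matrix`, `match`, `w_cls`, `w_box`, and `w_mask` are hypothetical. The weights default to 1 purely for illustration. The assignment is found by brute force to keep the example dependency-free; DETR-style codebases typically use `scipy.optimize.linear_sum_assignment` for this step.

```python
from itertools import permutations

def l1(a, b):
    # L1 distance between two boxes given as (cx, cy, w, h)
    return sum(abs(x - y) for x, y in zip(a, b))

def dice_cost(pred, target, eps=1.0):
    # 1 - Dice overlap between two flattened masks (lower is better)
    inter = sum(p * t for p, t in zip(pred, target))
    return 1.0 - (2.0 * inter + eps) / (sum(pred) + sum(target) + eps)

def hybrid_cost_matrix(preds, targets, w_cls=1.0, w_box=1.0, w_mask=1.0):
    # cost[q][t] = weighted sum of class, box, and mask costs (assumed weights)
    cost = []
    for p in preds:
        row = []
        for t in targets:
            c_cls = -p["probs"][t["label"]]
            c_box = l1(p["box"], t["box"])
            c_mask = dice_cost(p["mask"], t["mask"])
            row.append(w_cls * c_cls + w_box * c_box + w_mask * c_mask)
        cost.append(row)
    return cost

def match(cost):
    # Brute-force minimum-cost assignment of each target to a distinct query.
    # Only suitable for tiny examples.
    n_queries, n_targets = len(cost), len(cost[0])
    best = min(
        permutations(range(n_queries), n_targets),
        key=lambda qs: sum(cost[q][t] for t, q in enumerate(qs)),
    )
    return [(q, t) for t, q in enumerate(best)]

# Toy data: 3 queries, 2 ground-truth objects, 4-pixel masks.
preds = [
    {"probs": [0.1, 0.9], "box": [0.5, 0.5, 0.2, 0.2], "mask": [1, 1, 0, 0]},
    {"probs": [0.8, 0.2], "box": [0.2, 0.2, 0.1, 0.1], "mask": [0, 0, 1, 1]},
    {"probs": [0.5, 0.5], "box": [0.9, 0.9, 0.1, 0.1], "mask": [0, 0, 0, 0]},
]
targets = [
    {"label": 0, "box": [0.2, 0.2, 0.1, 0.1], "mask": [0, 0, 1, 1]},
    {"label": 1, "box": [0.5, 0.5, 0.2, 0.2], "mask": [1, 1, 0, 0]},
]

matches = match(hybrid_cost_matrix(preds, targets))  # list of (query, target) pairs
```

Because all three terms enter a single cost matrix, a query is matched only if it agrees with the target on class, box, and mask at once, rather than on boxes or masks alone.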
Are the mixing weights for the box and mask matching matrices kept equal? It's not very clear from Table 12.
Hey, we present these details in the appendix.
I did check the appendix of MaskDINO (before posting the comment and again just now), and it has no mention of the weights used for composing the final matching matrix. In fact, I could not find any discussion of hybrid matching in the appendix.
The only discussion of hybrid matching in the paper is in Section 4.3 (Ablation studies), which just says: "Matching. In Table 12, we show that only using boxes or masks to perform bipartite matching is not optimal in Mask DINO. A unified matching objective makes the optimization more consistent." Table 12 does not mention the actual weights used for composing the final matching matrix, hence my question. I assume weights=1 were used, but it would be nice to have an explicit confirmation.
For example, here is the matching matrix construction from the DeformableDETR codebase: https://github.com/fundamentalvision/Deformable-DETR/blob/main/models/matcher.py#L91. I'm wondering what these cost_* weights are in your case.
Sorry for the unclear description. In the appendix, we provide the weights of the different loss terms. The matching cost weights are the same as the loss weights.
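As a rough illustration of what "the same as the loss weights" means in practice, the sketch below reuses a single weight table for both the matching cost and the training loss. The names `LOSS_WEIGHTS`, `weighted_sum`, `matching_cost`, and `training_loss` are hypothetical, and the numeric values are placeholders, not the values listed in the appendix.

```python
# Placeholder weights for illustration only; NOT the values from the paper's appendix.
LOSS_WEIGHTS = {"class": 2.0, "box": 5.0, "mask": 5.0}

def weighted_sum(terms, weights):
    # Combine named terms using the weight registered under the same name
    return sum(weights[name] * value for name, value in terms.items())

def matching_cost(pair_costs, weights=LOSS_WEIGHTS):
    # Cost of assigning one query to one target, used to build the matching matrix
    return weighted_sum(pair_costs, weights)

def training_loss(losses, weights=LOSS_WEIGHTS):
    # Final loss over matched pairs, using the same weights as the matcher
    return weighted_sum(losses, weights)

example = matching_cost({"class": 1.0, "box": 1.0, "mask": 1.0})
```

A codebase that keeps separate matcher and loss coefficients would instead pass two different weight tables to these two functions.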
Oh, that's interesting, because in DeformableDETR they are not the same at all. E.g. bbox_loss_coef==5, while cost_bbox==1