How to find the ground truth for ambiguous masks

Open pxliang opened this issue 11 months ago • 1 comment

As described in the paper, each prompt outputs three different masks to address ambiguity. My question is: during the training phase, given one prompt, how do you collect three masks to use as ground truth?

Jan 28 '25 16:01 pxliang

In the appendix of the paper, under the section "Making the model ambiguity-aware" (pg. 17), they mention:

"During training, we compute the loss between the ground truth and each of the predicted masks, but only backpropagate from the lowest loss"

So it seems they only have 1 ground truth mask per prompt, and only the predicted mask with the lowest loss gets 'trained' against that ground truth.
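As a rough illustration of that selection step, here's a minimal pure-Python sketch (not the paper's actual training code). It scores three candidate masks against a single ground truth and picks the one with the lowest loss. The names `bce_loss` and `select_min_loss` are made up for this example, and plain binary cross-entropy is only an assumed stand-in for whatever mask loss is really used.

```python
import math


def bce_loss(pred, target, eps=1e-7):
    """Mean binary cross-entropy between a probability mask and a 0/1 target.

    Stand-in loss for illustration only (an assumption, not the paper's loss).
    """
    total = 0.0
    count = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
            count += 1
    return total / count


def select_min_loss(pred_masks, gt_mask, loss_fn=bce_loss):
    """Compute one loss per predicted mask and return (best index, all losses)."""
    losses = [loss_fn(mask, gt_mask) for mask in pred_masks]
    best_index = min(range(len(losses)), key=losses.__getitem__)
    return best_index, losses


# One ground-truth mask for the prompt (toy 2x2 example).
gt = [[1, 1],
      [0, 0]]

# Three candidate predictions for the same prompt.
preds = [
    [[0.9, 0.9], [0.9, 0.9]],  # covers everything
    [[0.9, 0.8], [0.1, 0.2]],  # close to the ground truth
    [[0.9, 0.1], [0.1, 0.1]],  # covers only part of the object
]

best_index, losses = select_min_loss(preds, gt)
training_loss = losses[best_index]  # only this term would be backpropagated
print(best_index, [round(l, 3) for l in losses])
```

In an autograd framework, taking the minimum over the per-mask losses and calling backward on it sends gradients only through the selected prediction, so the other two masks get no update for that sample.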

Jan 28 '25 22:01 heyoeyo