segment-anything
How to find the ground truth for ambiguous masks
As described in the paper, each prompt outputs three different masks to handle ambiguity. My question is: during training, given one prompt, how are three masks collected as ground truth?
In the appendix of the paper, under the section Making the model ambiguity-aware (pg. 17) they mention:
"During training, we compute the loss between the ground truth and each of the predicted masks, but only backpropagate from the lowest loss"
So it seems they only have 1 ground truth mask per prompt, and only 1 of the predicted masks (the one with the lowest loss) gets 'trained' against that ground truth.
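To make the quoted sentence concrete, here is a rough sketch of the selection step in plain Python. It is not the repo's training code: the function names `dice_loss` and `select_min_loss` are made up for illustration, the masks are tiny flattened lists of probabilities, and Dice loss is only a stand-in for whatever mask loss is actually used.

```python
def dice_loss(pred, target, eps=1e-6):
    """Dice loss between a flattened predicted mask and a binary ground truth.

    Stand-in loss for illustration only (assumption, not taken from the repo).
    """
    inter = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return 1.0 - (2.0 * inter + eps) / (total + eps)


def select_min_loss(pred_masks, gt_mask):
    """Compute the loss of every predicted mask against the single ground
    truth and return the index of the lowest one, plus all losses."""
    losses = [dice_loss(m, gt_mask) for m in pred_masks]
    best = min(range(len(losses)), key=losses.__getitem__)
    return best, losses


# One ground truth mask for the prompt (4 "pixels", flattened).
gt = [1.0, 1.0, 0.0, 0.0]

# Three candidate masks predicted for the same prompt.
preds = [
    [0.9, 0.8, 0.1, 0.2],  # close to the ground truth
    [0.9, 0.1, 0.1, 0.1],  # covers only part of the object
    [0.9, 0.9, 0.9, 0.9],  # covers too much
]

best, losses = select_min_loss(preds, gt)
# Only losses[best] would be used for the parameter update;
# the other two predictions contribute no gradient for this sample.
print(best, losses[best])
```

In an autograd framework the same idea would presumably be: compute the three losses, take the minimum, and call backward on that value only, so the other two outputs are left untouched for this sample.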