
Multimask output question

Open 25benjaminli opened this issue 1 year ago • 2 comments

Why is num_mask_tokens = num_multimask_outputs + 1? And why, when multimask output is enabled, are the outputs sliced with slice(1, None)?

25benjaminli, Feb 04 '24 17:02

Have you solved it? I would also like to know the answer to that question.

MyFirst905, Mar 19 '24 02:03

@MyFirst905 I have not "solved it" but have a rough idea as to why this is the case. According to the paper:

"With one output, the model will average multiple valid masks if given an ambiguous prompt. To address this, we modify the model to predict multiple output masks for a single prompt (see Fig. 3). We found 3 mask outputs is sufficient to address most common cases (nested masks are often at most three deep: whole, part, and subpart). During training, we backprop only the minimum loss over masks. To rank masks, the model predicts a confidence score (i.e., estimated IoU) for each mask."

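If I am interpreting this correctly, the three multimask outputs are supposed to describe different levels of detail (whole, part, and subpart). As far as I can tell from the decoder code, the extra "+1" token is the single-mask output that is returned when multimask_output is False (slice(0, 1)), which would be why the multimask path skips index 0 and slices from (1, None).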
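To make the indexing concrete, here is a minimal sketch in plain Python (no torch) of how I read the slicing. It assumes index 0 holds the single-mask output and indices 1 to 3 hold the three multimask outputs. The helper name `select_masks` and the placeholder strings and scores are made up for illustration; only `num_multimask_outputs`, `num_mask_tokens`, `multimask_output`, and the two slices are taken from the question and the decoder code.

```python
# Sketch only: lists of placeholder strings stand in for the real mask tensors.
num_multimask_outputs = 3
num_mask_tokens = num_multimask_outputs + 1  # one extra token for the single-mask case


def select_masks(masks, iou_pred, multimask_output):
    """Hypothetical helper mirroring the slice choice discussed above."""
    # Multimask path drops index 0; single-mask path keeps only index 0.
    mask_slice = slice(1, None) if multimask_output else slice(0, 1)
    return masks[mask_slice], iou_pred[mask_slice]


# Assumed layout: one entry per mask token, with made-up IoU scores.
masks = ["single", "multi_0", "multi_1", "multi_2"]
iou_pred = [0.90, 0.70, 0.85, 0.60]

multi_masks, multi_scores = select_masks(masks, iou_pred, multimask_output=True)
single_mask, single_score = select_masks(masks, iou_pred, multimask_output=False)

# Rank the multimask outputs by predicted IoU, as the paper quote describes.
best = max(range(len(multi_scores)), key=multi_scores.__getitem__)
print(multi_masks, single_mask, multi_masks[best])
```

With this layout, an ambiguous prompt can use the three multimask outputs and pick the one with the highest predicted IoU, while an unambiguous prompt can use the dedicated single-mask output instead.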

25benjaminli, Mar 20 '24 17:03