
Question about HQ-Output Token and weight updates in the frozen Mask Decoder

Open · Linn0910 opened this issue on Sep 30, 2024 · 1 comment

Hello,

Thank you for your great work on the HQ-SAM model! I have a question about how the HQ-Output Token interacts with the frozen Mask Decoder. From the architecture diagram, I understand that the HQ-Output Token is fed into the frozen Mask Decoder to improve segmentation accuracy. However, I am curious about how the HQ-Output Token's weights are updated during training, given that the Mask Decoder itself is frozen and its weights are never updated. Here are my specific questions (see the sketch after the list for what I mean by question 1):

1. Since the Mask Decoder is frozen, how are the HQ-Output Token's weights updated during training?
2. Does the HQ-Output Token rely solely on the Global-local Fusion and MLP layers for weight updates, or does it interact with the Mask Decoder in some other way?
3. How does the error-correction mechanism contribute to the HQ-Output Token's learning in this setup?

I would greatly appreciate it if you could clarify these points. Thank you again for your time and for sharing your amazing research!
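To make question 1 concrete, here is how I currently picture the mechanism: a minimal PyTorch sketch (hypothetical module and shapes, not the actual sam-hq code) showing that a learnable token still receives gradients through frozen decoder layers, since freezing the weights does not block gradient flow through the frozen ops themselves.

```python
import torch
import torch.nn as nn

# Stand-in for the mask decoder (hypothetical; not the real sam-hq decoder).
decoder = nn.TransformerDecoderLayer(d_model=256, nhead=8, batch_first=True)
for p in decoder.parameters():
    p.requires_grad_(False)  # frozen: these weights receive no gradient

# The HQ-Output Token is a new learnable embedding appended to the decoder's
# token sequence -- it is NOT part of the frozen weights.
hq_token = nn.Parameter(torch.randn(1, 1, 256))

image_embedding = torch.randn(1, 64, 256)  # dummy image features ("memory")
out = decoder(hq_token, image_embedding)   # frozen ops still propagate grads

loss = out.sum()   # dummy loss standing in for the mask supervision
loss.backward()

print(hq_token.grad is not None)                          # True: token gets updated
print(all(p.grad is None for p in decoder.parameters()))  # True: decoder stays frozen
```

If this picture is right, the HQ-Output Token (together with the Global-local Fusion and MLP heads) would be the only trainable parameters, optimized against the mask loss while the decoder's weights stay fixed. Please correct me if I have misunderstood.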

Best regards, Lin

Linn0910 · Sep 30, 2024

Hi, did you ever figure this out?

Linzy0227 · Apr 29, 2025