
Question about the learnable image tensor of in-context tuning in SegGPT

Open • YangHan-Morningstar opened this issue 1 year ago • 1 comment

Hi there, thanks for your amazing work. After reading the SegGPT paper, I'm a little confused about in-context tuning. According to the paper, during the training stage SegGPT treats a learnable image tensor as a learnable prompt. But in the normal training stage, the input is a pair of in-context images, each with its mask, such as image1-mask1 and image2-mask2. So is the learnable image tensor a random image-mask pair? With image3-mask3 from the dataset, is the whole input then image-mask (prompt) plus image3-mask3? Since the mask in the random image-mask pair is random, there is no label for computing the loss and backpropagating gradients, so how is it trained? Please tell me more and help me understand this. Thanks!

YangHan-Morningstar, Apr 13 '23 01:04

This is my implementation based on my understanding.

I think that once you use the learnable prompt, you simply replace image1-mask1 with the image tensor that you optimize.
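
To make the idea concrete, here is a minimal PyTorch sketch of how such a prompt could be trained. It is not the SegGPT code. `ToyInContextModel` is a hypothetical stand-in for the frozen pretrained network, a single 6-channel `prompt` tensor stands in for the image1-mask1 pair, and the data, the MSE loss and the hyperparameters are all made up for illustration. The point is that the label comes from the real pair (mask3 for image3), not from the prompt: the loss is computed on the predicted query mask, and the gradient flows back through the frozen model into the prompt only.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for the pretrained network (NOT the real SegGPT
# architecture): it sees a prompt plus a query image and predicts the query mask.
class ToyInContextModel(nn.Module):
    def __init__(self):
        super().__init__()
        # 6 prompt channels (image + mask) + 3 query channels -> 3 mask channels
        self.net = nn.Conv2d(9, 3, kernel_size=3, padding=1)

    def forward(self, prompt, query_image):
        return self.net(torch.cat([prompt, query_image], dim=1))

model = ToyInContextModel()
for p in model.parameters():
    p.requires_grad_(False)  # the whole model stays frozen
frozen_before = [p.clone() for p in model.parameters()]

# The learnable tensor that replaces image1-mask1. It starts as noise and is
# the only thing handed to the optimizer.
prompt = nn.Parameter(torch.randn(1, 6, 16, 16))
prompt_init = prompt.detach().clone()
optimizer = torch.optim.Adam([prompt], lr=0.01)

# Made-up stand-in for the dataset pairs (image3, mask3): these masks are the labels.
query_images = torch.randn(8, 3, 16, 16)
query_masks = (query_images > 0).float()

losses = []
for step in range(100):
    # The same prompt is shared by every sample in the batch.
    pred = model(prompt.expand(query_images.size(0), -1, -1, -1), query_images)
    loss = nn.functional.mse_loss(pred, query_masks)  # loss on the query mask only
    optimizer.zero_grad()
    loss.backward()   # gradients pass through the frozen model into the prompt
    optimizer.step()  # only the prompt tensor is updated
    losses.append(loss.item())
```

So the prompt itself never needs a label. It is just a set of parameters that gets adjusted until the frozen model predicts the right masks for the real images.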

SteveImmanuel, Apr 02 '24 04:04