
GPU memory for fine-tuning mask decoder only

Open Raspberry-beans opened this issue 1 year ago • 3 comments

Hi, thanks for the excellent work.

I will be fine-tuning SAM's mask decoder only (keeping the image encoder and prompt encoder frozen) on a few custom medical images. I have only 8 GB of GPU memory available.

Do you think this capacity is enough for fine-tuning the mask decoder only? The paper mentions that the mask decoder is very lightweight. If not, is there any special setting I can use to fit the tuning process within the available memory?

Thanks!

Raspberry-beans avatar Jan 11 '24 09:01 Raspberry-beans

It is doable with vit_b; I have already tried it on a GTX 1070. However, since it is quite slow, I suggest using a bigger GPU if you can.
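
A minimal sketch of that decoder-only setup, assuming the official `segment-anything` API (the checkpoint path is a placeholder for wherever you saved the vit_b weights):

```python
import torch
from segment_anything import sam_model_registry

# Load the smallest SAM backbone; the checkpoint path is a placeholder.
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")

# Freeze the image encoder and the prompt encoder.
for param in sam.image_encoder.parameters():
    param.requires_grad = False
for param in sam.prompt_encoder.parameters():
    param.requires_grad = False

# Only the mask decoder's parameters are handed to the optimizer.
optimizer = torch.optim.AdamW(sam.mask_decoder.parameters(), lr=1e-4)

trainable = sum(p.numel() for p in sam.parameters() if p.requires_grad)
print(f"Trainable parameters: {trainable / 1e6:.2f}M")  # the mask decoder is only ~4M params
```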

Dandelion404 avatar Jan 11 '24 09:01 Dandelion404

Thanks for your quick response.

Would it be possible to precompute the image embeddings of the few-shot images and save them, then load these pre-computed embeddings during mask-decoder tuning?

I am thinking along these lines because the image encoder takes up most of the memory.
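
Roughly what I have in mind, as a sketch assuming the `SamPredictor` API (`images`, the file names, and `box_torch` are placeholders):

```python
import torch
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth").cuda()
predictor = SamPredictor(sam)

# 1) One-off pass: run the heavy image encoder once per image and cache the result.
for name, image in images:                       # image: HxWx3 uint8 RGB array
    predictor.set_image(image)                   # runs the ViT image encoder
    emb = predictor.get_image_embedding().cpu()  # shape (1, 256, 64, 64)
    torch.save(emb, f"{name}_embedding.pt")

# 2) During decoder tuning: load the cached embedding instead of calling
#    sam.image_encoder, so the expensive encoder forward pass (and its
#    activations) is avoided entirely during training.
image_embedding = torch.load("img001_embedding.pt").cuda()
sparse_emb, dense_emb = sam.prompt_encoder(points=None, boxes=box_torch, masks=None)
low_res_masks, iou_pred = sam.mask_decoder(
    image_embeddings=image_embedding,
    image_pe=sam.prompt_encoder.get_dense_pe(),
    sparse_prompt_embeddings=sparse_emb,
    dense_prompt_embeddings=dense_emb,
    multimask_output=False,
)
```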

Raspberry-beans avatar Jan 11 '24 09:01 Raspberry-beans

That sounds reasonable, although I have not tried it myself. What I have done is freeze the encoder and train only the decoder. By the way, I have read about a LoRA-based fine-tuning approach for SAM; it might be friendly to a GPU with limited memory: https://auto.gluon.ai/stable/tutorials/multimodal/image_segmentation/beginner_semantic_seg.html
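
As a generic pure-PyTorch illustration of the LoRA idea (not the recipe from the link above): wrap a frozen `nn.Linear` with a trainable low-rank update so only the small adapter matrices are trained. The `blocks[i].attn.qkv` attribute path below is my assumption about the segment-anything ViT implementation and should be checked against the installed version.

```python
import torch
import torch.nn as nn
from segment_anything import sam_model_registry

class LoRALinear(nn.Module):
    """Frozen nn.Linear plus a trainable low-rank update W + (alpha/r) * B @ A."""
    def __init__(self, base: nn.Linear, r: int = 4, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # pretrained weight stays frozen
        self.lora_A = nn.Parameter(torch.zeros(r, base.in_features))
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))  # zero-init: no change at start
        nn.init.normal_(self.lora_A, std=0.01)
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.lora_A.t() @ self.lora_B.t())

# Example: add LoRA adapters to the attention qkv projections of the image encoder,
# so the big ViT can also be adapted while training only a few extra parameters.
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
for blk in sam.image_encoder.blocks:
    blk.attn.qkv = LoRALinear(blk.attn.qkv)
```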

Dandelion404 avatar Jan 11 '24 10:01 Dandelion404