GPU memory for fine-tuning mask decoder only
Hi, thanks for the excellent work.
I will be fine-tuning the SAM mask decoder only (keeping the image and prompt encoders frozen) on a few custom medical images. I have only 8GB of GPU memory available.
Do you think this capacity is enough for fine-tuning the mask decoder only? The paper mentions that the mask decoder is very lightweight. If not, is there any special setting I can use to fit the tuning process within the available memory?
Thanks!
It is doable with vit_b. I have already tried it with a GTX 1070. However, since it is quite slow, I suggest using a bigger GPU.
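For reference, the freeze-encoders/train-decoder setup can be sketched as below. This is a minimal sketch using toy modules in place of the real components; with the actual `segment_anything` package the submodules are `sam.image_encoder`, `sam.prompt_encoder`, and `sam.mask_decoder` on a model built via `sam_model_registry["vit_b"](checkpoint=...)`, and the training loop, loss, and learning rate here are placeholder assumptions.

```python
import torch
import torch.nn as nn

# Toy stand-ins for the SAM components. In the real repo these would be
# sam.image_encoder, sam.prompt_encoder, and sam.mask_decoder.
class ToySAM(nn.Module):
    def __init__(self):
        super().__init__()
        self.image_encoder = nn.Linear(16, 8)   # frozen (heavy ViT in real SAM)
        self.prompt_encoder = nn.Linear(4, 8)   # frozen
        self.mask_decoder = nn.Linear(8, 2)     # the only part we train

sam = ToySAM()

# Freeze everything except the mask decoder.
for p in sam.image_encoder.parameters():
    p.requires_grad = False
for p in sam.prompt_encoder.parameters():
    p.requires_grad = False

# Pass only the trainable (decoder) parameters to the optimizer, so no
# optimizer state is allocated for the frozen encoders.
trainable = [p for p in sam.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-4)
```

Freezing via `requires_grad = False` also means no gradients (and no optimizer state) are stored for the encoder weights, which is where most of the memory savings come from.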
Thanks for your quick response.
Would it be possible to precompute the image embeddings of the few-shot images and save them? Then, during mask-decoder tuning, I could just load these precomputed image embeddings.
I'm thinking this way because the image encoder takes up most of the memory.
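The caching idea above could look something like this. Note this is a hypothetical sketch: `cache_embeddings` and `load_embeddings` are illustrative helper names, and `image_encoder` is a stand-in (with real SAM it would be `sam.image_encoder`, producing a 256×64×64 embedding per 1024×1024 input image).

```python
import torch

@torch.no_grad()  # no gradients needed: the encoder is frozen
def cache_embeddings(image_encoder, images, path):
    """Run the frozen image encoder once per image and save the
    stacked embeddings to disk with torch.save."""
    embs = [image_encoder(img.unsqueeze(0)).squeeze(0) for img in images]
    torch.save(torch.stack(embs), path)

def load_embeddings(path):
    """Load the cached embeddings for decoder-only training."""
    return torch.load(path)
```

With the embeddings cached, the image encoder never needs to be on the GPU during decoder training, so only the (lightweight) decoder and the small embedding batches occupy GPU memory.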
That sounds reasonable. I have not tried it myself; what I did was freeze the encoder part and train only the decoder part. By the way, I have read about LoRA-based fine-tuning for SAM. It might be friendly to a small-memory GPU. https://auto.gluon.ai/stable/tutorials/multimodal/image_segmentation/beginner_semantic_seg.html
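For context, the core LoRA idea is small enough to sketch. This is a minimal illustrative implementation, not the one in the linked tutorial: a frozen linear layer is augmented with a trainable low-rank update `W + (alpha/r) * B @ A`, so only the small `A` and `B` matrices need gradients and optimizer state, which is what makes it attractive on a small-memory GPU.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wrap a frozen nn.Linear with a trainable low-rank update."""

    def __init__(self, base: nn.Linear, r: int = 4, alpha: float = 8.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # pretrained weight stays frozen
        # A is small random init, B starts at zero, so the wrapped layer
        # initially computes exactly the same output as the base layer.
        self.lora_a = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.lora_a.T @ self.lora_b.T)
```

Because `B` is initialized to zero, training starts from the pretrained behavior and the low-rank update is learned on top of it.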