EVF-SAM
Fine-tuning problem with EVF-SAM2
Nice work extending SAM's ability to text-guided segmentation!
We have used your EVF-SAM as the baseline in a new area and showed significant performance. However, when we fine-tuned your EVF-SAM2, it overfit easily (we did not see this with EVF-SAM).
Did you run into the same problem when you fine-tuned SAM2, or are some hyperparameters different from those for SAM1?
Hope to receive your suggestions!
The main differences between SAM1 and SAM2 lie in:
- pre-processing: SAM2 uses a direct resize to 1024, while SAM1 resizes the longest side to 1024 and pads (see the sketch after this list).
- SAM2 uses a hierarchical image encoder, while SAM1 uses a ViT.
- SAM2 applies skip connections to the mask decoder.
Might these differences affect your training?
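To make the pre-processing point concrete, here is a minimal sketch of the two styles (illustrative only, not the exact repo code; the input is assumed to be a CHW float tensor that is already normalized):

```python
import torch
import torch.nn.functional as F

def sam2_preprocess(image: torch.Tensor, size: int = 1024) -> torch.Tensor:
    # SAM2-style: resize the image directly to a square (size x size),
    # so the aspect ratio is not preserved.
    return F.interpolate(image[None], (size, size), mode="bilinear", align_corners=False)[0]

def sam1_preprocess(image: torch.Tensor, size: int = 1024) -> torch.Tensor:
    # SAM1-style: resize the longest side to `size`, then zero-pad the
    # shorter side (right/bottom) to reach a square input.
    h, w = image.shape[-2:]
    scale = size / max(h, w)
    new_h, new_w = int(round(h * scale)), int(round(w * scale))
    image = F.interpolate(image[None], (new_h, new_w), mode="bilinear", align_corners=False)[0]
    pad_h, pad_w = size - new_h, size - new_w
    return F.pad(image, (0, pad_w, 0, pad_h))
```

Because of the padding, SAM1-style inputs keep the aspect ratio, which also changes how GT masks need to be resized and padded during fine-tuning.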
@vvvvvjdy Can you share your fine-tuning script? Many thanks.
Sorry for the late reply. During my experiments, I found that the data augmentation caused the overfitting problem. Once I used the same augmentation as in SAM2 pre-training (only random horizontal flip of the image), the problem was solved. I assume that using stronger data augmentation for SAM2 than was used in its pre-training may cause the model to learn some unnatural shortcut features. The fine-tuning script is simple, much like many SAM1 fine-tuning works; the only difference is the backbone. (I only train my model on images.) Regards.
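A minimal sketch of that recipe (assuming PyTorch/torchvision and a paired image / GT-mask sample; not the exact script):

```python
import random
import torchvision.transforms.functional as TF

def augment(image, mask, p_flip: float = 0.5):
    # Match the SAM2 pre-training recipe: only random horizontal flip,
    # applied consistently to the image and its ground-truth mask.
    if random.random() < p_flip:
        image = TF.hflip(image)
        mask = TF.hflip(mask)
    return image, mask
```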
The augmentations influence model performance in another way. In referring segmentation tasks, text prompts contain geometric words like "on the left". Once flipping, cropping, or other geometric augmentations are applied, the prompts no longer match the image. So only non-geometric augmentations are recommended.
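As one possible way to enforce this (a hedged sketch, not code from this repo), geometric augmentations can be gated on the prompt text while photometric ones are always applied:

```python
import random
import torchvision.transforms.functional as TF
from torchvision.transforms import ColorJitter

GEOMETRIC_WORDS = {"left", "right", "above", "below", "top", "bottom"}
color_jitter = ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2)

def augment(image, mask, prompt: str, p_flip: float = 0.5):
    # Photometric augmentation does not change geometry, so the referring
    # expression stays valid regardless of the prompt.
    image = color_jitter(image)
    # Simple word-level check: only flip when the prompt contains no
    # geometric words; otherwise the flipped image would contradict the text.
    if not (GEOMETRIC_WORDS & set(prompt.lower().split())) and random.random() < p_flip:
        image = TF.hflip(image)
        mask = TF.hflip(mask)
    return image, mask
```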
@CoderZhangYx I quite agree with this statement. But even without such prompts in my fine-tuning data, stronger data augmentation than in pre-training can cause this problem (some works have shown that small models like ResNet-18 are especially sensitive to augmentation). I am surprised that such a large foundation model as SAM2 shows the same behavior.
That's surprising. What augmentation did you use? Could the reason be that the augmentation was not applied to the input fed to the multi_model_extractor? Curious about this bug, honestly.
@CoderZhangYx I originally used large-scale jittering (a strong augmentation) on the input image (for both BEiT and SAM1/SAM2) and the GT mask, and found it works well on EVF-SAM but not on EVF-SAM2. Did you use the same augmentation for EVF-SAM and EVF-SAM2, and which augmentation did you use? (It is not mentioned in the paper.)
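For clarity, this is roughly the kind of large-scale jittering meant here (a sketch with an assumed scale range of 0.1–2.0, applied with identical geometry to the image and the GT mask):

```python
import random
import torch
import torch.nn.functional as F

def large_scale_jitter(image, mask, out_size: int = 1024, scale_range=(0.1, 2.0)):
    # Resize by a random factor, then zero-pad (if needed) and take a random
    # crop back to a fixed square, using the same geometry for image and mask.
    scale = random.uniform(*scale_range)
    h, w = image.shape[-2:]
    new_h, new_w = max(1, int(h * scale)), max(1, int(w * scale))
    image = F.interpolate(image[None], (new_h, new_w), mode="bilinear", align_corners=False)[0]
    mask = F.interpolate(mask[None, None].float(), (new_h, new_w), mode="nearest")[0, 0]
    pad_h, pad_w = max(0, out_size - new_h), max(0, out_size - new_w)
    image = F.pad(image, (0, pad_w, 0, pad_h))
    mask = F.pad(mask, (0, pad_w, 0, pad_h))
    # Same crop offsets for image and mask keep them aligned.
    top = random.randint(0, image.shape[-2] - out_size)
    left = random.randint(0, image.shape[-1] - out_size)
    return (image[..., top:top + out_size, left:left + out_size],
            mask[top:top + out_size, left:left + out_size])
```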
In fact we used no augmentation when training our model. It is strange that scale jittering affects the performance of SAM2. Let me know if you find any other reasons, thanks!
"Hello, I am currently working on creating the training code for EVFSAM2, but I have encountered some issues. I would like to ask if you would be willing to share a fine-tuning script that you have created. If possible, I would greatly appreciate it, as it would help me solve a significant problem."