EVF-SAM
Fine-tuning problem with EVF-SAM2
Nice work extending SAM's ability to text-guided segmentation!
We have used your EVF-SAM as the baseline in a new area and showed significant performance. However, when we fine-tuned your EVF-SAM2, it overfit easily (we did not see this with EVF-SAM).
Did you run into the same problem when you fine-tuned SAM2, or are some hyperparameters different from those for SAM1?
Hope to receive your suggestions!
The main differences between SAM1 and SAM2 lie in:
- pre-processing: SAM2 uses a direct resize to 1024, while SAM1 resizes the longest side to 1024 and pads (see the sketch after this list).
- SAM2 uses a hierarchical image encoder, while SAM1 uses a ViT.
- SAM2 applies skip connections to the mask decoder.
Might these differences affect your training?
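To make the pre-processing point concrete, here is a minimal sketch of the two styles (illustrative only, not the exact repo code; the input is assumed to be a CHW float tensor that is already normalized):

```python
import torch
import torch.nn.functional as F

def sam2_preprocess(image: torch.Tensor, size: int = 1024) -> torch.Tensor:
    # SAM2-style: resize the image directly to a square (size x size),
    # so the aspect ratio is not preserved.
    return F.interpolate(image[None], (size, size), mode="bilinear", align_corners=False)[0]

def sam1_preprocess(image: torch.Tensor, size: int = 1024) -> torch.Tensor:
    # SAM1-style: resize the longest side to `size`, then zero-pad the
    # shorter side (right/bottom) to reach a square input.
    h, w = image.shape[-2:]
    scale = size / max(h, w)
    new_h, new_w = int(round(h * scale)), int(round(w * scale))
    image = F.interpolate(image[None], (new_h, new_w), mode="bilinear", align_corners=False)[0]
    pad_h, pad_w = size - new_h, size - new_w
    return F.pad(image, (0, pad_w, 0, pad_h))
```

Because of the padding, SAM1-style inputs keep the aspect ratio, which also changes how GT masks need to be resized and padded during fine-tuning.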
@vvvvvjdy Can you share your fine-tuning script? Many thanks.
Sorry for the late reply. During my experiments, I found that the data augmentation caused the overfitting problem. Once I used the same augmentation as in SAM2 pre-training (only random horizontal flip of the image), the problem was solved. I assume that using stronger data augmentation for SAM2 than was used in its pre-training may cause the model to learn some unnatural shortcut features. The fine-tuning script is simple, much like many SAM1 fine-tuning works; the only difference is the backbone. (I only train my model on images.) Regards.
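A minimal sketch of that recipe (assuming PyTorch/torchvision and a paired image / GT-mask sample; not the exact script):

```python
import random
import torchvision.transforms.functional as TF

def augment(image, mask, p_flip: float = 0.5):
    # Match the SAM2 pre-training recipe: only random horizontal flip,
    # applied consistently to the image and its ground-truth mask.
    if random.random() < p_flip:
        image = TF.hflip(image)
        mask = TF.hflip(mask)
    return image, mask
```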
The augmentations influence model performance in another way. In referring segmentation tasks, text prompts contain geometric words like "on the left". Once flipping, cropping, or other geometric augmentations are applied, the prompts no longer match the image. So only non-geometric augmentations are recommended.
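As one possible way to enforce this (a hedged sketch, not code from this repo), geometric augmentations can be gated on the prompt text while photometric ones are always applied:

```python
import random
import torchvision.transforms.functional as TF
from torchvision.transforms import ColorJitter

GEOMETRIC_WORDS = {"left", "right", "above", "below", "top", "bottom"}
color_jitter = ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2)

def augment(image, mask, prompt: str, p_flip: float = 0.5):
    # Photometric augmentation does not change geometry, so the referring
    # expression stays valid regardless of the prompt.
    image = color_jitter(image)
    # Simple word-level check: only flip when the prompt contains no
    # geometric words; otherwise the flipped image would contradict the text.
    if not (GEOMETRIC_WORDS & set(prompt.lower().split())) and random.random() < p_flip:
        image = TF.hflip(image)
        mask = TF.hflip(mask)
    return image, mask
```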
@CoderZhangYx I quite agree with this statement. But even without such prompts in my fine-tuning data, stronger data augmentation than in pre-training can cause this problem (some works have shown that small models like ResNet-18 are especially sensitive to augmentation). I am surprised that such a large foundation model as SAM2 shows the same behavior.
That's surprising. What augmentation did you use? Could the reason be that the augmentation was not applied to the input fed to the multi_model_extractor? Curious about this bug, honestly.
@CoderZhangYx I originally used large-scale jittering (a strong augmentation) on the input image (for both BEiT and SAM1/SAM2) and the GT mask, and found it works well on EVF-SAM but not on EVF-SAM2. Did you use the same augmentation for EVF-SAM and EVF-SAM2, and which augmentation did you use? (It is not mentioned in the paper.)
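For clarity, this is roughly the kind of large-scale jittering meant here (a sketch with an assumed scale range of 0.1–2.0, applied with identical geometry to the image and the GT mask):

```python
import random
import torch
import torch.nn.functional as F

def large_scale_jitter(image, mask, out_size: int = 1024, scale_range=(0.1, 2.0)):
    # Resize by a random factor, then zero-pad (if needed) and take a random
    # crop back to a fixed square, using the same geometry for image and mask.
    scale = random.uniform(*scale_range)
    h, w = image.shape[-2:]
    new_h, new_w = max(1, int(h * scale)), max(1, int(w * scale))
    image = F.interpolate(image[None], (new_h, new_w), mode="bilinear", align_corners=False)[0]
    mask = F.interpolate(mask[None, None].float(), (new_h, new_w), mode="nearest")[0, 0]
    pad_h, pad_w = max(0, out_size - new_h), max(0, out_size - new_w)
    image = F.pad(image, (0, pad_w, 0, pad_h))
    mask = F.pad(mask, (0, pad_w, 0, pad_h))
    # Same crop offsets for image and mask keep them aligned.
    top = random.randint(0, image.shape[-2] - out_size)
    left = random.randint(0, image.shape[-1] - out_size)
    return (image[..., top:top + out_size, left:left + out_size],
            mask[top:top + out_size, left:left + out_size])
```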
In fact we used no augmentation when training our model. It is strange that scale jittering affects the performance of SAM2. Let me know if you find any other reasons, thanks!
"Hello, I am currently working on creating the training code for EVFSAM2, but I have encountered some issues. I would like to ask if you would be willing to share a fine-tuning script that you have created. If possible, I would greatly appreciate it, as it would help me solve a significant problem."