segment-anything Change input image size

Hi,

Thanks for the release! I am trying to use only the image_encoder model to extract features from small images (64x64), which is easy to do with other ViT models such as DINO. However, the pretrained image_encoder seems to accept only fixed-size images (1024x1024). I could resize every image to 1024x1024, but that is very inefficient in both computation and memory. If I don't resize to 1024, I get an error, because the input size appears to have been hard-coded into the model during training (for example, the size of self.pos_embed). I must have missed something, because this seems very non-generic for a foundation model.

Any solution for this problem?

Apr 07 '23 14:04 jorisguerin

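This happens because the pretrained SAM image encoder was built for 1024x1024 inputs: with 16x16-pixel patches, its absolute positional embedding (self.pos_embed) covers a fixed 64x64 grid of tokens, so the 4x4 token grid produced by a 64x64 image cannot be added to it. Resizing the images avoids the error but, as you mentioned, is computationally and memory inefficient.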
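To make the mismatch concrete, here is a small pure-Python sketch of the shape arithmetic. It assumes SAM's published ViT settings (16-pixel patches, a 1024x1024 pretraining size, embedding widths of 768/1024/1280 for ViT-B/L/H) and a channels-last pos_embed of shape (1, grid_h, grid_w, embed_dim); the helper names are hypothetical, not part of segment-anything.

```python
from typing import Tuple

# Assumed SAM ViT settings; the helpers below are hypothetical illustrations.
PATCH_SIZE = 16
PRETRAINED_IMG_SIZE = 1024
EMBED_DIMS = {"vit_b": 768, "vit_l": 1024, "vit_h": 1280}


def token_grid(height: int, width: int, patch: int = PATCH_SIZE) -> Tuple[int, int]:
    """Tokens per axis: a kernel-16, stride-16 conv with no padding gives floor(side / 16)."""
    return height // patch, width // patch


def pos_embed_shape(variant: str) -> Tuple[int, int, int, int]:
    """Shape of the pretrained absolute positional embedding (assumed channels-last layout)."""
    gh, gw = token_grid(PRETRAINED_IMG_SIZE, PRETRAINED_IMG_SIZE)
    return (1, gh, gw, EMBED_DIMS[variant])


def can_add_pos_embed(height: int, width: int, variant: str) -> bool:
    """True if `tokens + pos_embed` lines up without resizing the embedding."""
    _, gh, gw, _ = pos_embed_shape(variant)
    return token_grid(height, width) == (gh, gw)


if __name__ == "__main__":
    print(pos_embed_shape("vit_b"))             # (1, 64, 64, 768)
    print(token_grid(64, 64))                   # (4, 4)
    print(can_add_pos_embed(64, 64, "vit_b"))   # False
```

A (1, 4, 4, C) token tensor and a (1, 64, 64, C) embedding cannot broadcast against each other, which is the kind of shape error a 64x64 input runs into.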

One option is to fine-tune SAM on your own dataset at the smaller image size (e.g., 64x64) before using it for feature extraction, so that the encoder adapts to the reduced input size.

To do this, you'll need to adapt the architecture to the smaller input, specifically the positional embeddings. One way is to make them dynamic, for example by interpolating the pretrained embeddings to the new token grid, as DINO's ViT implementation does in interpolate_pos_encoding. These changes require a good understanding of the underlying architecture and some experimentation.
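A minimal sketch of the "interpolate the positional embeddings" idea, written in NumPy so it runs without PyTorch. It assumes the channels-last (1, 64, 64, C) layout of SAM's pos_embed and uses bilinear resampling with half-pixel centres, intended to mirror torch.nn.functional.interpolate with align_corners=False. resize_pos_embed is a hypothetical helper, not part of segment-anything. In a PyTorch model you would typically do the same with F.interpolate after permuting to (1, C, H, W), then store the result as the encoder's new pos_embed parameter. As far as I can tell, the relative-position tables in SAM's global-attention blocks are already resized on the fly inside get_rel_pos, so pos_embed is the main fixed-size piece; treat that as an assumption to verify against your copy of the code.

```python
import numpy as np


def _source_coords(out_len: int, in_len: int):
    """Map output cell centres to input coordinates (half-pixel convention)."""
    x = (np.arange(out_len) + 0.5) * in_len / out_len - 0.5
    x = np.clip(x, 0.0, in_len - 1)          # clamp to the valid input range
    lo = np.floor(x).astype(int)             # top/left neighbour index
    hi = np.minimum(lo + 1, in_len - 1)      # bottom/right neighbour index
    frac = x - lo                            # weight of the second neighbour
    return lo, hi, frac


def resize_pos_embed(pos_embed: np.ndarray, new_h: int, new_w: int) -> np.ndarray:
    """Bilinearly resample a channels-last (1, H, W, C) positional embedding.

    Hypothetical helper (assumed layout), not part of segment-anything.
    """
    _, h, w, _ = pos_embed.shape
    grid = pos_embed[0]                      # (H, W, C)
    y0, y1, fy = _source_coords(new_h, h)
    x0, x1, fx = _source_coords(new_w, w)
    wx = fx[None, :, None]                   # broadcast over rows and channels
    wy = fy[:, None, None]                   # broadcast over columns and channels
    # Interpolate along the width on the two neighbouring rows...
    top = grid[y0][:, x0] * (1 - wx) + grid[y0][:, x1] * wx
    bottom = grid[y1][:, x0] * (1 - wx) + grid[y1][:, x1] * wx
    # ...then along the height.
    return (top * (1 - wy) + bottom * wy)[None]  # (1, new_h, new_w, C)
```

For a 64x64 input you would call resize_pos_embed(pe, 4, 4). Whether features from such a coarse 4x4 grid are useful for your task, with or without fine-tuning, is something to check empirically.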

Once you've fine-tuned the model on your smaller images, you can use the image_encoder to extract features more efficiently.

Apr 07 '23 17:04 Aryan-Mishra24

https://github.com/ByungKwanLee/Full-Segment-Anything addresses critical issues of SAM: it supports batched input for the full-grid prompt (automatic mask generation), post-processing that removes duplicated or small regions and holes, and flexible input image sizes.

Oct 13 '23 21:10 ByungKwanLee

Thanks for your great work. I'm wondering whether this only supports square input sizes (e.g., 128x128, 256x256). Can I use a non-square size such as 128x256?

Jan 05 '24 04:01 ZaiyiHu

Originally, I did not consider non-square images, but supporting arbitrary shapes would add useful flexibility. Within a month (due to a paper deadline :<), I will check and update my code.

Jan 05 '24 07:01 ByungKwanLee

Alright. Thanks again!

Jan 05 '24 10:01 ZaiyiHu

Can you give me a hint on how to do this? I may try it myself.

Jan 05 '24 10:01 ZaiyiHu
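One possible hint for non-square inputs such as 128x256, offered as a sketch rather than the Full-Segment-Anything author's plan. There seem to be two routes: (1) resize the pretrained positional embedding directly to a non-square token grid (8x16 for 128x256 with 16-pixel patches), since bilinear interpolation treats each axis independently; or (2) do what SAM's own preprocessing does: scale the longest side to a target size and zero-pad the bottom and right to a square. The pure-Python code below shows the arithmetic for route (2). get_preprocess_shape re-implements the longest-side rounding rule used by ResizeLongestSide in segment-anything; padded_layout is a hypothetical helper, and the 16-pixel patch size is assumed.

```python
from typing import Tuple

PATCH_SIZE = 16  # assumed SAM ViT patch size


def get_preprocess_shape(oldh: int, oldw: int, long_side_length: int) -> Tuple[int, int]:
    """Scale so the longest side equals long_side_length, rounding to the nearest int."""
    scale = long_side_length * 1.0 / max(oldh, oldw)
    return int(oldh * scale + 0.5), int(oldw * scale + 0.5)


def padded_layout(oldh: int, oldw: int, square_size: int, patch: int = PATCH_SIZE):
    """Hypothetical helper: resize, then pad bottom/right to square_size x square_size.

    Returns the resized (h, w), the (pad_bottom, pad_right) amounts, the full
    token grid, and how many token rows/cols cover real (non-padded) pixels.
    """
    newh, neww = get_preprocess_shape(oldh, oldw, square_size)
    pad = (square_size - newh, square_size - neww)
    grid = (square_size // patch, square_size // patch)
    # Ceil division: a partly covered patch still contains some real pixels.
    valid = (-(-newh // patch), -(-neww // patch))
    return (newh, neww), pad, grid, valid


if __name__ == "__main__":
    print(padded_layout(128, 256, 256))
```

For a 128x256 image padded to 256x256, only the top 8 of 16 token rows correspond to real pixels, and the 16x16 grid would still need a positional embedding resized from the pretrained 64x64 one. Route (1) avoids the padded tokens entirely, at the cost of a non-square embedding grid.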

Hi @ZaiyiHu, did you get non-square input images working, by any chance?

May 28 '24 02:05 mariiak2021

Sorry, but I haven't done this yet. :(

May 28 '24 02:05 ZaiyiHu
