segment-anything
Change input image size
Hi,
Thanks for the release! I am trying to use only the image_encoder model to extract features from small images (64x64), which is easy to do with other ViT models such as DINO. However, it seems that the pretrained image_encoder model only accepts fixed-size images (1024x1024). I can always resize the image, but that is highly inefficient in both compute and memory. If I don't resize the image to 1024x1024, I get an error, because the input image size seems to have been hard-coded into the model during training (the size of self.pos_embed, for example). I must have missed something, because this seems highly non-generic for a foundation model.
Any solution for this problem?
The issue you're encountering is due to the fact that the pretrained SAM model expects input images of size 1024x1024. While you could resize the images, as you mentioned, it may be computationally and memory inefficient.
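To put a rough number on that inefficiency, here is a small arithmetic sketch. It assumes a patch size of 16 and counts only patch tokens; the helper name `token_count` is hypothetical and is not part of the SAM code.

```python
def token_count(image_size: int, patch_size: int = 16) -> int:
    """Number of patch tokens for a square image (patch_size=16 is an assumption)."""
    return (image_size // patch_size) ** 2


full = token_count(1024)  # tokens after upscaling the image to 1024x1024
small = token_count(64)   # tokens if the 64x64 image could be used directly

print(full, small, full // small)  # 4096 16 256

# A global self-attention layer builds a (tokens x tokens) matrix,
# so its size grows with the square of the token count.
print(full**2 // small**2)  # 65536
```

Under these assumptions, upscaling a 64x64 image to 1024x1024 means the encoder processes 256 times as many tokens, and any layer with global attention works on an attention matrix 65536 times larger.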
One solution to this problem is to fine-tune the SAM model on your specific dataset with smaller image sizes (e.g., 64x64) before using it for feature extraction. By fine-tuning the model on your smaller images, the model will be better suited to handling the reduced input size.
To do this, you'll need to adjust the model architecture to accept the smaller image sizes, specifically by changing the positional embeddings in the model. You might need to modify the code to allow for dynamic positional embeddings that can adapt to different input sizes. Keep in mind that making these changes may require a good understanding of the underlying model architecture and some experimentation.
Once you've fine-tuned the model on your smaller images, you can use the image_encoder to extract features more efficiently.
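As a starting point for the positional-embedding change, here is a minimal sketch of resizing an absolute positional embedding by interpolation. It assumes the embedding is stored channels-last as a (1, H, W, C) tensor and that the patch size is 16; the function name `resize_abs_pos_embed` and the random stand-in tensor are hypothetical, not part of the SAM code. Other size-dependent parts of the encoder (for example relative positional embeddings) may also need attention.

```python
from typing import Tuple

import torch
import torch.nn.functional as F


def resize_abs_pos_embed(pos_embed: torch.Tensor, grid_hw: Tuple[int, int]) -> torch.Tensor:
    """Resize a (1, H, W, C) absolute positional embedding to (1, h, w, C)."""
    # F.interpolate expects a channels-first layout: (N, C, H, W).
    x = pos_embed.permute(0, 3, 1, 2)
    x = F.interpolate(x, size=grid_hw, mode="bicubic", align_corners=False)
    # Back to the channels-last layout assumed above.
    return x.permute(0, 2, 3, 1).contiguous()


# Stand-in for the pretrained tensor: a 1024x1024 input with 16x16 patches
# gives a 64x64 token grid. The 768 channels are an assumption.
pretrained = torch.randn(1, 64, 64, 768)

# A 64x64 image with the same patch size yields a 4x4 token grid.
patch_size = 16
small = resize_abs_pos_embed(pretrained, (64 // patch_size, 64 // patch_size))
print(small.shape)  # torch.Size([1, 4, 4, 768])
```

To use this on a real model, the resized tensor would have to be wrapped in a `torch.nn.Parameter` and assigned in place of the original embedding. Whether the resulting features are useful without fine-tuning is something you would need to check experimentally.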
https://github.com/ByungKwanLee/Full-Segment-Anything addresses critical issues of SAM: it supports batched input with the full-grid prompt (automatic mask generation), post-processing (removing duplicated or small regions and holes), and flexible input image sizes.
Thanks for your great work. I'm wondering whether this only supports square input sizes (e.g., 128x128 or 256x256). Can I use an arbitrary input size such as 128x256?
Initially I did not consider non-square images, but I think supporting them would give good flexibility for images of any shape. I am going to check and update my code within a month (due to a paper deadline :<).
Alright. Thanks again!
Can you give me a hint on how to do this? I may try this by myself.
Hi @ZaiyiHu, did you implement non-square input images by any chance?
Sorry, but I haven't done this yet. :(