RobustSAM
Inconsistency between input (224x224) and output (352x352) image dimensions
The processor currently forces the preprocessed images (pixel_values) to 352x352 even when the input images are explicitly resized to 224x224, while the model itself is configured for 224x224 inputs. This creates a size-mismatch error and prevents proper usage with architectures expecting 224x224 images.
Current Behavior
When processing an image that has been resized to 224x224:
- Input image is correctly resized to 224x224
- The processor prepares the image for the model
- The resulting pixel_values are forced to 352x352 (a quick check is sketched after this list)
- The model's forward pass raises:
ValueError: Input image size (352*352) doesn't match model (224*224).
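
A quick way to see where the 352x352 comes from is to compare the processor's resize target with the model's configured image size. This is only a sketch: it assumes a CLIPSeg-style processor/model pair (the traceback below points at transformers' modeling_clipseg.py), and the exact attribute names vary across transformers versions (older releases expose processor.feature_extractor instead of processor.image_processor).

# Sketch; attribute names are assumptions and may vary by transformers version
print(processor.image_processor.size)          # expected: a 352x352 resize target
print(model.config.vision_config.image_size)   # expected: 224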
Expected Behavior
The model should maintain the input image dimensions (224x224) throughout processing, or provide a configuration option to specify desired output dimensions.
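
Until the defaults agree, one possible workaround is to override the processor's resize target so it matches the 224x224 the model expects. This is a minimal sketch under the assumption that the underlying image processor stores its target size as a height/width dict (recent transformers versions); older versions may store a plain integer and/or expose the attribute as processor.feature_extractor:

# Workaround sketch (assumed attribute name and size format; adjust for your transformers version)
processor.image_processor.size = {"height": 224, "width": 224}
inputs = processor(text=prompts, images=[image] * len(prompts), padding="max_length", return_tensors="pt")
print(inputs["pixel_values"].shape)  # should now end in 224, 224

Whether segmentation quality holds up at 224x224 is a separate question, since the shipped preprocessing defaults to 352x352.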
Example:

# `processor`, `model`, and `image` are created earlier in the notebook (not shown).
print("Image size:", image.size)  # prints (224, 224)
prompts = ["ball"]
import torch
inputs = processor(text=prompts, images=[image] * len(prompts), padding="max_length", return_tensors="pt")

# predict
with torch.no_grad():
    outputs = model(**inputs)
preds = outputs.logits.unsqueeze(1)
Running this cell produces the following error:
ValueError Traceback (most recent call last)
<ipython-input-26-5c026bcdb696> in <cell line: 7>()
6 # predict
7 with torch.no_grad():
----> 8 outputs = model(**inputs)
9 preds = outputs.logits.unsqueeze(1)
8 frames
/usr/local/lib/python3.10/dist-packages/transformers/models/clipseg/modeling_clipseg.py in forward(self, pixel_values, interpolate_pos_encoding)
209 batch_size, _, height, width = pixel_values.shape
210 if not interpolate_pos_encoding and (height != self.image_size or width != self.image_size):
--> 211 raise ValueError(
212 f"Input image size ({height}*{width}) doesn't match model" f" ({self.image_size}*{self.image_size})."
213 )
ValueError: Input image size (352*352) doesn't match model (224*224).
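
Note also that the check in the traceback is guarded by interpolate_pos_encoding (line 210 of modeling_clipseg.py above), so the error is skipped when that flag is set. Whether the flag is threaded through the top-level forward depends on the installed transformers version; the call below is a hypothetical sketch rather than a confirmed API:

# Hypothetical, version-dependent: interpolate the position embeddings instead of
# enforcing the configured 224x224 size (only works if forward() accepts this kwarg).
with torch.no_grad():
    outputs = model(**inputs, interpolate_pos_encoding=True)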