RobustSAM
Inconsistency between input (224x224) and output (352x352) image dimensions
The processor currently forces the preprocessed images (pixel_values) to 352x352 even when the input images are explicitly resized to 224x224, while the model itself is configured for 224x224 inputs. This creates a size-mismatch error and prevents proper usage with architectures expecting 224x224 images.
Current Behavior
When processing an image that has been resized to 224x224:
- Input image is correctly resized to 224x224
- The processor prepares the image for the model
- The resulting pixel_values are forced to 352x352 (a quick check is sketched after this list)
- The model's forward pass raises:
ValueError: Input image size (352*352) doesn't match model (224*224).
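
A quick way to see where the 352x352 comes from is to compare the processor's resize target with the model's configured image size. This is only a sketch: it assumes a CLIPSeg-style processor/model pair (the traceback below points at transformers' modeling_clipseg.py), and the exact attribute names vary across transformers versions (older releases expose processor.feature_extractor instead of processor.image_processor).

# Sketch; attribute names are assumptions and may vary by transformers version
print(processor.image_processor.size)          # expected: a 352x352 resize target
print(model.config.vision_config.image_size)   # expected: 224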
Expected Behavior
The model should maintain the input image dimensions (224x224) throughout processing, or provide a configuration option to specify desired output dimensions.
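
Until the defaults agree, one possible workaround is to override the processor's resize target so it matches the 224x224 the model expects. This is a minimal sketch under the assumption that the underlying image processor stores its target size as a height/width dict (recent transformers versions); older versions may store a plain integer and/or expose the attribute as processor.feature_extractor:

# Workaround sketch (assumed attribute name and size format; adjust for your transformers version)
processor.image_processor.size = {"height": 224, "width": 224}
inputs = processor(text=prompts, images=[image] * len(prompts), padding="max_length", return_tensors="pt")
print(inputs["pixel_values"].shape)  # should now end in 224, 224

Whether segmentation quality holds up at 224x224 is a separate question, since the shipped preprocessing defaults to 352x352.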
Example:

# `processor`, `model`, and `image` are created earlier in the notebook (not shown).
print("Image size:", image.size)  # prints (224, 224)
prompts = ["ball"]
import torch
inputs = processor(text=prompts, images=[image] * len(prompts), padding="max_length", return_tensors="pt")

# predict
with torch.no_grad():
    outputs = model(**inputs)
preds = outputs.logits.unsqueeze(1)
Running this cell produces the following error:
ValueError Traceback (most recent call last)
<ipython-input-26-5c026bcdb696> in <cell line: 7>()
6 # predict
7 with torch.no_grad():
----> 8 outputs = model(**inputs)
9 preds = outputs.logits.unsqueeze(1)
8 frames
/usr/local/lib/python3.10/dist-packages/transformers/models/clipseg/modeling_clipseg.py in forward(self, pixel_values, interpolate_pos_encoding)
209 batch_size, _, height, width = pixel_values.shape
210 if not interpolate_pos_encoding and (height != self.image_size or width != self.image_size):
--> 211 raise ValueError(
212 f"Input image size ({height}*{width}) doesn't match model" f" ({self.image_size}*{self.image_size})."
213 )
ValueError: Input image size (352*352) doesn't match model (224*224).
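
Note also that the check in the traceback is guarded by interpolate_pos_encoding (line 210 of modeling_clipseg.py above), so the error is skipped when that flag is set. Whether the flag is threaded through the top-level forward depends on the installed transformers version; the call below is a hypothetical sketch rather than a confirmed API:

# Hypothetical, version-dependent: interpolate the position embeddings instead of
# enforcing the configured 224x224 size (only works if forward() accepts this kwarg).
with torch.no_grad():
    outputs = model(**inputs, interpolate_pos_encoding=True)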