Runtime error on tensor dimensionality in automatic mask generator

vchalmel opened this issue 9 months ago · 2 comments

Hello, I tried to replicate an experiment (from RapidBenthos on GitHub) using segment-anything, and I get a runtime error: "RuntimeError: permute(sparse_coo): number of dimensions in the tensor input does not match the length of the desired ordering of dimensions i.e. input.dim() = 2 is not equal to len(dims) = 3"

I have no clue about the root cause of this error, and I would appreciate an explanation of what is going on in the stack trace so I know in which direction to look.

The code uses segment-anything-py==1.0 through segment-geospatial==0.9.0.

File ~/Git/RapidBenthos/venv/lib/python3.12/site-packages/segment_anything/automatic_mask_generator.py:163, in SamAutomaticMaskGenerator.generate(self, image)
    138 """
    139 Generates masks for the given image.
    140 
   (...)
    159          the mask, given in XYWH format.
    160 """
    162 # Generate masks
--> 163 mask_data = self._generate_masks(image)
    165 # Filter small disconnected regions and holes in masks
    166 if self.min_mask_region_area > 0:

File ~/Git/RapidBenthos/venv/lib/python3.12/site-packages/segment_anything/automatic_mask_generator.py:206, in SamAutomaticMaskGenerator._generate_masks(self, image)
    204 data = MaskData()
    205 for crop_box, layer_idx in zip(crop_boxes, layer_idxs):
--> 206     crop_data = self._process_crop(image, crop_box, layer_idx, orig_size)
    207     data.cat(crop_data)
    209 # Remove duplicate masks between crops

File ~/Git/RapidBenthos/venv/lib/python3.12/site-packages/segment_anything/automatic_mask_generator.py:236, in SamAutomaticMaskGenerator._process_crop(self, image, crop_box, crop_layer_idx, orig_size)
    234 cropped_im = image[y0:y1, x0:x1, :]
    235 cropped_im_size = cropped_im.shape[:2]
--> 236 self.predictor.set_image(cropped_im)
    238 # Get points for this crop
    239 points_scale = np.array(cropped_im_size)[None, ::-1]

File ~/Git/RapidBenthos/venv/lib/python3.12/site-packages/segment_anything/predictor.py:58, in SamPredictor.set_image(self, image, image_format)
     56 input_image = self.transform.apply_image(image)
     57 input_image_torch = torch.as_tensor(input_image, device=self.device)
---> 58 input_image_torch = input_image_torch.permute(2, 0, 1).contiguous()[None, :, :, :]
     60 self.set_torch_image(input_image_torch, image.shape[:2])

RuntimeError: permute(sparse_coo): number of dimensions in the tensor input does not match the length of the desired ordering of dimensions i.e. input.dim() = 2 is not equal to len(dims) = 3
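For context, the failing call can be reproduced outside segment-anything. `torch.permute` behaves like NumPy's `transpose` here: reordering a 2-D array with a 3-axis ordering fails the same way. A minimal NumPy analogue (not the actual segment-anything code path):

```python
import numpy as np

# A grayscale image loads as a 2-D (height, width) array.
img = np.zeros((4, 5))

try:
    # Mirrors predictor.py's permute(2, 0, 1), which assumes a
    # 3-D (H, W, C) array; a 2-D input makes the axis count mismatch.
    np.transpose(img, (2, 0, 1))
except ValueError as e:
    print(type(e).__name__)  # ValueError
```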

vchalmel avatar Mar 19 '25 15:03 vchalmel

It seems to be saying the input image only has 2 dimensions (e.g. height & width) while the code is trying to re-arrange a tensor with 3 dimensions (e.g. height, width & channels). So my guess is that the input is a grayscale image, while the model expects RGB. If you're loading the image with OpenCV, you can convert it to RGB before giving it to the model with something like:

rgb_img = cv2.cvtColor(gray_img, cv2.COLOR_GRAY2RGB)
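If you'd rather not go through OpenCV, a NumPy-only guard works too. A minimal sketch, assuming the input is a NumPy array as `set_image` expects (`ensure_rgb` is a name I made up, not a segment-anything API):

```python
import numpy as np

def ensure_rgb(image):
    """Hypothetical helper: return an (H, W, 3) array,
    replicating a single channel when needed."""
    if image.ndim == 2:                          # grayscale (H, W)
        return np.stack([image] * 3, axis=-1)
    if image.ndim == 3 and image.shape[2] == 1:  # single-channel (H, W, 1)
        return np.repeat(image, 3, axis=2)
    if image.ndim == 3 and image.shape[2] == 3:  # already (H, W, 3)
        return image
    raise ValueError(f"unexpected image shape {image.shape}")

gray = np.zeros((4, 5), dtype=np.uint8)
print(ensure_rgb(gray).shape)  # (4, 5, 3)
```

Calling this on the image before handing it to the mask generator should avoid the 2-D input reaching the permute call.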

heyoeyo avatar Mar 20 '25 13:03 heyoeyo

Thanks for the reply, I will try to find where to add something like this.

vchalmel avatar Mar 21 '25 08:03 vchalmel