
Multiple Boxes as prompt

Open kabbas570 opened this issue 2 years ago • 19 comments

Hello, thanks for the nice work; well done. Can SAM take multiple bounding boxes as prompts for segmentation? For example, if I draw boxes around two objects, say a building and a dog, SAM only segments one of them with:

masks, scores, logits = mask_predictor.predict(box=box, multimask_output=True)

Here, it expects the box to have shape [1, 4]. If it has shape [2, 4], it raises this error:

    157 if boxes is not None:
    158     box_embeddings = self._embed_boxes(boxes)
--> 159     sparse_embeddings = torch.cat([sparse_embeddings, box_embeddings], dim=1)
    160
    161 if masks is not None:

RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 1 but got size 2 for tensor number 1 in the list.

I'm not sure how to change the size of sparse_embeddings to match as well!

Thank you Cheers Abbas

kabbas570 avatar May 21 '23 15:05 kabbas570

You can use predict_torch to give multiple bounding boxes as input prompts:

input_boxes = torch.tensor([box_1, box_2], device=mask_predictor.device)  
transformed_boxes = mask_predictor.transform.apply_boxes_torch(input_boxes, image.shape[:2])  
masks, iou_predictions, low_res_masks = mask_predictor.predict_torch(
    point_coords=None,
    point_labels=None,
    boxes=transformed_boxes,
    multimask_output=True
)

0vl0 avatar Jun 02 '23 08:06 0vl0

You can use predict_torch to give multiple bounding boxes as input prompts:

input_boxes = torch.tensor([box_1, box_2], device=mask_predictor.device)  
transformed_boxes = mask_predictor.transform.apply_boxes_torch(input_boxes, image.shape[:2])  
masks, iou_predictions, low_res_masks = mask_predictor.predict_torch(
    point_coords=None,
    point_labels=None,
    boxes=transformed_boxes,
    multimask_output=True
)

Hello, I used the .predict function and got the alpha masks for my image. Is there any option to generate the segmented image as output, for example the dog cropped out into a separate image?

I tried the bitwise_and operation in OpenCV, but the output was not that clean. So, is there any setting in SAM to do the same?

Dipankar1997161 avatar Jun 03 '23 17:06 Dipankar1997161

Is there any option to generate the segmented image as output, for example the dog cropped out into a separate image?

This option is not supported by SAM. You have to post-process the output mask.

If your method didn't work, you can extract the dog using the bounding box of the mask:

image = cv2.imread(path_image)
masks, _, _ = predictor.predict(point_coords=points, point_labels=labels, multimask_output=False)
Y, X = masks[0].nonzero()                                  # row (Y) and column (X) indices of mask pixels
left, right, top, bottom = min(X), max(X), min(Y), max(Y)  # bounding box of the mask
dog_image = image[top:bottom, left:right]                  # crop the image to that bounding box

[Images: the original dog photo and the cropped dog_cropped result]
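If you need a cleaner cutout than a bounding-box crop, one option is to use the predicted mask as an alpha channel so everything outside the mask becomes transparent. A minimal sketch, assuming the image and masks from the snippet above (the output filename is just a placeholder):

import cv2
import numpy as np

image = cv2.imread(path_image)                  # BGR image, shape (H, W, 3)
alpha = masks[0].astype(np.uint8) * 255         # boolean SAM mask -> 0/255 alpha channel

# Attach the mask as an alpha channel; pixels outside the mask become fully transparent.
cutout = cv2.cvtColor(image, cv2.COLOR_BGR2BGRA)
cutout[:, :, 3] = alpha
cv2.imwrite("dog_cutout.png", cutout)           # PNG preserves the transparency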

0vl0 avatar Jun 05 '23 13:06 0vl0

Hi @kabbas570 @0vl0
I am struggling to understand the format of the bounding box. Is it YOLO format (x-center, y-center, w, h) or COCO format (xmin, ymin, w, h)? Since you are already able to extract masks, it would be really helpful if you could clear this up for me. Thanks in advance.

kulkarnikeerti avatar Jun 12 '23 14:06 kulkarnikeerti

@kulkarnikeerti here is the format,

default_box will be used if you do not draw any box on the image above:

default_box = {'x': 68, 'y': 247, 'width': 555, 'height': 678, 'label': ''}

kabbas570 avatar Jun 12 '23 16:06 kabbas570

You can use predict_torch to give multiple bounding boxes as input prompts:

input_boxes = torch.tensor([box_1, box_2], device=mask_predictor.device)  
transformed_boxes = mask_predictor.transform.apply_boxes_torch(input_boxes, image.shape[:2])  
masks, iou_predictions, low_res_masks = mask_predictor.predict_torch(
    point_coords=None,
    point_labels=None,
    boxes=transformed_boxes,
    multimask_output=True
)

I was able to solve the issue by iterating through the boxes; here is a sample code:

import numpy as np
import cv2
import matplotlib.pyplot as plt
import supervision as sv

default_box = {'x': 68, 'y': 247, 'width': 555, 'height': 678, 'label': ''}

image_bgr = cv2.imread(IMAGE_PATH)
image_rgb = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB)
combine = np.zeros(image_rgb.shape)
print(len(widget.bboxes))

for i in range(len(widget.bboxes)):
    box = widget.bboxes[i] if widget.bboxes else default_box
    box = np.array([
        box['x'],
        box['y'],
        box['x'] + box['width'],
        box['y'] + box['height']
    ])

    mask_predictor.set_image(image_rgb)

    masks, scores, logits = mask_predictor.predict(
        box=box,
        multimask_output=False
    )

    masks = np.transpose(masks, [2, 1, 0])
    masks = np.transpose(masks, [1, 0, 2])
    combine[np.where(masks == 1)] = 1

plt.figure()
plt.imshow(combine)

kabbas570 avatar Jun 12 '23 16:06 kabbas570

@kulkarnikeerti here is the format,

default_box is going to be used if you will not draw any box on image above

default_box = {'x': 68, 'y': 247, 'width': 555, 'height': 678, 'label': ''}

@kabbas570 Thanks for the reply, but I am still confused. My question was what x and y are: xmin, ymin or x-center, y-center? I have my bounding box values defined and tried to use them in the demo notebook to get a segmentation based on the bounding box, but the box doesn't exactly fit the object. My boxes are in YOLO format (x-center, y-center), which is why I wanted to understand what format the code uses. I tried converting both ways, but nothing works so far.

kulkarnikeerti avatar Jun 13 '23 07:06 kulkarnikeerti

Hi @kulkarnikeerti, the input bounding box format to predict is xyxy (left, top, right, bottom).
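If your boxes are in normalized YOLO format (x-center, y-center, w, h, each in [0, 1]), a minimal conversion sketch to the pixel xyxy coordinates SAM expects could look like this (function and variable names are just for illustration):

import numpy as np

def yolo_to_xyxy(box, img_w, img_h):
    """Convert a normalized (xc, yc, w, h) YOLO box to pixel (x1, y1, x2, y2)."""
    xc, yc, w, h = box
    x1 = (xc - w / 2) * img_w
    y1 = (yc - h / 2) * img_h
    x2 = (xc + w / 2) * img_w
    y2 = (yc + h / 2) * img_h
    return np.array([x1, y1, x2, y2])

# Example: 640x480 image, box covering the centre quarter of it
print(yolo_to_xyxy([0.5, 0.5, 0.25, 0.25], img_w=640, img_h=480))  # [240. 180. 400. 300.]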

0vl0 avatar Jun 13 '23 07:06 0vl0

@0vl0 Thanks. Got it!

kulkarnikeerti avatar Jun 13 '23 13:06 kulkarnikeerti

Hi, everyone,

what's the difference between apply_boxes and apply_boxes_torch in file segment_anything/utils/transforms.py?
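For reference, a sketch of the difference, assuming the standard ResizeLongestSide transform and the predictor/image names used earlier in this thread: both methods apply the same coordinate rescaling to the model's input resolution; they only differ in the array type they accept.

import numpy as np
import torch

boxes_np = np.array([[10, 20, 110, 220]])
boxes_t = torch.as_tensor(boxes_np, device=predictor.device)

np_scaled = predictor.transform.apply_boxes(boxes_np, image.shape[:2])          # numpy in/out, used by predict
torch_scaled = predictor.transform.apply_boxes_torch(boxes_t, image.shape[:2])  # torch in/out, used with predict_torch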

GewelsJI avatar Jun 22 '23 12:06 GewelsJI

Is there any way to use an indeterminate number of points for each bounding box, without every bounding box needing exactly the same number of points? For example, create two bounding boxes, where the first one has a single foreground point and the second one has a foreground point and a background point.

The fixed dimensions of the torch tensors do not allow it; could you give me a small example of how to do it?

Thanks in advance!
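One possible workaround, a hedged sketch rather than something confirmed in this thread: SAM's prompt encoder appears to treat points labelled -1 as padding ("not a point"), so ragged point lists can be padded to a common length with a dummy point and label -1 before calling predict_torch. The variable names follow the earlier answers; transformed_boxes is assumed to be prepared as shown above.

import numpy as np
import torch

# Box 1 has one foreground point; box 2 has a foreground and a background point.
points_per_box = [[[120, 150]],
                  [[400, 380], [420, 300]]]
labels_per_box = [[1],
                  [1, 0]]

# Pad every list to the same length with a dummy point labelled -1 (ignored by SAM).
max_pts = max(len(p) for p in points_per_box)
coords = np.zeros((len(points_per_box), max_pts, 2), dtype=np.float32)
labels = -np.ones((len(points_per_box), max_pts), dtype=np.int64)
for i, (pts, lbls) in enumerate(zip(points_per_box, labels_per_box)):
    coords[i, :len(pts)] = pts
    labels[i, :len(lbls)] = lbls

coords_t = predictor.transform.apply_coords_torch(
    torch.as_tensor(coords, device=predictor.device), image.shape[:2])
labels_t = torch.as_tensor(labels, device=predictor.device)

masks, _, _ = predictor.predict_torch(
    point_coords=coords_t,        # (num_boxes, max_pts, 2)
    point_labels=labels_t,        # (num_boxes, max_pts)
    boxes=transformed_boxes,      # (num_boxes, 4), transformed as in the answers above
    multimask_output=False,
)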


emi-dm avatar Sep 26 '23 08:09 emi-dm

Hi all!

I'd like to use multiple boxes and multiple points as input to predict the masks. However, I'm getting a shape error when I try that. The code I have been trying is:

input_box = torch.tensor(input_box)
input_box = predictor.transform.apply_boxes_torch(input_box, image.shape[:2])
if input_point is not None:
    input_point = torch.as_tensor(input_point, dtype=torch.float)
    input_label = torch.as_tensor(input_label, dtype=torch.int)

    input_point = predictor.transform.apply_coords_torch(input_point, image.shape[:2])
    # input_label = predictor.transform.apply_coords_torch(input_label, image.shape[:2])
    print("labels_torch:", input_label.shape)
    input_point, input_label = input_point[None, :, :], input_label[None, :]
    print("coords_torch:", input_point.shape)
    print("labels_torch:", input_label.shape)

masks, _, _ = predictor.predict_torch(
    point_coords=input_point,
    point_labels=input_label,
    boxes=input_box,
    multimask_output=False,
)

The error message I get is:

[screenshot of the error message]

I am using the predict_torch method, so I had a look at the predictor.py file, which requires that point_labels is a BxN torch tensor and point_coords is BxNx2. Here, I am not sure what B is, but N I assume is the number of points clicked. As I am using the predict_torch method directly, without first using the predictor.predict method, I also made sure to convert input_labels and input_points to tensors; the shapes I get are torch.Size([1, 5]) and torch.Size([1, 5, 2]) respectively.

Can someone help me out? Thanks!

shrutichakraborty avatar Nov 16 '23 09:11 shrutichakraborty

I got the same issue, but only if I use 2 or more boxes:

works:

points = np.array([[1375,1625],[760,1230]])
input_points = torch.tensor(points, device=predictor.device)
labels = np.array([1,1])
boxes = np.array([[1300, 1550, 1450, 1750]])
input_boxes = torch.tensor(boxes, device=predictor.device)
input_label = torch.tensor(labels, device=predictor.device)
print("Dimensions of points:", input_points.shape) #Dimensions of points: torch.Size([2, 2])
print("Dimensions of boxes:", input_boxes.shape) # Dimensions of boxes: torch.Size([1, 4])

transformed_boxes = predictor.transform.apply_boxes_torch(input_boxes, image.shape[:2])
transformed_coords = predictor.transform.apply_coords_torch(input_points, image.shape[:2])
transformed_coords = transformed_coords[None, :, :]
input_label = input_label[None, :]

print(transformed_coords.shape) # torch.Size([1, 2, 2])
print(transformed_boxes.shape) # torch.Size([1, 4])

masks, _, _ = predictor.predict_torch(
    point_coords=transformed_coords,
    point_labels=input_label,
    boxes=transformed_boxes,
    multimask_output=False,
)

works:

transformed_coords = None
input_label = None
boxes = np.array([[1300, 1550, 1450, 1750],[755, 1150, 910, 1310]])
input_boxes = torch.tensor(boxes, device=predictor.device)
transformed_boxes = predictor.transform.apply_boxes_torch(input_boxes, image.shape[:2])
masks, _, _ = predictor.predict_torch(
    point_coords=transformed_coords,
    point_labels=input_label,
    boxes=transformed_boxes,
    multimask_output=False,
)

not working:


points = np.array([[1375,1625],[760,1230]])
input_points = torch.tensor(points, device=predictor.device)
boxes = np.array([[1300, 1550, 1450, 1750],[755, 1150, 910, 1310]])
input_boxes = torch.tensor(boxes, device=predictor.device)
input_label = torch.tensor(labels, device=predictor.device)
print("Dimensions of points:", input_points.shape) #Dimensions of points: torch.Size([2, 2])
print("Dimensions of boxes:", input_boxes.shape) #Dimensions of boxes: torch.Size([2, 4])

transformed_boxes = predictor.transform.apply_boxes_torch(input_boxes, image.shape[:2])
transformed_coords = predictor.transform.apply_coords_torch(input_points, image.shape[:2])
transformed_coords = transformed_coords[None, :, :]
input_label = input_label[None, :]

print(transformed_coords.shape) # torch.Size([1, 2, 2])
print(transformed_boxes.shape) # torch.Size([2, 4])

masks, _, _ = predictor.predict_torch(
    point_coords=transformed_coords,
    point_labels=input_label,
    boxes=transformed_boxes,
    multimask_output=False,
)

RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 1 but got size 2 for tensor number 1 in the list.
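The mismatch in the last example is between transformed_coords (batch of 1) and transformed_boxes (batch of 2). A minimal sketch of a fix, assuming one point per box and reusing the values from the example above (this follows the same grouping idea as the solution posted later in the thread): give the point prompts the same batch dimension as the boxes, i.e. point_coords of shape (num_boxes, N, 2) and point_labels of shape (num_boxes, N).

# One point per box: coords (2, 1, 2) and labels (2, 1), matching the (2, 4) boxes.
points = np.array([[[1375, 1625]],
                   [[760, 1230]]])
labels = np.array([[1],
                   [1]])
boxes = np.array([[1300, 1550, 1450, 1750],
                  [755, 1150, 910, 1310]])

transformed_coords = predictor.transform.apply_coords_torch(
    torch.tensor(points, dtype=torch.float, device=predictor.device), image.shape[:2])
transformed_boxes = predictor.transform.apply_boxes_torch(
    torch.tensor(boxes, device=predictor.device), image.shape[:2])
input_label = torch.tensor(labels, device=predictor.device)

masks, _, _ = predictor.predict_torch(
    point_coords=transformed_coords,   # (2, 1, 2)
    point_labels=input_label,          # (2, 1)
    boxes=transformed_boxes,           # (2, 4)
    multimask_output=False,
)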

ritchi1408 avatar Nov 21 '23 14:11 ritchi1408


Hi! Look at this issue: https://github.com/facebookresearch/segment-anything/issues/620 :)

shrutichakraborty avatar Nov 21 '23 14:11 shrutichakraborty

You can use predict_torch to give multiple bounding boxes as input prompts:

input_boxes = torch.tensor([box_1, box_2], device=mask_predictor.device)  
transformed_boxes = mask_predictor.transform.apply_boxes_torch(input_boxes, image.shape[:2])  
masks, iou_predictions, low_res_masks = mask_predictor.predict_torch(
    point_coords=None,
    point_labels=None,
    boxes=transformed_boxes,
    multimask_output=True
)

Hello, I used this code, but it failed:

Sizes of tensors must match except in dimension 0. Got 1 and 2 (The offending index is 0)

import numpy as np
import torch
from PIL import Image
from segment_anything import sam_model_registry

from predictor import SamPredictor
# from utils.utils import show_mask, show_box, show_points
sam = sam_model_registry['vit_b'](checkpoint='/home/nhw/omni/checkpoints/sam_vit_b_01ec64.pth').cuda()
mask_predictor = SamPredictor(sam)

# load the image
img = np.array(Image.open("figure/dog.jpg"))
mask_predictor.set_image(img)
input_boxes = torch.tensor([[200, 200, 600, 600], [200, 200, 600, 600]], device=mask_predictor.device)  # x1, y1, x2, y2
transformed_boxes = mask_predictor.transform.apply_boxes_torch(input_boxes, img.shape[:2])

masks, scores, logits = mask_predictor.predict_torch(
    point_coords=None,
    point_labels=None,
    boxes=transformed_boxes,
    multimask_output=True
)

nhw649 avatar Dec 22 '23 09:12 nhw649

So this is my solution:

input_points = []

input_boxes = []
input_label = []

for groupedPoints in groupedPointsByBoxes.items():
    pointsByBoxForSegmentation = []
    labelsByBoxForSegmentation = []
    for point in groupedPoints[1]:
        pointsByBoxForSegmentation.append([point.Point.x, point.Point.y])
        labelsByBoxForSegmentation.append(int(point.MaskPoint))

    input_boxes.append(groupedPoints[0])
    input_points.append(pointsByBoxForSegmentation)
    input_label.append(labelsByBoxForSegmentation)

self.predictor.set_image(image)

input_points = np.array(input_points)
transformed_coords = torch.tensor(input_points, device=self.predictor.device)
transformed_coords = self.predictor.transform.apply_coords_torch(transformed_coords, image.shape[:2])

input_boxes = torch.tensor(input_boxes, device=self.predictor.device)
transformed_boxes = self.predictor.transform.apply_boxes_torch(input_boxes, image.shape[:2])

input_label = torch.tensor(np.array(input_label), device=self.predictor.device)

masks, scores, logits = self.predictor.predict_torch(
    point_coords=transformed_coords,
    point_labels=input_label,
    # point_coords=None,
    # point_labels=None,
    boxes=transformed_boxes,
    # mask_input=mask_input[None, :, :],
    multimask_output=False
)

groupedPointsByBoxes is a dictionary with [minX, minY, maxX, maxY] as the key and, as the value, the points together with their labels (0 or 1).

You can also leave out the points and labels; it should also work with boxes only.

I hope this helps.

ritchi1408 avatar Dec 22 '23 09:12 ritchi1408


OK, I will try. Thanks.

nhw649 avatar Dec 22 '23 15:12 nhw649

The tutorial at notebooks/predictor_example.ipynb covers batched prompt inputs.

315386775 avatar Jan 10 '24 07:01 315386775

Hello, I tried iterating through each of my boxes and getting a mask for each, which works very well but is pretty slow, so I now batch the input by sending in all the boxes at once. The problem is that I now get some masks that are merged with each other, which I did not get before. Is there any way to get the same results as sending the prompts one by one?

Preburk avatar May 23 '24 11:05 Preburk