
Why do the results of the same image differ?

Open mulinhu opened this issue 1 year ago • 38 comments

Hi, I'm trying to find all the objects in an image automatically. I used the code below.

import glob

import cv2
import matplotlib.pyplot as plt
import numpy as np
import torch


def show_anns(anns, save_path):
    """Overlay each mask in a random translucent color on the current axes and save the figure."""
    if len(anns) == 0:
        print(save_path)
        return
    sorted_anns = sorted(anns, key=lambda x: x['area'], reverse=True)
    ax = plt.gca()
    ax.set_autoscale_on(False)
    for ann in sorted_anns:
        m = ann['segmentation']                       # boolean mask, shape (H, W)
        img = np.ones((m.shape[0], m.shape[1], 3))
        color_mask = np.random.random((1, 3)).tolist()[0]
        for i in range(3):
            img[:, :, i] = color_mask[i]
        ax.imshow(np.dstack((img, m * 0.35)))         # use the mask as the alpha channel
    plt.savefig(save_path)
        

import sys
sys.path.append("..")
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator, SamPredictor

sam_checkpoint = "../sam_vit_h_4b8939.pth"
model_type = "vit_h"

device = "cuda"

sam = sam_model_registry[model_type](checkpoint=sam_checkpoint)
sam.to(device=device)

mask_generator = SamAutomaticMaskGenerator(sam)

files = glob.glob("./*.jpg")
idx = 0
for file in files:
    image = cv2.imread(file)
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)    # SAM expects RGB input
    print(f"image.shape: {image.shape}")

    plt.clf()
    plt.subplot(1, 2, 1)
    plt.imshow(image)

    plt.subplot(1, 2, 2)
    plt.imshow(image)
    masks = mask_generator.generate(image)
    print(f"masks: {len(masks)}")
    show_anns(masks, f"{idx}.png")
    idx += 1
    

However, I got this result (screenshot attached).

But the demo's result is very good, as you can see in the attached screenshot.

My biggest question is why the online demo's results are so much better. Are you using any additional methods?

mulinhu avatar Apr 12 '23 16:04 mulinhu

I noticed that as well.

I think there is some image pre-processing in the demo that we are not performing.

AnasCHARROUD avatar Apr 12 '23 16:04 AnasCHARROUD

There are also a number of parameters you can set when initializing the model; you're currently using all of the default values. I agree it would be nice to know what parameters are used in the online demo @HannaMao

JordanMakesMaps avatar Apr 12 '23 20:04 JordanMakesMaps

Pay attention to the show_anns() function: the line `color_mask = np.random.random((1, 3)).tolist()[0]` can make the same input look different across runs, since the mask colors are random. I don't know how to handle this problem either, though.
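
One way to separate the random colors from the actual model output is to compare the raw boolean masks from two runs directly. A minimal sketch, reusing `mask_generator` and `image` from the script above:

import numpy as np

# Run the generator twice on the same image and compare the boolean masks,
# ignoring the random display colors entirely.
masks_a = mask_generator.generate(image)
masks_b = mask_generator.generate(image)

same_count = len(masks_a) == len(masks_b)
same_masks = same_count and all(
    np.array_equal(a["segmentation"], b["segmentation"])
    for a, b in zip(
        sorted(masks_a, key=lambda x: x["area"], reverse=True),
        sorted(masks_b, key=lambda x: x["area"], reverse=True),
    )
)
print(f"same number of masks: {same_count}, identical masks: {same_masks}")

If this prints True, the visual differences between runs come only from show_anns(); the gap to the online demo is a separate issue.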

LedKashmir avatar Apr 13 '23 03:04 LedKashmir

The situation you mention may arise, but I think the probability of getting the same random values twice is very low. @LedKashmir

mulinhu avatar Apr 13 '23 04:04 mulinhu

I'd also like to know what parameters are used in the online demo @HannaMao.

Glisten-5481 avatar Apr 13 '23 05:04 Glisten-5481

Pay attention to the show_anns() function: the line `color_mask = np.random.random((1, 3)).tolist()[0]` can make the same input look different across runs, since the mask colors are random. I don't know how to handle this problem either, though.

That's definitely a possibility, but you can see in the example image provided above that the parent bear's ears are segmented differently than in the online demo, so the model itself is producing different masks, not just different colors (presumably different parameters than the API's defaults).

Jordan-Pierce avatar Apr 13 '23 16:04 Jordan-Pierce

There are a lot of parameters that can be tuned in SamAutomaticMaskGenerator:

mask_generator = SamAutomaticMaskGenerator(
    # model: Sam,
    # points_per_side: Optional[int] = 32,
    # points_per_batch: int = 64,
    # pred_iou_thresh: float = 0.88,
    # stability_score_thresh: float = 0.95,
    # stability_score_offset: float = 1.0,
    # box_nms_thresh: float = 0.7,
    # crop_n_layers: int = 0,
    # crop_nms_thresh: float = 0.7,
    # crop_overlap_ratio: float = 512 / 1500,
    # crop_n_points_downscale_factor: int = 1,
    # point_grids: Optional[List[np.ndarray]] = None,
    # min_mask_region_area: int = 0,
    # output_mode: str = "binary_mask",
    model=sam,

    points_per_side=32,
    points_per_batch=64,
    pred_iou_thresh=0.86,
    stability_score_thresh=0.92,
    box_nms_thresh=0.5,
    # crop_n_layers=1,
    # crop_n_points_downscale_factor=2,

    min_mask_region_area=500,
)

- `box_nms_thresh`: removes duplicate masks based on the IoU of their bounding boxes.
- `crop_n_layers=1` and `crop_n_points_downscale_factor=2`: can give you finer results, because the generator then extracts features and decodes masks on multiple crops of the image.
- `min_mask_region_area`: removes small "holes" and "islands" attached to each mask.
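
To see how aggressive those thresholds are for a given image, it can help to inspect the scores they filter on. A small sketch, assuming `masks` comes from `mask_generator.generate(image)` as above:

# Each mask record carries the values that pred_iou_thresh and
# stability_score_thresh are compared against.
for m in sorted(masks, key=lambda x: x["area"], reverse=True)[:5]:
    print(
        f"area={m['area']:>8d}  "
        f"predicted_iou={m['predicted_iou']:.3f}  "
        f"stability_score={m['stability_score']:.3f}"
    )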

huxycn avatar Apr 14 '23 09:04 huxycn

@huxycn have you seen any improvements from doing image pre-processing? Resizing the image obviously helps with speed, but I've also tried sharpening the image and that seems to help a little.

Jordan-Pierce avatar Apr 14 '23 14:04 Jordan-Pierce

Same question! Is there any solution?

Jack-bo1220 avatar Apr 16 '23 13:04 Jack-bo1220

Same question! The results can vary wildly (screenshots attached).

Clear-3d avatar Apr 16 '23 16:04 Clear-3d

following

nudlesoup avatar Apr 18 '23 09:04 nudlesoup

@huxycn for box_nms_thresh and crop_nms_thresh, how should I set these so that I get only one mask and one bbox per object (no duplicates)? If I set 0.5, does it remove any overlap of more than 50 percent, or less than 50?
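
For what it's worth, the generator applies standard box NMS with these thresholds, so a mask is dropped when its box IoU with a higher-scoring kept mask exceeds the threshold; 0.5 therefore removes overlaps of more than 50 percent. A standalone sketch with hypothetical boxes:

import torch
from torchvision.ops import box_iou, nms

# Two heavily overlapping boxes (xyxy) with scores; their IoU is about 0.68.
boxes = torch.tensor([[0.0, 0.0, 100.0, 100.0], [10.0, 10.0, 110.0, 110.0]])
scores = torch.tensor([0.9, 0.8])

print(box_iou(boxes, boxes))                  # pairwise IoU matrix
keep = nms(boxes, scores, iou_threshold=0.5)  # drops the lower-scoring box
print(keep)                                   # tensor([0])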

maheshs11 avatar Apr 19 '23 01:04 maheshs11

+1 on this post. I am getting different results too; the web demo's results are much better. It would be great to know the exact parameters used (or any additional processing).

Additionally, generation can be automatically run on crops of the image to get improved performance on smaller objects, and post-processing can remove stray pixels and holes.

Wondering what is being done there...

yong2khoo-lm avatar Apr 25 '23 09:04 yong2khoo-lm

+1 following

Akhp888 avatar Apr 27 '23 10:04 Akhp888

I tested 200 images, of which 15 had poor segmentation results, yet the online demo's results were excellent. I really want to know why.

dongjielie avatar Apr 28 '23 01:04 dongjielie

Same here

reconlabs-young avatar Apr 28 '23 02:04 reconlabs-young

+1 following

helen1c avatar Apr 28 '23 16:04 helen1c

+1 following

HettyPatel avatar May 04 '23 02:05 HettyPatel

+1 following

chava100 avatar May 04 '23 09:05 chava100

I believe the inference in the demo is done with the quantized ONNX model. When I ran examples with the quantized ONNX model, the results improved significantly. I don't know why that is, but maybe it can help you.
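
For anyone who wants to try the same thing, a rough sketch of the quantization step, assuming the mask decoder has already been exported with the repo's scripts/export_onnx_model.py (the paths here are placeholders; note the export covers only the prompt/mask decoder, the image encoder still runs in PyTorch):

from onnxruntime.quantization import QuantType, quantize_dynamic

# Dynamic 8-bit quantization of the exported decoder (similar to the repo's ONNX example notebook).
quantize_dynamic(
    model_input="sam_onnx_example.onnx",            # output of export_onnx_model.py
    model_output="sam_onnx_quantized_example.onnx",
    per_channel=False,
    reduce_range=False,
    weight_type=QuantType.QUInt8,
)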

chava100 avatar May 04 '23 11:05 chava100

Hi @chava100, would you mind posting some screenshots? I know a lot of people would be interested.

Jordan-Pierce avatar May 04 '23 14:05 Jordan-Pierce

I agree. Even with the "basic" SAM prediction, using clicks to segment a single object, the demo shows much better results than running it locally with default values. It would be great to have the demo's parameters! Thanks

theodu avatar May 04 '23 16:05 theodu

Hi @chava100, would you mind posting some screenshots? I know a lot of people would be interested.

Unfortunately I cannot share images from the dataset I tested on, so I tried to reproduce the results on a different example. The example I have uses a bounding-box prompt, because I couldn't figure out how to do 'segment anything' (automatic mask generation) once the model is exported to ONNX. A capture showing the results from the SAM PyTorch model: (screenshot attached)

A capture showing the results from SAM exported to the quantized ONNX model: (screenshot attached)

This is a capture from the demo. I cannot guarantee that the bounding-box values are exactly the same as in the other two images, because I don't see a way to enter numerical values, but I tried to match it as closely as possible: (screenshot attached). Hope it helps.
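
For reference, a minimal sketch of the PyTorch side of that comparison with a box prompt (the box coordinates are placeholders; `sam` and `image` are loaded as in the script at the top of the thread):

import numpy as np
from segment_anything import SamPredictor

predictor = SamPredictor(sam)
predictor.set_image(image)                   # RGB uint8 array, HxWx3
box = np.array([100, 100, 400, 400])         # hypothetical x0, y0, x1, y1
masks, scores, _ = predictor.predict(box=box, multimask_output=False)
print(masks.shape, scores)                   # (1, H, W) boolean mask and its predicted IoU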

chava100 avatar May 04 '23 21:05 chava100

I've been grappling with the same issue for the past few days, and while I don't have a solution, I made some progress on identifying the issue.

I believe the SAM model in the repo is the same as the web model, but the vit-h image encoder has slightly different weights.

Here is a mask created from an embedding taken from the web example (copied out of the console), using the web SAM ONNX model: (image attached)

And here is the EXACT SAME SAM model, but with an image embedding created from the vit-h checkpoint provided in the repo: (image attached)

The strange part is that the odd 4x4 repeating grid pattern DOES appear in the mask from the web embedding, but only in the middle of the mask (near the bottom), never at the edges.

Directly comparing the image embeddings is strange too. This is from the web model (mapping values from [-1, 1] to [0, 255] RGB): (image attached)

And from the vit-h model provided in the repo: (image attached)

At first it looks like the difference is just the scaling (the web model has values closer to 0), but this isn't true in all cases. In one section the padding becomes entirely black, which I could not replicate no matter what color of padding I used (I tested white, gray, and black). I spent a while trying to make the embeddings match via scaling, offset, normalization, etc., but I couldn't get it to work.
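
For anyone trying to reproduce this comparison, a rough sketch of dumping the repo-side embedding (the web-side embedding being whatever was copied out of the browser console); `sam` and `image` are loaded as in the script at the top:

import numpy as np
from segment_anything import SamPredictor

predictor = SamPredictor(sam)                          # sam = loaded vit_h checkpoint
predictor.set_image(image)                             # RGB uint8 array
emb = predictor.get_image_embedding().cpu().numpy()    # shape (1, 256, 64, 64)

# Visualize one channel the same way as above: map [-1, 1] to [0, 255].
channel = emb[0, 0]
vis = np.clip((channel + 1.0) * 127.5, 0, 255).astype(np.uint8)
print(emb.shape, emb.min(), emb.max(), vis.shape)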

Given the superb quality of the mask created from the web embedding (which is literally pixel perfect, in contrast to the other very messy mask), I assume there isn't a trivial fix, and the web demo is simply using a heavily retrained vit-h model.

sliftist avatar Jul 07 '23 08:07 sliftist

Also of note, quantizing the vit-h model gives pretty much the same mask result:

(image attached)

sliftist avatar Jul 07 '23 08:07 sliftist

@sliftist I also often notice similar artifacts at the edges of masks produced by the vit-h model. With non-natural images (like 3D anime snapshots), these artifacts can get quite messy, sometimes extending far from the edge. The demo results look much cleaner in comparison.

Thank you also for the in-depth analysis using different feature maps with the same decoder model. It convincingly shows that either the model used for the demo (which appears to be better) has not been released, or the input image was preprocessed somehow.

YutingZhang avatar Jul 16 '23 16:07 YutingZhang

following +1

idonahum1 avatar Jul 30 '23 13:07 idonahum1

Why do I get this result using the quantized ONNX model? @chava100 (screenshots attached)

sssmallmonster avatar Aug 13 '23 03:08 sssmallmonster

following +1

liren2515 avatar Aug 21 '23 15:08 liren2515

Why do I get this result using the quantized ONNX model? @chava100 (screenshots attached)

Hi there! I've hit the same problem. Do you know what causes the shifting/offset? (screenshot attached) Update: it's because I used an ONNX model that was traced at a 3:2 resolution and applied it to 16:9 images.
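
In case it helps others hitting the same offset, a rough sketch of where the image size enters the ONNX decoder inputs (input names follow the official ONNX example; `predictor`, `image`, and the exported model path are assumptions set up as in the repo's example):

import numpy as np
import onnxruntime

ort_session = onnxruntime.InferenceSession("sam_onnx_example.onnx")   # placeholder path

# Image embedding from the PyTorch encoder (predictor.set_image(image) already called).
image_embedding = predictor.get_image_embedding().cpu().numpy()

input_point = np.array([[500, 375]])                  # hypothetical click
input_label = np.array([1])
# The official example pads point-only prompts with a dummy point labeled -1.
coords = np.concatenate([input_point, np.array([[0.0, 0.0]])], axis=0)[None, :, :]
labels = np.concatenate([input_label, np.array([-1])], axis=0)[None, :].astype(np.float32)
# Coordinates must be transformed against the *actual* image size...
coords = predictor.transform.apply_coords(coords, image.shape[:2]).astype(np.float32)

ort_inputs = {
    "image_embeddings": image_embedding,
    "point_coords": coords,
    "point_labels": labels,
    "mask_input": np.zeros((1, 1, 256, 256), dtype=np.float32),
    "has_mask_input": np.zeros(1, dtype=np.float32),
    # ...and orig_im_size must be the real (H, W) of the image being segmented,
    # not the resolution the model happened to be traced or tested with.
    "orig_im_size": np.array(image.shape[:2], dtype=np.float32),
}
masks, scores, low_res_logits = ort_session.run(None, ort_inputs)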

jiangwei221 avatar Sep 28 '23 17:09 jiangwei221