segment-anything
Why do the results of the same image differ?
Hi, I'm trying to find all the objects in an image automatically. I used the code below.
import numpy as np
import torch
import matplotlib.pyplot as plt
import cv2
import glob
def show_anns(anns, save_path):
    if len(anns) == 0:
        print(save_path)
        return
    sorted_anns = sorted(anns, key=(lambda x: x['area']), reverse=True)
    ax = plt.gca()
    ax.set_autoscale_on(False)
    polygons = []
    color = []
    for ann in sorted_anns:
        m = ann['segmentation']
        img = np.ones((m.shape[0], m.shape[1], 3))
        color_mask = np.random.random((1, 3)).tolist()[0]
        for i in range(3):
            img[:, :, i] = color_mask[i]
        ax.imshow(np.dstack((img, m * 0.35)))
    plt.savefig(save_path)
import sys
sys.path.append("..")
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator, SamPredictor
sam_checkpoint = "../sam_vit_h_4b8939.pth"
model_type = "vit_h"
device = "cuda"
sam = sam_model_registry[model_type](checkpoint=sam_checkpoint)
sam.to(device=device)
mask_generator = SamAutomaticMaskGenerator(sam)
files = glob.glob(fr"./*.jpg")
idx = 0
for file in files:
    image = cv2.imread(file)
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    print(f"image.shape:{image.shape}")
    plt.clf()
    plt.subplot(1, 2, 1)
    plt.imshow(image)
    plt.subplot(1, 2, 2)
    plt.imshow(image)
    masks = mask_generator.generate(image)
    print(f"masks:{len(masks)}")
    show_anns(masks, fr"{idx}.png")
    idx += 1
However, I got this result:
But the online demo's results are very good, as you can see below:
My biggest question is why the online demo works so much better. Have you used any other methods?
I noticed that as well.
I think there is some image pre-processing being applied that we are not performing.
There are also a number of parameters when initializing the model; you're currently using all of the default values. I agree that it would be nice to know what parameters are used in the online demo @HannaMao
Pay attention to the show_anns() function: the line color_mask = np.random.random((1, 3)).tolist()[0] can make the output look different for the same input, but I also don't know how to handle this problem.
The situation you mentioned could arise, but I think the probability of drawing the same values twice is very low. @LedKashmir
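Just to rule that part out: the random call in show_anns() only affects the overlay colors, not the masks themselves, so the visualization can be made reproducible by seeding NumPy's RNG. A minimal sketch, assuming the show_anns() defined above:
import numpy as np

np.random.seed(0)          # fixed seed -> identical overlay colors on every run
show_anns(masks, "0.png")  # any remaining differences come from the masks, not the colors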
I'd also like to know what parameters are used in the online demo @HannaMao.
The random colors are definitely a possibility, but you can see in the example image provided above that the parent bear's ears are segmented differently than in the online demo, so it is producing different results (different parameters than the API's defaults).
There are a lot of parameters that can be tuned in SamAutomaticMaskGenerator:
mask_generator = SamAutomaticMaskGenerator(
    # model: Sam,
    # points_per_side: Optional[int] = 32,
    # points_per_batch: int = 64,
    # pred_iou_thresh: float = 0.88,
    # stability_score_thresh: float = 0.95,
    # stability_score_offset: float = 1.0,
    # box_nms_thresh: float = 0.7,
    # crop_n_layers: int = 0,
    # crop_nms_thresh: float = 0.7,
    # crop_overlap_ratio: float = 512 / 1500,
    # crop_n_points_downscale_factor: int = 1,
    # point_grids: Optional[List[np.ndarray]] = None,
    # min_mask_region_area: int = 0,
    # output_mode: str = "binary_mask",
    model=sam,
    points_per_side=32,
    points_per_batch=64,
    pred_iou_thresh=0.86,
    stability_score_thresh=0.92,
    box_nms_thresh=0.5,
    # crop_n_layers=1,
    # crop_n_points_downscale_factor=2,
    min_mask_region_area=500,
)
box_nms_thresh removes duplicate masks based on the IoU of their bounding boxes. crop_n_layers=1 and crop_n_points_downscale_factor=2 can get you finer results, because the generator then uses multiple crops of the image to extract features and decode masks. min_mask_region_area removes small "holes" and "islands" attached to each mask.
@huxycn have you seen any improvement from doing any sort of image pre-processing? Resizing the image obviously helps with speed, but I've also tried sharpening the image, and that seems to help a little.
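For reference, a minimal sketch of the kind of sharpening mentioned above (a simple unsharp mask with OpenCV; the exact filter that was tried is not stated, so treat this as an illustration):
import cv2
import numpy as np

def sharpen(image_rgb: np.ndarray, amount: float = 1.0) -> np.ndarray:
    # Unsharp mask: original + amount * (original - blurred)
    blurred = cv2.GaussianBlur(image_rgb, (0, 0), sigmaX=3)
    return cv2.addWeighted(image_rgb, 1.0 + amount, blurred, -amount, 0)

# image = sharpen(image)  # applied before mask_generator.generate(image)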
Same question! Is there any solution?
Same question! The results can vary wildly.
following
@huxycn regarding box_nms_thresh and crop_nms_thresh: how should these be set so that I get only one mask and one bbox (no duplicates)? If I set 0.5, does it remove any overlap of more than 50 percent, or less than 50?
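To make the threshold semantics concrete: a candidate is suppressed when the IoU of its bounding box with an already-kept, higher-scoring box exceeds the threshold, so 0.5 removes boxes that overlap by more than 50 percent. A minimal sketch of that logic (not the library's actual implementation, which uses torchvision's batched NMS):
import numpy as np

def box_iou(a, b):
    # Boxes in (x0, y0, x1, y1) format.
    x0, y0 = max(a[0], b[0]), max(a[1], b[1])
    x1, y1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x1 - x0) * max(0.0, y1 - y0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def greedy_nms(boxes, scores, iou_thresh=0.5):
    # Keep the highest-scoring box, drop anything overlapping it above iou_thresh, repeat.
    order = np.argsort(scores)[::-1]
    keep = []
    for i in order:
        if all(box_iou(boxes[i], boxes[j]) <= iou_thresh for j in keep):
            keep.append(i)
    return keep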
+1 on this post. I am getting different results too; the web version gives much better results. It would be great to know the parameters being used (or any additional processing).
Additionally, generation can be automatically run on crops of the image to get improved performance on smaller objects, and post-processing can remove stray pixels and holes.
Wondering what is being done there...
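For reference, the knobs that correspond to that sentence are the crop_* and min_mask_region_area arguments of SamAutomaticMaskGenerator; a sketch of turning them on (the values here are illustrative, not the demo's settings):
mask_generator = SamAutomaticMaskGenerator(
    model=sam,
    crop_n_layers=1,                   # also run generation on crops of the image
    crop_n_points_downscale_factor=2,  # sample fewer points on the smaller crops
    min_mask_region_area=100,          # post-processing: remove small disconnected regions and holes
)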
+1 following
I tested 200 images; 15 of them had poor segmentation results, while the online demo's results on the same images were excellent. I really want to know why.
Same here
+1 following
+1 following
+1 following
I believe the inference done in the demo is on the quantized ONNX model. When I ran examples on the quantized ONNX model, the results improved significantly. I don't know why that is, but maybe this can help you.
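For anyone who wants to try that comparison, a sketch of the export and dynamic-quantization path based on the repo's ONNX example (file names are placeholders):
# Export the mask decoder to ONNX first (documented in the repo README):
#   python scripts/export_onnx_model.py --checkpoint sam_vit_h_4b8939.pth \
#       --model-type vit_h --output sam_onnx_example.onnx

from onnxruntime.quantization import QuantType
from onnxruntime.quantization.quantize import quantize_dynamic

quantize_dynamic(
    model_input="sam_onnx_example.onnx",
    model_output="sam_onnx_quantized_example.onnx",
    per_channel=False,
    reduce_range=False,
    weight_type=QuantType.QUInt8,
)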
Hi @chava100, would you mind posting some screenshots? I know a lot of people would be interested.
I agree. Even with the "basic" SAM prediction using clicks to segment only one object, the demo shows much better results than running it with default values. It would be great to have the parameters used in the demo! Thanks
Unfortunately I cannot share images from the dataset I tested on, so I tried to reproduce the results on a different example.
The example I have uses a bounding-box prompt, because I couldn't figure out how to do "segment anything" once the model is exported to ONNX.
A capture showing results from the SAM PyTorch model:
A capture showing results from SAM exported to the quantized ONNX model:
This is a capture from the demo. I cannot guarantee that the bounding-box values are exactly the same as in the other two images, because I don't see a way to enter numerical values, but I tried to match it as closely as possible:
Hope it helps.
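For what it's worth, the repo's ONNX example notebook shows how prompts are fed to the exported decoder: a box is encoded as its two corners with point labels 2 and 3. A rough sketch based on that notebook (the paths, box coordinates, and input file are placeholders; the input names follow the repo's export script):
import cv2
import numpy as np
import onnxruntime
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth").to("cuda")
predictor = SamPredictor(sam)

image = cv2.cvtColor(cv2.imread("example.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)
image_embedding = predictor.get_image_embedding().cpu().numpy()

ort_session = onnxruntime.InferenceSession("sam_onnx_quantized_example.onnx")

x0, y0, x1, y1 = 100, 100, 400, 400                              # example box, original pixel coords
box_coords = np.array([[[x0, y0], [x1, y1]]], dtype=np.float32)  # (1, 2, 2)
box_labels = np.array([[2, 3]], dtype=np.float32)                # labels 2/3 mark the box corners

# Coordinates must be transformed into the resized-image frame the encoder saw.
onnx_coords = predictor.transform.apply_coords(box_coords, image.shape[:2]).astype(np.float32)

ort_inputs = {
    "image_embeddings": image_embedding,
    "point_coords": onnx_coords,
    "point_labels": box_labels,
    "mask_input": np.zeros((1, 1, 256, 256), dtype=np.float32),
    "has_mask_input": np.zeros(1, dtype=np.float32),
    "orig_im_size": np.array(image.shape[:2], dtype=np.float32),
}
masks, scores, low_res_logits = ort_session.run(None, ort_inputs)
masks = masks > 0.0  # threshold the logits to get boolean masks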
I've been grappling with the same issue for the past few days, and while I don't have a solution, I made some progress on identifying the issue.
I believe the SAM model in the repo is the same as the web model, but the vit-h image encoder has slightly different weights.
Here is a mask created from an embedding taken from the web example (copied out of the console), using the web SAM onnx model:
And now the EXACT SAME SAM model, but with an image embedding created from the vit-h model specified in the repo:
The strange part is that the odd 4x4 repeating grid pattern DOES appear in the mask from the web embedding, but only in the middle of the mask (near the bottom), never at the edges.
Directly comparing the image embeddings is strange too, this is from the web model (mapping values -1 to 1 to 0 to 255 rgb):
And from the vit-h model provided in the repo:
At first it looks like the difference is just the scaling (the web model has values closer to 0), but this isn't true in all cases. In one section the padding becomes entirely black, which I could not replicate no matter what color of padding I used (I tested white, gray, and black). I spent a while trying to make the embedding match via scaling, offset, normalization, etc., but I couldn't get it to work.
Given the superb quality of the mask created from the web embedding (which is literally pixel perfect, in contrast to the other very messy mask), I assume there isn't a trivial fix, and the web demo is simply using a heavily retrained vit-h model.
Also of note, quantizing the vit-h model gives pretty much the same mask result:
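If anyone wants to reproduce the embedding comparison above, a sketch of pulling the image embedding from the released vit_h checkpoint and rendering it in the same spirit (tiling the 256 channels into a 16x16 grid is my own choice for display; the exact mapping used above is not specified):
import cv2
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth").to("cuda")
predictor = SamPredictor(sam)

image = cv2.cvtColor(cv2.imread("example.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)
emb = predictor.get_image_embedding().cpu().numpy()[0]  # (256, 64, 64)

# Map roughly [-1, 1] to [0, 255] and tile each 64x64 channel into a 16x16 grid.
tiles = np.clip((emb + 1.0) * 127.5, 0, 255).astype(np.uint8)
grid = tiles.reshape(16, 16, 64, 64).transpose(0, 2, 1, 3).reshape(16 * 64, 16 * 64)
cv2.imwrite("embedding_vis.png", grid)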
@sliftist I also often notice similar artifacts at the edges produced by the vit-h model. When using non-natural images (like 3D anime snapshots), these artifacts can sometimes become quite messy, even extending far from the edge with the vit-h model. However, the demo results look much cleaner in comparison.
Thank you also for the in-depth analysis using different feature maps with the same decoder model. It convincingly shows that either the model used for the demo (which appears to be better) has not been released, or the input image was preprocessed somehow.
following +1
Why do I get this result using the quantized ONNX model? @chava100
following +1
Hi there!
I've run into the same problem. Do you know what causes the shifting/offset?
Update:
It's because I used an ONNX model that was traced at a 3:2 resolution and applied it to 16:9 images.