MobileSAM
It took 3000ms on my computer
It took 3000ms on my computer. I don't know what is wrong.
Me too. It's even slower than SAM. My code:
import cv2
import numpy as np
import torch
import matplotlib.pyplot as plt
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

def show_anns(anns):
    # Overlay every mask on the current axes, largest first, each in a random color.
    if len(anns) == 0:
        return
    sorted_anns = sorted(anns, key=(lambda x: x['area']), reverse=True)
    ax = plt.gca()
    ax.set_autoscale_on(False)
    img = np.ones((sorted_anns[0]['segmentation'].shape[0], sorted_anns[0]['segmentation'].shape[1], 4))
    img[:, :, 3] = 0  # start fully transparent
    for ann in sorted_anns:
        m = ann['segmentation']
        color_mask = np.concatenate([np.random.random(3), [1]])
        img[m] = color_mask
    ax.imshow(img)

def runSam(path):
    sam = sam_model_registry["vit_h"](checkpoint=r"E:\model_dataset\sam_vit_h_4b8939.pth")
    device = "cuda"
    sam.to(device)
    mask_generator = SamAutomaticMaskGenerator(sam)
    img = cv2.imread(path)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    masks = mask_generator.generate(img)
    return masks, img

def runMobileSam(path):
    from mobile_encoder.setup_mobile_sam import setup_model
    checkpoint = torch.load(r'D:\tools\MobileSAM\weights\mobile_sam.pt')
    mobile_sam = setup_model()
    mobile_sam.load_state_dict(checkpoint, strict=True)
    mask_generator = SamAutomaticMaskGenerator(mobile_sam)
    img = cv2.imread(path)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    masks = mask_generator.generate(img)
    return masks, img

def showRet(masks, img):
    print(len(masks))
    print(masks[0].keys())
    plt.figure(figsize=(20, 20))
    plt.imshow(img)
    show_anns(masks)
    plt.axis('off')
    plt.show()

if __name__ == '__main__':
    path = r'C:\Users\Admin\Desktop\test_img\2033CD8A29F6C011006F8452C53A4D89.jpg'
    masks, img = runSam(path)
    # masks, img = runMobileSam(path)
    showRet(masks, img)
My environment: Windows, PyTorch 2.0.1, CUDA 11.7, RTX 4070.
It seems that your MobileSAM does not use CUDA, but your SAM does.
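(For reference, a minimal sketch of the fix: in runMobileSam above, the model is never moved to the GPU, unlike sam in runSam. Adding something like the following before building the mask generator puts both models on the same device; the torch.cuda.is_available() guard is an extra precaution not in the original code.)

device = "cuda" if torch.cuda.is_available() else "cpu"
mobile_sam.to(device=device)  # without this, MobileSAM runs on the CPU
mobile_sam.eval()             # switch to inference mode
mask_generator = SamAutomaticMaskGenerator(mobile_sam)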
Without more details, it is difficult for us to help you debug.
Thank you, you are right. Now my time is half that of SAM.
Here is my code. My environment: Ubuntu 18, torch 2.0.0+cu117.
It seems that your issues are addressed. Thanks for your interest in our work.
May I ask whether you chose anything mode or everything mode?
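(For readers following along: "anything mode" means running the decoder once per user-supplied prompt via SamPredictor, while "everything mode" means running it once per grid point via SamAutomaticMaskGenerator. A hedged sketch of the distinction using the standard segment-anything API, reusing mobile_sam and an RGB img from the code above; the point coordinate is an arbitrary example.)

import numpy as np
from segment_anything import SamPredictor, SamAutomaticMaskGenerator

# Anything mode: one encoder pass + one decoder pass for a single point prompt.
predictor = SamPredictor(mobile_sam)
predictor.set_image(img)  # img is an RGB numpy array
masks, scores, logits = predictor.predict(
    point_coords=np.array([[500, 375]]),  # arbitrary example point
    point_labels=np.array([1]),           # 1 = foreground
)

# Everything mode: one encoder pass + one decoder pass per grid point (32x32 by default).
mask_generator = SamAutomaticMaskGenerator(mobile_sam)
all_masks = mask_generator.generate(img)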
Even after adding the following to the code:

device = "cuda"
mobile_sam.to(device=device)

MobileSAM still takes half the time of SAM, which is quite different from the speed claimed in the paper, and it is much slower than FastSAM. I don't know what the problem is.

SAM time: 2.2856764793395996 seconds
150
dict_keys(['segmentation', 'area', 'bbox', 'predicted_iou', 'point_coords', 'stability_score', 'crop_box'])
LR SCALES: [0.08589934592000005, 0.10737418240000006, 0.13421772800000006, 0.1677721600000001, 0.20971520000000007, 0.2621440000000001, 0.3276800000000001, 0.4096000000000001, 0.5120000000000001, 0.6400000000000001, 0.8, 1.0]
MobileSAM time: 1.4033191204071045 seconds
97
dict_keys(['segmentation', 'area', 'bbox', 'predicted_iou', 'point_coords', 'stability_score', 'crop_box'])
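(A side note on measurement: the timings above appear to be wall-clock time around mask_generator.generate. When benchmarking on a GPU, a warm-up run and torch.cuda.synchronize() before reading the clock give more stable numbers; a minimal sketch, not the poster's actual code.)

import time
import torch

def time_generate(mask_generator, img, warmup=1, iters=5):
    # Warm-up excludes one-time CUDA context and kernel-autotuning costs.
    for _ in range(warmup):
        mask_generator.generate(img)
    if torch.cuda.is_available():
        torch.cuda.synchronize()  # wait for pending GPU work before starting the clock
    start = time.time()
    for _ in range(iters):
        mask_generator.generate(img)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return (time.time() - start) / iters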
FastSAM is fine; I ran the notebook code.
I re-read the paper and the code, and I'm running in 'segment everything' mode. FastSAM took 0.0546329 seconds, MobileSAM took 1.4033191 seconds.
Thanks for your interest in our work. Note that MobileSAM makes the image encoder lightweight without changing the decoder (roughly 8ms on the encoder and 4ms on the decoder). We mainly target the anything mode (one encoder pass and one decoder pass) rather than the everything mode (one encoder pass and 32x32 decoder passes); see the paper for the difference in definitions (anything mode is the foundation task, while everything mode is just a downstream task, as indicated in the original SAM paper). For everything mode, even though our encoder is much faster than that of the original SAM (close to 500ms), it cannot save much time for the whole pipeline, since most of the time is spent on the 32x32 decoder passes. One way to mitigate this is to use a smaller grid (like 10x10 or 5x5) so the decoder consumes less time, since many redundant masks are generated with a 32x32 grid. I hope this addresses your issue; otherwise, please kindly let us know. We are also currently trying to make the mask decoder more lightweight by distilling it into a smaller one, as we did for the image encoder. Stay tuned for our progress. If you have more issues, please kindly let us know; we might not be able to respond in a timely manner, but we will try our best.
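(A sketch of the smaller-grid suggestion: SamAutomaticMaskGenerator exposes a points_per_side parameter, which defaults to 32, so lowering it shrinks the number of decoder passes quadratically.)

from segment_anything import SamAutomaticMaskGenerator

# A 10x10 grid means 100 decoder passes instead of the default 32x32 = 1024.
mask_generator = SamAutomaticMaskGenerator(mobile_sam, points_per_side=10)
masks = mask_generator.generate(img)  # img: RGB numpy array, as in the code above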
This work is really great: the inference time for a single point prompt is about 10ms+, but a full image is not much faster. On our GPU, a full image does take about 2~3 s. After all, the decoder network has not changed, so the full-image case cannot be significantly improved.
Thanks for your interest in our work. Please check our replies to others on how to mitigate this issue. Yet another way to speed it up on GPU is to do batch inference for the decoder with the 32x32 grid of prompt points. You can try implementing it and help do a pull request here if you complete it. We will also implement it ourselves, but it may take a while.
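(A hedged sketch of the batched-decoder idea, using SamPredictor.predict_torch from the standard segment-anything API, which accepts a batch of point prompts. batched_everything and grid_side are hypothetical names, and this omits the mask filtering and post-processing that SamAutomaticMaskGenerator performs after decoding.)

import numpy as np
import torch
from segment_anything import SamPredictor

def batched_everything(mobile_sam, image, grid_side=32):
    predictor = SamPredictor(mobile_sam)
    predictor.set_image(image)  # one pass through the (lightweight) image encoder
    h, w = image.shape[:2]
    # Build a grid_side x grid_side grid of point prompts in pixel coordinates.
    xs = (np.arange(grid_side) + 0.5) / grid_side * w
    ys = (np.arange(grid_side) + 0.5) / grid_side * h
    grid = np.stack(np.meshgrid(xs, ys), axis=-1).reshape(-1, 1, 2)  # (B, 1, 2)
    coords = torch.as_tensor(grid, dtype=torch.float, device=predictor.device)
    coords = predictor.transform.apply_coords_torch(coords, (h, w))
    labels = torch.ones(coords.shape[:2], dtype=torch.int, device=predictor.device)
    # One batched decoder call for all B prompts instead of B separate calls.
    # In practice the batch should be chunked to bound GPU memory.
    masks, scores, _ = predictor.predict_torch(coords, labels, multimask_output=True)
    return masks, scores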