segment-anything
How long does it take to generate masks?
For a 1702x1276 RGB image, it takes 6+ seconds to generate the masks using the default parameters.
- OS: Win 11
- GPU: NVIDIA RTX 3070 8GB
Is there a way to speed up?
I'm seeing similar performance: ~5.6 seconds to generate the masks using the default parameters.
- OS: Ubuntu 18.04.6 LTS x86_64
- GPU: NVIDIA GeForce RTX 3090
Ditto; I get exactly the same results (with 640x480 images). Moving from a 2080 to a 3080 doesn't change anything.
If you're following the "Automatic Mask Generation Example", I think I figured out why the execution time is so slow.
The main function I was timing uses the SamAutomaticMaskGenerator class, which the repo provides for convenience. It's inefficient because it effectively runs a separate mask prediction for each proposal region in the image.
When timing just the feature extraction step, we see ~500ms execution time. Then for each “prompt” to get a mask for a particular region, execution time is ~50ms.
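The split described above (one expensive encoder pass, then cheap per-prompt decodes) can be sketched with the repo's SamPredictor API. The checkpoint path, device, and point coordinates here are placeholders; the import is inside the function so the sketch stands alone without the package installed:

```python
import numpy as np

def segment_with_prompts(image, point_coords_list, checkpoint="sam_vit_h_4b8939.pth"):
    """Compute the image embedding once, then decode one mask per prompt."""
    # Requires the `segment_anything` package and a downloaded checkpoint.
    from segment_anything import sam_model_registry, SamPredictor

    sam = sam_model_registry["vit_h"](checkpoint=checkpoint).to("cuda")
    predictor = SamPredictor(sam)

    predictor.set_image(image)           # expensive: runs the ViT image encoder once
    masks = []
    for pt in point_coords_list:         # cheap: only the mask decoder runs per prompt
        m, scores, _ = predictor.predict(
            point_coords=np.array([pt]),
            point_labels=np.array([1]),  # 1 = foreground point
            multimask_output=False,
        )
        masks.append(m[0])
    return masks
```

With this structure, adding more prompts only adds decoder time, not encoder time.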
Actually, I prefer to run it without any prompts.
Ah, I see. Automatic mask generation is what I was using. I tested the other (predict) notebook: it takes ~2 seconds to compute the image embeddings (set_image) and another 2 seconds to predict (in my case an RTX 2080 on a Linux Intel server with 8 cores and 64GB RAM). I get similar results on my Windows machine (3080, 16 cores, 64GB RAM).
Since it takes 2-3 seconds just to generate the image embeddings I can't use this currently because I'm trying to do segmentation at 10FPS. I do have prompts (input points) but I'd still have to compute the embedding once per frame.
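The frame-budget arithmetic behind that conclusion, as a quick sanity check (the timings are the ones reported in this thread, not new measurements):

```python
fps_target = 10
frame_budget_ms = 1000 / fps_target   # 100 ms available per frame

embed_ms = 2000    # set_image on an RTX 2080/3080, as reported above
predict_ms = 50    # one per-prompt decode, as reported above

# The embedding alone blows the budget; per-prompt decoding is almost free
# by comparison, so prompts are not the bottleneck here.
per_frame_ms = embed_ms + predict_ms  # ~20x over the 100 ms budget
```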
Same. Results are great but currently too slow to be of practical use for me. Hopefully someone figures out how to speed this up.
The FAQ on the Segment Anything website says embedding takes 0.15 seconds on an A100, which would be roughly 10x faster than my 3080 (I'm skeptical).
You could reduce points_per_side in SamAutomaticMaskGenerator for faster inference, but the results will degrade compared to the default setting.
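The points_per_side knob sets the side of the prompt grid, so the decoder work scales with its square. A small sketch of the trade-off (the `segment_anything` import is inside the helper so the arithmetic runs without the package installed):

```python
# Default grid: 32x32 points -> 1024 prompts per image.
default_prompts = 32 ** 2
# Halving points_per_side quarters the number of prompts (and decoder work),
# at the cost of missing small objects.
reduced_prompts = 16 ** 2

def make_fast_generator(sam, points_per_side=16):
    # Requires the `segment_anything` package and a loaded SAM model.
    from segment_anything import SamAutomaticMaskGenerator
    return SamAutomaticMaskGenerator(sam, points_per_side=points_per_side)
```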
It seems SamAutomaticMaskGenerator does a lot of deep copying in MaskData.cat, which causes most of the slowdown.
Apparently these deep copies could be avoided, since the original object is deleted anyway every time this method is called.
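The suggested fix can be sketched with a minimal stand-in for MaskData (this is an illustration of the idea, not the repo's actual class): because the caller discards the source object right after `cat`, its arrays can be referenced or concatenated directly instead of deep-copied.

```python
import numpy as np

class MaskData:
    """Minimal stand-in: a dict of per-mask arrays and lists."""
    def __init__(self, **kwargs):
        self._stats = dict(kwargs)

    def cat(self, new_stats: "MaskData") -> None:
        # No deepcopy: `new_stats` is discarded after this call,
        # so taking references and extending in place is safe.
        for k, v in new_stats._stats.items():
            if k not in self._stats:
                self._stats[k] = v                 # reference, not a copy
            elif isinstance(v, np.ndarray):
                self._stats[k] = np.concatenate([self._stats[k], v], axis=0)
            elif isinstance(v, list):
                self._stats[k].extend(v)           # in-place, no copied elements

data = MaskData(boxes=np.zeros((2, 4)), ids=[0, 1])
batch = MaskData(boxes=np.ones((3, 4)), ids=[2, 3, 4])
data.cat(batch)
del batch  # mirrors how the generator discards each batch after merging
```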
Hello,
Did you try it? If so, did it improve the inference time?
You're getting 6 seconds or 50 ms?! I am getting terrible times on my local system:
Nvidia GeForce RTX 3070 Ti
(1030, 1660, 3)
Execution time: 27.502898454666138 seconds
Using dog.jpg, the different mask generation functions in the demo take 3.4 and 5.4 seconds respectively on an NVIDIA A100-SXM4-80GB.
6 seconds, on an NVIDIA A100-SXM4-80GB.
So is my local speed relatively normal, then? OP seems to have a similar setup to mine and gets 6 seconds. What am I misunderstanding?
Hi, we have proposed a method for rapid "segment anything", using just 2% of the SA-1B dataset. It achieves precision comparable to SAM in edge detection (AP, 0.794 vs 0.793) and proposal generation tasks (mask AR@1000, 49.7 vs 51.8 for E32). Additionally, our model is 50x faster than SAM-H E32. The model is very simple, primarily adopting the YOLOv8-seg structure. We welcome everyone to try it out. GitHub: https://github.com/CASIA-IVA-Lab/FastSAM, arXiv: https://arxiv.org/pdf/2306.12156.pdf
Is there a way to speed up now? I encountered the same problem, and I also tested mobileSAM, but there was no significant improvement in time. It seems that the time-consuming part is not caused by the SAM model.
MobileSAM replaces the heavyweight ViT-based image encoder of the original SAM with a lightweight one, while its prompt encoder and mask decoder are identical to those of SAM. Thus, MobileSAM is faster than SAM only in the set_image() step.
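In other words, for automatic mask generation the encoder is only part of the total runtime, so a faster encoder gives limited end-to-end gains (Amdahl's law). A toy estimate; the encoder/decoder split below is illustrative, not measured:

```python
def amg_time(encoder_s, decoder_s):
    """Total automatic-mask-generation time: one encode plus all decodes."""
    return encoder_s + decoder_s

# Illustrative split for a ~6 s run: decoder work over the point grid dominates.
sam_total = amg_time(encoder_s=2.0, decoder_s=4.0)
# MobileSAM only shrinks the encoder; the decoder work is unchanged.
mobile_total = amg_time(encoder_s=0.1, decoder_s=4.0)

speedup = sam_total / mobile_total  # well under 2x end-to-end
```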
I think you cannot run it without prompts, can you?
I can, but it is TOO slow.
Yeah, but when you use the AMG you are using a grid of points as your prompts. With the default implementation you are in fact using 32*32 = 1,024 prompts. If you reduce the grid you might get faster results.
Right. Fewer points, less accuracy. It's hard to work out a suitable trade-off.