segment-anything

How long does it take to generate masks?

Open luffy-yu opened this issue 2 years ago • 21 comments

For a 1702x1276 RGB image, it takes 6+ seconds to generate the masks using the default parameters.

  • OS: Win 11
  • GPU: NVIDIA 3070 8GB

Is there a way to speed this up?

luffy-yu avatar Apr 06 '23 19:04 luffy-yu

Seeing similar performance: it takes ~5.6 seconds to generate the masks using the default parameters.

  • OS: Ubuntu 18.04.6 LTS x86_64
  • GPU: NVIDIA GeForce RTX 3090

palol avatar Apr 06 '23 23:04 palol

Ditto; I get the same exact results (with 640x480 images). Moving from a 2080 to a 3080 doesn't change anything.

dakoner avatar Apr 07 '23 16:04 dakoner

If you're following the "Automatic Mask Generation Example", I think I figured out why the execution time is so slow.

The main function I was timing used the SamAutomaticMaskGenerator class, which is provided for convenience by the repo. It's inefficient because it calculates embeddings for each proposal region from a given image.

When timing just the feature extraction step, we see ~500ms execution time. Then for each “prompt” to get a mask for a particular region, execution time is ~50ms.
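
For anyone who wants to reproduce this split, here is a rough sketch of how the two steps could be timed separately. It assumes a predictor object with SamPredictor-style `set_image` and `predict` methods; the helper name `time_embedding_and_prompts` and the `sync` hook are made up for illustration and are not part of the repo. On a GPU you would probably want to pass something like `torch.cuda.synchronize` as `sync`, otherwise asynchronous kernels may not be counted.

```python
import time


def time_embedding_and_prompts(predictor, image, prompts, sync=None):
    """Time one set_image call, then each predict call on its own.

    predictor: object with SamPredictor-style set_image / predict methods.
    prompts:   iterable of (point_coords, point_labels) pairs, already in
               whatever array format the predictor expects.
    sync:      optional callable run before reading the clock
               (hypothetical hook, e.g. torch.cuda.synchronize on GPU).
    """
    def _sync():
        if sync is not None:
            sync()

    # Feature extraction: done once per image.
    _sync()
    start = time.perf_counter()
    predictor.set_image(image)
    _sync()
    embed_seconds = time.perf_counter() - start

    # Mask decoding: done once per prompt, reusing the cached embedding.
    prompt_seconds = []
    for point_coords, point_labels in prompts:
        start = time.perf_counter()
        predictor.predict(
            point_coords=point_coords,
            point_labels=point_labels,
            multimask_output=True,
        )
        _sync()
        prompt_seconds.append(time.perf_counter() - start)

    return embed_seconds, prompt_seconds
```

Calling it with a real predictor and a handful of point prompts should give one embedding time and a list of per-prompt times, which is the breakdown quoted above.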

palol avatar Apr 07 '23 17:04 palol

> If you're following the "Automatic Mask Generation Example", I think I figured out why the execution time is so slow.
>
> The main function I was timing used the SamAutomaticMaskGenerator class, which is provided for convenience by the repo. It's inefficient because it calculates embeddings for each proposal region from a given image.
>
> When timing just the feature extraction step, we see ~500ms execution time. Then for each “prompt” to get a mask for a particular region, execution time is ~50ms.

Actually, I prefer to run it without any prompts.

luffy-yu avatar Apr 07 '23 21:04 luffy-yu

Ah, I see. Automatic mask generation is what I was using. I tested the other (predict) notebook and it takes ~2 seconds to compute the image embeddings (set_image) and another 2 seconds to do a predict (in my case I have an RTX 2080 on a Linux Intel server w/ 8 cores and 64GB RAM). I get similar results on my Windows machine (3080 w/ 16 cores and 64GB RAM).

Since it takes 2-3 seconds just to generate the image embeddings I can't use this currently because I'm trying to do segmentation at 10FPS. I do have prompts (input points) but I'd still have to compute the embedding once per frame.
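
To put a number on the gap: at 10 FPS each frame has a budget of 100 ms, so a 2-3 second embedding step is 20-30x over budget before any prompt is decoded. A tiny sketch of that arithmetic (the function name `required_speedup` is just illustrative):

```python
def required_speedup(measured_seconds, target_fps):
    """How many times faster a step must get to fit in one frame at target_fps."""
    # Frame budget is 1 / target_fps seconds, so the ratio is measured * fps.
    return measured_seconds * target_fps


for embed_seconds in (2.0, 3.0):
    factor = required_speedup(embed_seconds, target_fps=10)
    print(f"{embed_seconds:.1f} s embedding needs ~{factor:.0f}x speedup for 10 FPS")
```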

dakoner avatar Apr 07 '23 23:04 dakoner

Same. Results are great but currently too slow to be of practical use for me. Hopefully someone figures out how to speed this up.

mpottinger avatar Apr 08 '23 09:04 mpottinger

The FAQ on the Segment Anything website says embedding takes 0.15 seconds on an A100, which would be roughly 10X faster than my 3080 (I'm skeptical).

dakoner avatar Apr 14 '23 20:04 dakoner

You can reduce points_per_side in SamAutomaticMaskGenerator for faster inference, but the results will be worse than with the default setting.
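
A rough sketch of how much work points_per_side controls. It assumes no extra crop layers (crop_n_layers=0) and a points_per_batch of 64, which I believe are the defaults; the helper `amg_workload` is made up for illustration and only counts prompts and decoder batches, it does not measure time.

```python
import math


def amg_workload(points_per_side, points_per_batch=64):
    """Return (number of point prompts, number of decoder batches) for one image.

    Assumes a single full-image crop, i.e. no additional crop layers.
    """
    n_points = points_per_side ** 2
    n_batches = math.ceil(n_points / points_per_batch)
    return n_points, n_batches


for side in (32, 16, 8):
    n_points, n_batches = amg_workload(side)
    print(f"points_per_side={side}: {n_points} prompts in {n_batches} batches")
```

Halving points_per_side cuts the number of prompts by 4x, which is why the decoder part of the run time should drop quickly, at the cost of missing smaller objects.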

ken011528 avatar Apr 17 '23 10:04 ken011528

It seems the SamAutomaticMaskGenerator does a lot of deep copying in MaskData.cat, which causes most of the slowdown. Apparently these deep copies could be avoided, since every time this method is called the original object is deleted anyway.
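
To illustrate the idea (this is a simplified stand-in, not the repo's MaskData code): a container that concatenates per-batch results can either deep copy the incoming values or take them over by reference. The class `TinyMaskData` and its `copy` flag are hypothetical. Skipping the copy is only safe if the incoming object is thrown away right after, which is the situation described above.

```python
from copy import deepcopy


class TinyMaskData:
    """Simplified dict-of-lists container (hypothetical, for illustration only)."""

    def __init__(self, **stats):
        self._stats = dict(stats)

    def cat(self, other, copy=True):
        """Append other's lists to ours, optionally deep copying them first."""
        for key, value in other._stats.items():
            # With copy=False we reuse the incoming objects instead of cloning them.
            incoming = deepcopy(value) if copy else value
            if key not in self._stats:
                self._stats[key] = incoming
            else:
                self._stats[key] = self._stats[key] + incoming


batch = TinyMaskData(rles=[[3, 4]], boxes=[[0, 0, 1, 1]])

copied = TinyMaskData(rles=[[1, 2]])
copied.cat(batch, copy=True)

shared = TinyMaskData(rles=[[1, 2]])
shared.cat(batch, copy=False)  # same contents, no cloning
del batch  # the source batch is discarded, so sharing is harmless here
```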

niberger avatar May 26 '23 08:05 niberger

> It seems the SamAutomaticMaskGenerator does a lot of deep copying in MaskData.cat, which causes most of the slowdown. Apparently these deep copies could be avoided, since every time this method is called the original object is deleted anyway.

Hello,

Did you try it? If so, did it improve the inference time?

alexcbb avatar Jun 02 '23 13:06 alexcbb

You're getting 6 seconds or 50ms!?!?! I am getting terrible times on my local system:

Nvidia GeForce RTX 3070 Ti

(1030, 1660, 3)
Execution time: 27.502898454666138 seconds

NigelHiggs30 avatar Jun 06 '23 23:06 NigelHiggs30

Using dog.jpg, the different kinds of mask_gene functions in the demo take 3.4 and 5.4 seconds respectively on an NVIDIA A100-SXM4-80GB.

andyoung009 avatar Jun 11 '23 14:06 andyoung009

> You're getting 6 seconds or 50ms!?!?! I am getting terrible times on my local system:
>
> Nvidia GeForce RTX 3070 Ti
>
> (1030, 1660, 3)
> Execution time: 27.502898454666138 seconds

6 seconds.

luffy-yu avatar Jun 11 '23 14:06 luffy-yu

> NVIDIA A100-SXM4-80GB

So, my local speed is relatively normal then? OP seems to have a similar setup to mine and is getting 6 seconds. What am I misunderstanding?

NigelHiggs30 avatar Jun 14 '23 06:06 NigelHiggs30

Hi, we have proposed a method for fast 'segment anything', trained on just 2% of the SA-1B dataset. It achieves precision comparable to SAM in edge detection (AP, .794 vs .793) and proposal generation tasks (mask AR@1000, 49.7 vs 51.8. E32). Additionally, our model is 50 times faster than SAM-H E32. The model is very simple, primarily adopting the yolov8seg structure. We welcome everyone to try it out. GitHub: https://github.com/CASIA-IVA-Lab/FastSAM, arXiv: https://arxiv.org/pdf/2306.12156.pdf

berry-ding avatar Jun 22 '23 06:06 berry-ding

> Same. Results are great but currently too slow to be of practical use for me. Hopefully someone figures out how to speed this up.

Is there a way to speed this up now? I encountered the same problem, and I also tested MobileSAM, but there was no significant improvement in time. It seems that the time-consuming part is not caused by the SAM model.

DESEOUMAIGA avatar Oct 19 '23 02:10 DESEOUMAIGA

>> Same. Results are great but currently too slow to be of practical use for me. Hopefully someone figures out how to speed this up.
>
> Is there a way to speed this up now? I encountered the same problem, and I also tested MobileSAM, but there was no significant improvement in time. It seems that the time-consuming part is not caused by the SAM model.

MobileSAM replaces the heavyweight ViT-based image encoder of the original SAM with a lightweight one, while its prompt encoder and mask decoder are identical to those of SAM. Thus, MobileSAM is faster than SAM only in the set_image() step.
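
A back-of-the-envelope sketch of why that barely helps automatic mask generation. The numbers are only illustrative, borrowed from earlier in this thread (~0.5 s for the embedding, ~6 s in total, measured on different machines), and the 10x encoder speedup is a made-up figure, not a MobileSAM benchmark.

```python
def total_time(encoder_seconds, rest_seconds, encoder_speedup):
    """Total run time when only the image encoder gets faster."""
    return encoder_seconds / encoder_speedup + rest_seconds


# Illustrative split: ~0.5 s embedding, ~5.5 s for everything else.
baseline = total_time(0.5, 5.5, encoder_speedup=1)
lighter_encoder = total_time(0.5, 5.5, encoder_speedup=10)  # hypothetical 10x

print(f"baseline: {baseline:.2f} s, lighter encoder: {lighter_encoder:.2f} s")
```

Under these assumptions the total only drops from about 6 s to about 5.5 s, because the per-prompt decoding and post-processing are untouched.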

ZillaRU avatar Oct 19 '23 02:10 ZillaRU

>> If you're following the "Automatic Mask Generation Example", I think I figured out why the execution time is so slow. The main function I was timing used the SamAutomaticMaskGenerator class, which is provided for convenience by the repo. It's inefficient because it calculates embeddings for each proposal region from a given image. When timing just the feature extraction step, we see ~500ms execution time. Then for each “prompt” to get a mask for a particular region, execution time is ~50ms.
>
> Actually, I prefer to run it without any prompts.

I think you cannot run it without prompts, can you?

KostadinovShalon avatar Feb 22 '24 16:02 KostadinovShalon

>>> If you're following the "Automatic Mask Generation Example", I think I figured out why the execution time is so slow. The main function I was timing used the SamAutomaticMaskGenerator class, which is provided for convenience by the repo. It's inefficient because it calculates embeddings for each proposal region from a given image. When timing just the feature extraction step, we see ~500ms execution time. Then for each “prompt” to get a mask for a particular region, execution time is ~50ms.
>>
>> Actually, I prefer to run it without any prompts.
>
> I think you cannot run it without prompts, can you?

I can, but it is TOO slow.

luffy-yu avatar Feb 22 '24 16:02 luffy-yu

> I can, but it is TOO slow.

Yeah, but when you use the AMG (automatic mask generator) you are using a grid of points as your prompts. With the default implementation you are in fact using 32*32 = 1,024 prompts. If you reduce the grid, you might get faster results.
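
Here is a small sketch of what such a grid looks like. It is my own plain-Python approximation of an evenly spaced point grid in normalized [0, 1] coordinates, not the repo's implementation; the function names `build_point_grid` and `to_pixels` are assumptions for illustration.

```python
def build_point_grid(n_per_side):
    """Return n_per_side**2 evenly spaced (x, y) points in [0, 1] x [0, 1]."""
    # Keep half a cell of margin so points sit at cell centres.
    offset = 1.0 / (2 * n_per_side)
    step = (1.0 - 2 * offset) / (n_per_side - 1) if n_per_side > 1 else 0.0
    axis = [offset + i * step for i in range(n_per_side)]
    return [(x, y) for y in axis for x in axis]


def to_pixels(grid, width, height):
    """Scale normalized points to pixel coordinates for a given image size."""
    return [(x * width, y * height) for x, y in grid]


default_grid = build_point_grid(32)   # 1,024 point prompts
smaller_grid = build_point_grid(16)   # 256 point prompts
print(len(default_grid), len(smaller_grid))
```

Each of those points is fed to the mask decoder as a separate prompt, so the grid size directly sets how many decoder passes one image needs.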

KostadinovShalon avatar Feb 22 '24 16:02 KostadinovShalon

>> I can, but it is TOO slow.
>
> Yeah, but when you use the AMG (automatic mask generator) you are using a grid of points as your prompts. With the default implementation you are in fact using 32*32 = 1,024 prompts. If you reduce the grid, you might get faster results.

Right. Fewer points, less accuracy. It's hard to work out a suitable trade-off.

luffy-yu avatar Feb 22 '24 16:02 luffy-yu