segment-anything
How long does it take to generate masks?
For a 1702x1276 RGB image, it takes 6+ seconds to generate the masks using the default parameters.
- OS: Win 11
- GPU: NVIDIA RTX 3070 8GB
Is there a way to speed up?
I'm seeing similar performance: ~5.6 seconds to generate the masks using the default parameters.
- OS: Ubuntu 18.04.6 LTS x86_64
- GPU: NVIDIA GeForce RTX 3090
Ditto; I get exactly the same results (with 640x480 images). Moving from a 2080 to a 3080 doesn't change anything.
If you're following the "Automatic Mask Generation Example", I think I figured out why the execution time is so slow.
The main function I was timing uses the SamAutomaticMaskGenerator class, which the repo provides for convenience. It's inefficient because it effectively runs a separate mask prediction for each proposal region in the image.
When timing just the feature extraction step, we see ~500ms execution time. Then for each “prompt” to get a mask for a particular region, execution time is ~50ms.
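The split described above (one expensive encoder pass, then cheap per-prompt decodes) can be sketched with the repo's SamPredictor API. The checkpoint path, device, and point coordinates here are placeholders; the import is inside the function so the sketch stands alone without the package installed:

```python
import numpy as np

def segment_with_prompts(image, point_coords_list, checkpoint="sam_vit_h_4b8939.pth"):
    """Compute the image embedding once, then decode one mask per prompt."""
    # Requires the `segment_anything` package and a downloaded checkpoint.
    from segment_anything import sam_model_registry, SamPredictor

    sam = sam_model_registry["vit_h"](checkpoint=checkpoint).to("cuda")
    predictor = SamPredictor(sam)

    predictor.set_image(image)           # expensive: runs the ViT image encoder once
    masks = []
    for pt in point_coords_list:         # cheap: only the mask decoder runs per prompt
        m, scores, _ = predictor.predict(
            point_coords=np.array([pt]),
            point_labels=np.array([1]),  # 1 = foreground point
            multimask_output=False,
        )
        masks.append(m[0])
    return masks
```

With this structure, adding more prompts only adds decoder time, not encoder time.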
Actually, I prefer to run it without any prompts.
Ah, I see. Automatic mask generation is what I was using. I tested the other (predict) notebook: it takes ~2 seconds to compute the image embeddings (set_image) and another 2 seconds to predict (in my case an RTX 2080 on a Linux Intel server with 8 cores and 64GB RAM). I get similar results on my Windows machine (3080, 16 cores, 64GB RAM).
Since it takes 2-3 seconds just to generate the image embeddings I can't use this currently because I'm trying to do segmentation at 10FPS. I do have prompts (input points) but I'd still have to compute the embedding once per frame.
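The frame-budget arithmetic behind that conclusion, as a quick sanity check (the timings are the ones reported in this thread, not new measurements):

```python
fps_target = 10
frame_budget_ms = 1000 / fps_target   # 100 ms available per frame

embed_ms = 2000    # set_image on an RTX 2080/3080, as reported above
predict_ms = 50    # one per-prompt decode, as reported above

# The embedding alone blows the budget; per-prompt decoding is almost free
# by comparison, so prompts are not the bottleneck here.
per_frame_ms = embed_ms + predict_ms  # ~20x over the 100 ms budget
```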
Same. Results are great but currently too slow to be of practical use for me. Hopefully someone figures out how to speed this up.
The FAQ on the Segment Anything website says embedding takes 0.15 seconds on an A100, which would be roughly 10x faster than my 3080 (I'm skeptical).
You could reduce points_per_side in SamAutomaticMaskGenerator for faster inference, but the results will degrade compared to the default setting.
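The points_per_side knob sets the side of the prompt grid, so the decoder work scales with its square. A small sketch of the trade-off (the `segment_anything` import is inside the helper so the arithmetic runs without the package installed):

```python
# Default grid: 32x32 points -> 1024 prompts per image.
default_prompts = 32 ** 2
# Halving points_per_side quarters the number of prompts (and decoder work),
# at the cost of missing small objects.
reduced_prompts = 16 ** 2

def make_fast_generator(sam, points_per_side=16):
    # Requires the `segment_anything` package and a loaded SAM model.
    from segment_anything import SamAutomaticMaskGenerator
    return SamAutomaticMaskGenerator(sam, points_per_side=points_per_side)
```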
It seems SamAutomaticMaskGenerator does a lot of deep copying in MaskData.cat, which causes most of the slowdown.
Apparently these deep copies could be avoided, since the original object is deleted anyway every time this method is called.
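The suggested fix can be sketched with a minimal stand-in for MaskData (this is an illustration of the idea, not the repo's actual class): because the caller discards the source object right after `cat`, its arrays can be referenced or concatenated directly instead of deep-copied.

```python
import numpy as np

class MaskData:
    """Minimal stand-in: a dict of per-mask arrays and lists."""
    def __init__(self, **kwargs):
        self._stats = dict(kwargs)

    def cat(self, new_stats: "MaskData") -> None:
        # No deepcopy: `new_stats` is discarded after this call,
        # so taking references and extending in place is safe.
        for k, v in new_stats._stats.items():
            if k not in self._stats:
                self._stats[k] = v                 # reference, not a copy
            elif isinstance(v, np.ndarray):
                self._stats[k] = np.concatenate([self._stats[k], v], axis=0)
            elif isinstance(v, list):
                self._stats[k].extend(v)           # in-place, no copied elements

data = MaskData(boxes=np.zeros((2, 4)), ids=[0, 1])
batch = MaskData(boxes=np.ones((3, 4)), ids=[2, 3, 4])
data.cat(batch)
del batch  # mirrors how the generator discards each batch after merging
```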
Hello,
Did you try it? If so, did it improve the inference time?
You're getting 6 seconds or 50 ms?! I am getting terrible times on my local system:
Nvidia GeForce RTX 3070 Ti
(1030, 1660, 3)
Execution time: 27.502898454666138 seconds
Using dog.jpg, the different mask generation functions in the demo take 3.4 and 5.4 seconds respectively on an NVIDIA A100-SXM4-80GB.
6 seconds, on an NVIDIA A100-SXM4-80GB.
So is my local speed relatively normal, then? OP seems to have a similar setup to mine and gets 6 seconds. What am I misunderstanding?
Hi, we have proposed a method for rapid "segment anything", using just 2% of the SA-1B dataset. It achieves precision comparable to SAM in edge detection (AP, 0.794 vs 0.793) and proposal generation tasks (mask AR@1000, 49.7 vs 51.8 for E32). Additionally, our model is 50x faster than SAM-H E32. The model is very simple, primarily adopting the YOLOv8-seg structure. We welcome everyone to try it out. GitHub: https://github.com/CASIA-IVA-Lab/FastSAM, arXiv: https://arxiv.org/pdf/2306.12156.pdf
Is there a way to speed up now? I encountered the same problem, and I also tested mobileSAM, but there was no significant improvement in time. It seems that the time-consuming part is not caused by the SAM model.
MobileSAM replaces the heavyweight ViT-based image encoder of the original SAM with a lightweight one, while its prompt encoder and mask decoder are identical to those of SAM. Thus, MobileSAM is faster than SAM only in the set_image() step.
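In other words, for automatic mask generation the encoder is only part of the total runtime, so a faster encoder gives limited end-to-end gains (Amdahl's law). A toy estimate; the encoder/decoder split below is illustrative, not measured:

```python
def amg_time(encoder_s, decoder_s):
    """Total automatic-mask-generation time: one encode plus all decodes."""
    return encoder_s + decoder_s

# Illustrative split for a ~6 s run: decoder work over the point grid dominates.
sam_total = amg_time(encoder_s=2.0, decoder_s=4.0)
# MobileSAM only shrinks the encoder; the decoder work is unchanged.
mobile_total = amg_time(encoder_s=0.1, decoder_s=4.0)

speedup = sam_total / mobile_total  # well under 2x end-to-end
```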
I think you cannot run it without prompts, can you?
I can, but it is TOO slow.
Yeah, but when you use the AMG you are using a grid of points as your prompts. With the default implementation you are in fact using 32*32 = 1,024 prompts. If you reduce the grid you might get faster results.
Right. Fewer points, less accuracy. It's hard to work out a suitable trade-off.