latency in set_image function
Hi,
Thanks for the great work; the performance is impressive. When I implemented the code and ran the model, I noticed it takes a long time in the `set_image()` function.
I am wondering whether this is because of the transformation functions in the `ImageEncoderViT` module. If so, is it expected to take this long?
For me, the elapsed time for each step was:
- DINO model loading: 0.829
- ViT encoder process: 17.912
- SAM model prediction: 0.192
Thanks again,
What type of GPU are you using?
Thanks for the excellent project. I have the same problem: the `set_image` function takes about 17 s.
- OS: Ubuntu 20.04
- GPU: Tesla T4 16 GB
- model type: vit_h
- CUDA version: 11.4
Thanks again.
Thanks for the excellent project.
I have a question about the GPU memory used by the `set_image` function.
When I load the model to the GPU with `sam.to(device=device)`, it occupies 3403 MiB using the vit_h model.
But when I execute `set_image`, GPU memory usage increases to 7573 MiB.
I'm not sure why the image embedding takes up so much GPU memory.
Thanks for any help.
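For what it's worth, the stored embedding itself is small. A quick back-of-the-envelope check (assuming ViT-H's 256 × 64 × 64 float32 output embedding for a 1024 × 1024 input) suggests the jump is mostly the encoder's intermediate activations plus blocks the CUDA caching allocator keeps reserved, not the saved features:

```python
# Assumed shape: SAM's image encoder outputs a 256 x 64 x 64 float32
# embedding for a 1024 x 1024 input image.
embedding_bytes = 256 * 64 * 64 * 4          # float32 = 4 bytes/element
embedding_mib = embedding_bytes / 2**20
print(f"image embedding: {embedding_mib:.1f} MiB")

# The observed jump (7573 - 3403 MiB, about 4 GiB) is roughly a thousand
# times larger than the embedding, so it must be dominated by the ViT-H
# encoder's intermediate activations and cached allocator blocks.
observed_jump_mib = 7573 - 3403
print(f"observed jump: {observed_jump_mib} MiB")
```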
Same question for me.
My device: Ubuntu 22, RTX 3090.
My code is:

```python
import time
import numpy as np

# predictor, img, and input_point are defined earlier
# (a SamPredictor, the loaded image, and an (N, 2) array of prompt points)
s4 = time.time()
input_label = np.array([1] * len(input_point), dtype=np.float32)
predictor.set_image(img)
print("set image time:", 1000 * (time.time() - s4))

s5 = time.time()
masks, scores, logits = predictor.predict(
    point_coords=input_point,
    point_labels=input_label,
    multimask_output=True,
)
print("segment time:", 1000 * (time.time() - s5))
```

The output is:

```
set image time: 93.09029579162598
segment time: 15.57469367980957
```

using sam_vit_b_01ec64.pth
@nikhilaravi, I am using Nvidia 3080Ti
How did you achieve a ViT encoder time of 17.912 ms on the 3080 Ti? Inference on my 3090 takes 93.09 ms 😭 My CPU is an i7-9700KF.
@KAWAKO-in-GAYHUB Hi, did you run it on the GPU? It seems like your code is running on the CPU rather than the GPU. Check with `nvidia-smi` while it is running.
`predictor.set_image(image)` is the step that generates the image embedding.

> To use the ONNX model, the image must first be pre-processed using the SAM image encoder

This is the main bottleneck of the model, because the image encoder is a big backbone. After that, we can predict much faster with our input points, boxes, and masks.
P.S.: What we actually convert to ONNX is only the last few layers of sam_vit_h_4b8939.pth.
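The pattern described above (pay the heavy encoder once, then decode many prompts cheaply) can be sketched as follows. This is illustrative only; the functions are dummy stand-ins with a made-up delay, not the real `SamPredictor` API:

```python
import time

# Dummy stand-ins for SAM's two stages: set_image() runs the heavy image
# encoder once; predict() runs only the light prompt/mask decoder.

def heavy_image_encoder(image):
    time.sleep(0.05)            # stands in for the seconds-long ViT pass
    return "embedding"

def light_mask_decoder(embedding, point):
    return f"mask@{point}"      # stands in for the millisecond-scale decode

image = "some_image"
embedding = heavy_image_encoder(image)   # paid once per image

# Many prompts reuse the same embedding, so each extra prompt is cheap.
masks = [light_mask_decoder(embedding, p) for p in [(10, 20), (30, 40), (50, 60)]]
print(masks)
```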
@hungtooc Thanks for your comment. For me, the other steps, including pre-processing and transformation, do not take that long; most of the time is spent on this line:
`self.features = self.model.image_encoder(input_image)`
I would also like to look into the ONNX conversion part. Thanks!
> @KAWAKO-in-GAYHUB Hi, did you run it on the GPU? It seems like your code is running on the CPU rather than the GPU. Check with `nvidia-smi` while it is running.
Thank you for your reply!
I'm sure my code is running on the GPU; I wrote a demo in a Jupyter notebook to verify it.

Running `predictor.set_image(image)` followed by `predictor.predict(...)` 1000 times, I averaged 117.42 ms.
In addition, running `predictor.set_image(image)` alone 1000 times averages 106.37 ms.

I don't know what's wrong with my code.
@KAWAKO-in-GAYHUB That may be correct. My times above are in seconds, not ms. Sorry I didn't mention that.
> @hungtooc Thanks for your comment. For me, the other steps, including pre-processing and transformation, do not take that long; most of the time is spent on this line:
> `self.features = self.model.image_encoder(input_image)`

That's normal; you can see it mentioned in the paper:

> A heavyweight image encoder outputs an image embedding
@hungtooc Got it. I missed that sentence! I will read the details. Thanks for answering :)
> @KAWAKO-in-GAYHUB That may be correct. My times above are in seconds, not ms. Sorry I didn't mention that.

There shouldn't be such a big gap between a 3080 Ti and a 3090 running the same code (17 s vs. 90 ms). I don't know what your code looks like.
> @hungtooc Thanks for your comment. For me, the other steps, including pre-processing and transformation, do not take that long; most of the time is spent on this line:
> `self.features = self.model.image_encoder(input_image)`
>
> That's normal; you can see it mentioned in the paper:
>
> > A heavyweight image encoder outputs an image embedding

So my understanding is that its application scenario is better suited to multiple inferences on one picture, rather than real-time inference on video frames. Right?
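To put rough numbers on that intuition, here is a back-of-the-envelope sketch. The timings are assumed from figures quoted earlier in this thread (ViT-B on an RTX 3090), not fresh measurements:

```python
# Assumed per-call costs from the timings above.
encode_s = 0.100   # set_image (heavy image encoder), ~100 ms
decode_s = 0.015   # predict (light mask decoder), ~15 ms

# Video: every frame pays the encoder, so throughput is encoder-bound.
video_fps = 1 / (encode_s + decode_s)

# Interactive use: one image, many prompts, so the encoder cost amortizes.
n_prompts = 100
per_prompt_s = (encode_s + n_prompts * decode_s) / n_prompts

print(f"video: ~{video_fps:.1f} fps")
print(f"100 prompts on one image: ~{per_prompt_s * 1000:.0f} ms per prompt")
```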
@KAWAKO-in-GAYHUB I'm not sure; maybe he could help you: https://github.com/facebookresearch/segment-anything/issues/107#issuecomment-1500909850
Hi, we have proposed a method for rapid 'segment anything', using just 2% of the SA-1B dataset. It achieves precision comparable to SAM in edge detection (AP: 0.794 vs. 0.793) and proposal generation (mask AR@1000: 49.7 vs. 51.8 for E32). Additionally, our model is 50 times faster than SAM-H E32. The model is very simple, primarily adopting the YOLOv8-seg structure. We welcome everyone to try it out. GitHub: https://github.com/CASIA-IVA-Lab/FastSAM, arXiv: https://arxiv.org/pdf/2306.12156.pdf
