
About the unexpected inference latency.

Open xiao2mo opened this issue 4 years ago • 10 comments

According to the details in the paper, the inference speed of CLIP ViT-B/32 should be a bit faster than ResNet-50, but in my offline test the pure CLIP model (only the encode_image part) is much slower than expected. Are there any special instructions?

xiao2mo avatar Jul 04 '21 07:07 xiao2mo

clip (pytorch, cuda): 34.495944916584236 images/s @ batch size 1, 28.9889145642519 ms/image
efficientnet-b3 (pytorch, cuda): 34.13045597174209 images/s @ batch size 1, 29.29934486746788 ms/image
resnet50 (pytorch, cuda): 80.65643866379489 images/s @ batch size 1, 12.398266233503819 ms/image

According to the latency tests in the original ViT paper, ViT-B/32 should be roughly ResNet-50-level, so what pulls it back to EfficientNet-B3 level? I have reviewed the implementation in your repo and inspected the model structure; compared to the original implementation the differences mainly lie in an additional LayerNorm and a conv layer, which should not account for the unexpected latency. Is there any mistake I have made? (⊙︿⊙)
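
For reference, a minimal sketch of how a single-image encode_image latency number like the ones above can be measured (the warm-up count, iteration count, and the use of a random tensor instead of a real preprocessed image are assumptions, not the poster's exact script):

import time

import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)
model.eval()

# dummy preprocessed batch of size 1 (3x224x224, as produced by `preprocess`)
image = torch.randn(1, 3, 224, 224, device=device)

def sync():
    if device == "cuda":
        torch.cuda.synchronize()

with torch.no_grad():
    # warm-up so cuDNN autotuning and lazy initialization don't skew the timing
    for _ in range(10):
        model.encode_image(image)
    sync()

    n_iters = 100
    start = time.time()
    for _ in range(n_iters):
        model.encode_image(image)
    sync()
    elapsed = time.time() - start

print(f"{n_iters / elapsed:.2f} images/s @ batch size 1, "
      f"{1000 * elapsed / n_iters:.2f} ms/image")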

xiao2mo avatar Jul 04 '21 19:07 xiao2mo

What's more, I have completely removed the text transformer from the original model. By the way, the endless type() and to() operations seem unfriendly and may cause some problems for server deployment...
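
One workaround for the repeated dtype casts, sketched here under the assumption that fp32 throughput is acceptable: cast the whole model to a single dtype once after loading, so the per-call type()/to() conversions become no-ops.

import torch
import clip

device = "cuda"
model, preprocess = clip.load("ViT-B/32", device=device, jit=False)

# clip.load keeps the weights in fp16 on CUDA; casting everything to fp32 once
# avoids mixed-dtype handling when serving the image encoder on its own.
model = model.float().eval()

with torch.no_grad():
    image = torch.randn(1, 3, 224, 224, device=device)  # stand-in for a preprocessed image
    features = model.encode_image(image)

print(features.dtype)  # torch.float32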

xiao2mo avatar Jul 04 '21 19:07 xiao2mo

Really looking forward to your reply. Thanks in advance. ^_^

xiao2mo avatar Jul 04 '21 19:07 xiao2mo

The paper reports FLOPs during a forward pass, and we used fvcore's flop counting tool to get those numbers. The actual wall time might depend on various factors such as the GPU type, CuDNN and torch implementations, tensor data formats, etc.
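
For illustration, a sketch of counting FLOPs of the image encoder with fvcore's FlopCountAnalysis (this is an assumed setup, not the exact script used for the paper's numbers):

import torch
import clip
from fvcore.nn import FlopCountAnalysis

# load on CPU in fp32 so tracing for the flop count is straightforward
model, preprocess = clip.load("ViT-B/32", device="cpu", jit=False)
model.eval()

image = torch.randn(1, 3, 224, 224)
flops = FlopCountAnalysis(model.visual, image)  # image encoder only
print(f"{flops.total() / 1e9:.2f} GFLOPs per image")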

jongwook avatar Jul 05 '21 01:07 jongwook

> The paper reports FLOPs during a forward pass, and we used fvcore's flop counting tool to get those numbers. The actual wall time might depend on various factors such as the GPU type, CuDNN and torch implementations, tensor data formats, etc.

Thanks. This is my environment:
TensorRT version: 7.2.1.6
NVIDIA GPU: T4
NVIDIA driver version: 450.102
CUDA version: 11.0
cuDNN version: 8.0.5
Python version: 3.7.7
PyTorch version: 1.7.1

Have you seen the inference speeds reported in the original ViT paper? (https://arxiv.org/pdf/2010.11929.pdf, page 19) I haven't tested those myself yet; I'll do it later today. But the gap in inference speed is too large to be explained by the environment alone. By the way, have you tested the latency of your released models?

xiao2mo avatar Jul 05 '21 03:07 xiao2mo

Hi. Can anybody help me out?

xiao2mo avatar Jul 06 '21 17:07 xiao2mo

> Hi. Can anybody help me out?

I have the same problem; inference is too slow.

jxa124 avatar Jul 17 '21 04:07 jxa124

Try loading the model with JIT and use

with torch.no_grad():
    model.encode_image(...)

Or maybe you already use that... sorry if this does not help. I was wondering, though, when we can see this for CLIP.
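
Putting that suggestion together, a minimal sketch (jit=True is an option of clip.load; the random input tensor here stands in for a preprocessed image):

import torch
import clip

device = "cuda"
# jit=True loads the TorchScript-optimized archive released with the weights
model, preprocess = clip.load("ViT-B/32", device=device, jit=True)
model.eval()

image = torch.randn(1, 3, 224, 224, device=device)

with torch.no_grad():
    features = model.encode_image(image)
    features = features / features.norm(dim=-1, keepdim=True)  # optional: unit-normalize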

hfawaz avatar Dec 09 '21 15:12 hfawaz

for faster inference, please use https://github.com/jina-ai/clip-as-service/
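
For anyone landing here, a minimal client-side sketch, assuming a clip_server instance is already running and reachable at the default gRPC port (the address below is an assumption):

# pip install clip-client; talks to a running `python -m clip_server` instance
from clip_client import Client

c = Client("grpc://0.0.0.0:51000")  # assumed server address/port
embeddings = c.encode(["a photo of a dog", "a photo of a cat"])
print(embeddings.shape)  # (2, 512) for the default ViT-B/32 model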

hanxiao avatar Apr 10 '22 19:04 hanxiao

> for faster inference, please use https://github.com/jina-ai/clip-as-service/

Is there a way to use clip-as-service from inside Python, i.e., without having to run a separate Python server process?

manugarri avatar Sep 09 '22 15:09 manugarri