
About the unexpected inference latency.

Open xiao2mo opened this issue 4 years ago • 10 comments

According to the details in the paper, the inference speed of CLIP ViT-B/32 should be a bit faster than ResNet-50, but in my offline test the pure CLIP model (only the encode_image part) is much slower than expected. Are there any special instructions?

xiao2mo avatar Jul 04 '21 07:07 xiao2mo

clip (pytorch, cuda): 34.495944916584236 images/s @ batch size 1, 28.9889145642519 ms/image
efficientnet-b3 (pytorch, cuda): 34.13045597174209 images/s @ batch size 1, 29.29934486746788 ms/image
resnet50 (pytorch, cuda): 80.65643866379489 images/s @ batch size 1, 12.398266233503819 ms/image

According to the latency tests in the original ViT paper, ViT-B/32 should be roughly ResNet-50-level, so what pulls it back to EfficientNet-B3 level? I have reviewed the implementation in your repo and inspected the model structure; compared to the original implementation the differences mainly lie in an additional LayerNorm and a conv layer, which should not account for the unexpected latency. Is there any mistake I have made? (⊙︿⊙)
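
For reference, a minimal sketch of how a single-image encode_image latency number like the ones above can be measured (the warm-up count, iteration count, and the use of a random tensor instead of a real preprocessed image are assumptions, not the poster's exact script):

import time

import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)
model.eval()

# dummy preprocessed batch of size 1 (3x224x224, as produced by `preprocess`)
image = torch.randn(1, 3, 224, 224, device=device)

def sync():
    if device == "cuda":
        torch.cuda.synchronize()

with torch.no_grad():
    # warm-up so cuDNN autotuning and lazy initialization don't skew the timing
    for _ in range(10):
        model.encode_image(image)
    sync()

    n_iters = 100
    start = time.time()
    for _ in range(n_iters):
        model.encode_image(image)
    sync()
    elapsed = time.time() - start

print(f"{n_iters / elapsed:.2f} images/s @ batch size 1, "
      f"{1000 * elapsed / n_iters:.2f} ms/image")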

xiao2mo avatar Jul 04 '21 19:07 xiao2mo

What's more, I have completely removed the text transformer from the original model. By the way, the endless type() and to() operations seem unfriendly and may cause some problems for server deployment...
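
One workaround for the repeated dtype casts, sketched here under the assumption that fp32 throughput is acceptable: cast the whole model to a single dtype once after loading, so the per-call type()/to() conversions become no-ops.

import torch
import clip

device = "cuda"
model, preprocess = clip.load("ViT-B/32", device=device, jit=False)

# clip.load keeps the weights in fp16 on CUDA; casting everything to fp32 once
# avoids mixed-dtype handling when serving the image encoder on its own.
model = model.float().eval()

with torch.no_grad():
    image = torch.randn(1, 3, 224, 224, device=device)  # stand-in for a preprocessed image
    features = model.encode_image(image)

print(features.dtype)  # torch.float32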

xiao2mo avatar Jul 04 '21 19:07 xiao2mo

Really looking forward to your reply. Thanks in advance. ^_^

xiao2mo avatar Jul 04 '21 19:07 xiao2mo

The paper reports FLOPs during a forward pass, and we used fvcore's flop counting tool to get those numbers. The actual wall time might depend on various factors such as the GPU type, CuDNN and torch implementations, tensor data formats, etc.
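
For illustration, a sketch of counting FLOPs of the image encoder with fvcore's FlopCountAnalysis (this is an assumed setup, not the exact script used for the paper's numbers):

import torch
import clip
from fvcore.nn import FlopCountAnalysis

# load on CPU in fp32 so tracing for the flop count is straightforward
model, preprocess = clip.load("ViT-B/32", device="cpu", jit=False)
model.eval()

image = torch.randn(1, 3, 224, 224)
flops = FlopCountAnalysis(model.visual, image)  # image encoder only
print(f"{flops.total() / 1e9:.2f} GFLOPs per image")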

jongwook avatar Jul 05 '21 01:07 jongwook

> The paper reports FLOPs during a forward pass, and we used fvcore's flop counting tool to get those numbers. The actual wall time might depend on various factors such as the GPU type, CuDNN and torch implementations, tensor data formats, etc.

Thanks. This is my environment:
TensorRT version: 7.2.1.6
NVIDIA GPU: T4
NVIDIA driver version: 450.102
CUDA version: 11.0
cuDNN version: 8.0.5
Python version: 3.7.7
PyTorch version: 1.7.1

Have you seen the inference speeds reported in the original ViT paper? (https://arxiv.org/pdf/2010.11929.pdf, page 19) I haven't tested those myself yet; I'll do it later today. But the gap in inference speed is too large to be explained by the environment alone. By the way, have you tested the latency of your released models?

xiao2mo avatar Jul 05 '21 03:07 xiao2mo

Hi. Can anybody help me out?

xiao2mo avatar Jul 06 '21 17:07 xiao2mo

> Hi. Can anybody help me out?

I have the same problem; inference is too slow.

jxa124 avatar Jul 17 '21 04:07 jxa124

Try loading the model with JIT and use

with torch.no_grad():
    model.encode_image(...)

Or maybe you already use that... sorry if this does not help. I was wondering, though, when we can see this for CLIP.
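
Putting that suggestion together, a minimal sketch (jit=True is an option of clip.load; the random input tensor here stands in for a preprocessed image):

import torch
import clip

device = "cuda"
# jit=True loads the TorchScript-optimized archive released with the weights
model, preprocess = clip.load("ViT-B/32", device=device, jit=True)
model.eval()

image = torch.randn(1, 3, 224, 224, device=device)

with torch.no_grad():
    features = model.encode_image(image)
    features = features / features.norm(dim=-1, keepdim=True)  # optional: unit-normalize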

hfawaz avatar Dec 09 '21 15:12 hfawaz

for faster inference, please use https://github.com/jina-ai/clip-as-service/
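
For anyone landing here, a minimal client-side sketch, assuming a clip_server instance is already running and reachable at the default gRPC port (the address below is an assumption):

# pip install clip-client; talks to a running `python -m clip_server` instance
from clip_client import Client

c = Client("grpc://0.0.0.0:51000")  # assumed server address/port
embeddings = c.encode(["a photo of a dog", "a photo of a cat"])
print(embeddings.shape)  # (2, 512) for the default ViT-B/32 model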

hanxiao avatar Apr 10 '22 19:04 hanxiao

> for faster inference, please use https://github.com/jina-ai/clip-as-service/

Is there a way to use clip-as-service from inside Python, i.e., without having to run a separate Python server process?

manugarri avatar Sep 09 '22 15:09 manugarri