CLIP
About the unexpected inference latency.
According to the paper, the inference speed of CLIP ViT-B/32 should be a bit faster than ResNet50, but my offline test of the pure CLIP model (only the encode_image part) is much slower than expected. Are there any special instructions?
clip         pytorch cuda  34.50 images/s @ batch size 1  (28.99 ms/image)
efficientb3  pytorch cuda  34.13 images/s @ batch size 1  (29.30 ms/image)
resnet50     pytorch cuda  80.66 images/s @ batch size 1  (12.40 ms/image)
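For context, the per-image numbers above come from a simple timing loop along these lines (a minimal sketch of the benchmark; the batch size, warm-up count, and iteration count are illustrative):

import time
import torch
import clip

device = "cuda"
model, preprocess = clip.load("ViT-B/32", device=device, jit=False)
model.eval()

image = torch.randn(1, 3, 224, 224, device=device)  # batch size 1, dummy input

with torch.no_grad():
    # warm up so lazy CUDA init and CuDNN autotuning don't skew the timing
    for _ in range(10):
        model.encode_image(image)
    torch.cuda.synchronize()

    n_iters = 200
    start = time.time()
    for _ in range(n_iters):
        model.encode_image(image)
    torch.cuda.synchronize()  # wait for all queued kernels before stopping the clock
    elapsed = time.time() - start

print(f"{n_iters / elapsed:.2f} images/s @ batch size 1, {1000 * elapsed / n_iters:.2f} ms/image")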
According to the latency tests in the original ViT paper, ViT-B/32 should be roughly on par with ResNet50, so what pulls it back to EfficientNet-B3 level? I have reviewed the implementation in your repo and inspected the model structure; compared to the original ViT implementation, the main differences are an additional LayerNorm and a conv layer, which should not account for the unexpected latency. Have I made a mistake somewhere? (⊙︿⊙)
What's more, I have completely removed the text transformer from the original model. By the way, the endless type() and to() calls seem unfriendly and may cause problems for server deployment...
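For what it's worth, this is roughly how I slim the model down to just the image tower for serving (a minimal sketch; model.visual is the image encoder submodule, and casting it to fp32 is an assumption about the serving stack):

import torch
import clip

device = "cuda"
model, preprocess = clip.load("ViT-B/32", device=device, jit=False)

# keep only the image encoder; the text transformer is not needed for image features
visual = model.visual
visual.eval()

# clip.load keeps fp16 weights on CUDA; cast to fp32 if the serving stack
# prefers full precision (assumption about the deployment target)
visual = visual.float()

with torch.no_grad():
    image = torch.randn(1, 3, 224, 224, device=device)
    features = visual(image)  # same as model.encode_image, minus the dtype cast
print(features.shape)  # e.g. torch.Size([1, 512]) for ViT-B/32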
Really looking forward to your reply. Thanks in advance. ^_^
The paper reports FLOPs during a forward pass, and we used fvcore's flop counting tool to get those numbers. The actual wall time might depend on various factors such as the GPU type, CuDNN and torch implementations, tensor data formats, etc.
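Counting the image encoder's FLOPs with fvcore looks roughly like this (a minimal sketch; the 224x224 dummy input matches ViT-B/32's default resolution, and the exact setup used for the paper may differ):

import torch
import clip
from fvcore.nn import FlopCountAnalysis

model, preprocess = clip.load("ViT-B/32", device="cpu", jit=False)
model.eval()

image = torch.randn(1, 3, 224, 224)  # one 224x224 RGB image
flops = FlopCountAnalysis(model.visual, image)  # count only the image encoder
print(f"{flops.total() / 1e9:.2f} GFLOPs per image")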
Thanks. This is my environment:
TensorRT Version: 7.2.1.6
NVIDIA GPU: T4
NVIDIA Driver Version: 450.102
CUDA Version: 11.0
CUDNN Version: 8.0.5
Python Version: 3.7.7
PyTorch Version: 1.7.1
Have you noticed the inference speeds reported in the original ViT paper? (https://arxiv.org/pdf/2010.11929.pdf, page 19) I haven't tested it myself yet; I'll do that later today. But the gap in inference speed is too large to be explained by the environment alone. By the way, have you tested the latency of the released model?
Hi. Can anybody help me out?
I have the same problem; inference is too slow.
Try loading the model with JIT and use

with torch.no_grad():
    model.encode_image(...)

Or maybe you already use that... sorry if this does not help. I was wondering, though, when we can see this for CLIP.
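Something along these lines (a sketch; the model name and the random batch are placeholders for your own preprocessed images):

import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"

# jit=True loads the TorchScript-traced model, which can be a bit faster
model, preprocess = clip.load("ViT-B/32", device=device, jit=True)

images = torch.randn(8, 3, 224, 224, device=device)  # stand-in for preprocessed images

with torch.no_grad():  # skip autograd bookkeeping during inference
    features = model.encode_image(images)
print(features.shape)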
for faster inference, please use https://github.com/jina-ai/clip-as-service/
Is there a way to use clip-as-service from inside Python? I mean, without having to run a separate server process?