
Great work, but inference is too slow

Open bks5881 opened this issue 10 months ago • 5 comments

Great work with this. The OCR on Latin text is really good. Can you propose how to run this faster? I am already running it on an A100 79GB using transformers, but it's really slow. Could I maybe use something like vLLM/SGLang or TGI (Hugging Face) to make inference faster?
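One generic way to speed up serving a slow model, independent of the framework chosen, is dynamic micro-batching: queue incoming requests and run them through the model in groups so the per-request overhead is amortized. The sketch below is a minimal, hypothetical illustration of that pattern; `fake_model` is a stand-in for the real InternVL forward pass, and the class name and parameters are made up for this example.

```python
# Hypothetical sketch of dynamic micro-batching. `fake_model` stands in
# for a real batched model.generate()/model.chat() call.
import queue
import threading

def fake_model(batch):
    # Stand-in for a batched model call; returns one result per input.
    return [f"ocr:{item}" for item in batch]

class MicroBatcher:
    def __init__(self, model, max_batch=8, timeout=0.01):
        self.model = model
        self.max_batch = max_batch
        self.timeout = timeout  # how long to wait for more requests
        self.requests = queue.Queue()
        threading.Thread(target=self._worker, daemon=True).start()

    def submit(self, item):
        # Called from request-handler threads; blocks until the batch runs.
        done = threading.Event()
        slot = {"item": item, "done": done, "result": None}
        self.requests.put(slot)
        done.wait()
        return slot["result"]

    def _worker(self):
        while True:
            slots = [self.requests.get()]  # block for the first request
            try:
                # Gather more requests until the batch is full or time is up.
                while len(slots) < self.max_batch:
                    slots.append(self.requests.get(timeout=self.timeout))
            except queue.Empty:
                pass
            results = self.model([s["item"] for s in slots])
            for s, r in zip(slots, results):
                s["result"] = r
                s["done"].set()

batcher = MicroBatcher(fake_model)
print(batcher.submit("page1"))  # ocr:page1
```

With concurrent callers, requests arriving within the timeout window share one model call instead of each paying the full launch cost.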

bks5881 avatar Apr 23 '24 13:04 bks5881

Did you find something? I'm looking for the same.

Iven2132 avatar Apr 26 '24 17:04 Iven2132

Hello, we also very much hope to make this model run faster and have been attempting some improvements in this area recently. I will keep you updated on any progress.

czczup avatar Apr 26 '24 17:04 czczup

> Hello, we also very much hope to make this model run faster and have been attempting some improvements in this area recently. I will keep you updated on any progress.

@czczup How can I run inference with this model? I can't find it in vLLM or TGI. Any suggestions on how to deploy it as an endpoint?

Iven2132 avatar Apr 26 '24 17:04 Iven2132

@Iven2132 I created a FastAPI + uvicorn endpoint for now.

bks5881 avatar Apr 28 '24 11:04 bks5881

Hope the guide in this PR helps.

lvhan028 avatar May 07 '24 13:05 lvhan028