
Great work, but inference is too slow

Open bks5881 opened this issue 10 months ago • 5 comments

Great work with this. The OCR on Latin text is really good. Can you propose how to run this faster? I am already running it on an A100 79GB using transformers, but it's really slow. Could I maybe use something like vLLM/SGLang or TGI (Hugging Face) to make inference faster?
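One generic way to speed up serving a slow model, independent of the framework chosen, is dynamic micro-batching: queue incoming requests and run them through the model in groups so the per-request overhead is amortized. The sketch below is a minimal, hypothetical illustration of that pattern; `fake_model` is a stand-in for the real InternVL forward pass, and the class name and parameters are made up for this example.

```python
# Hypothetical sketch of dynamic micro-batching. `fake_model` stands in
# for a real batched model.generate()/model.chat() call.
import queue
import threading

def fake_model(batch):
    # Stand-in for a batched model call; returns one result per input.
    return [f"ocr:{item}" for item in batch]

class MicroBatcher:
    def __init__(self, model, max_batch=8, timeout=0.01):
        self.model = model
        self.max_batch = max_batch
        self.timeout = timeout  # how long to wait for more requests
        self.requests = queue.Queue()
        threading.Thread(target=self._worker, daemon=True).start()

    def submit(self, item):
        # Called from request-handler threads; blocks until the batch runs.
        done = threading.Event()
        slot = {"item": item, "done": done, "result": None}
        self.requests.put(slot)
        done.wait()
        return slot["result"]

    def _worker(self):
        while True:
            slots = [self.requests.get()]  # block for the first request
            try:
                # Gather more requests until the batch is full or time is up.
                while len(slots) < self.max_batch:
                    slots.append(self.requests.get(timeout=self.timeout))
            except queue.Empty:
                pass
            results = self.model([s["item"] for s in slots])
            for s, r in zip(slots, results):
                s["result"] = r
                s["done"].set()

batcher = MicroBatcher(fake_model)
print(batcher.submit("page1"))  # ocr:page1
```

With concurrent callers, requests arriving within the timeout window share one model call instead of each paying the full launch cost.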

bks5881 avatar Apr 23 '24 13:04 bks5881

Did you find something? I'm looking for the same.

Iven2132 avatar Apr 26 '24 17:04 Iven2132

Hello, we also very much hope to make this model run faster and have been attempting some improvements in this area recently. I will keep you updated on any progress.

czczup avatar Apr 26 '24 17:04 czczup

> Hello, we also very much hope to make this model run faster and have been attempting some improvements in this area recently. I will keep you updated on any progress.

@czczup How can I run inference with this model? I can't find it in vLLM or TGI. Any suggestions on how to deploy it as an endpoint?

Iven2132 avatar Apr 26 '24 17:04 Iven2132

@Iven2132 I created a FastAPI + uvicorn endpoint for now.

bks5881 avatar Apr 28 '24 11:04 bks5881

Hope the guide in this PR helps.

lvhan028 avatar May 07 '24 13:05 lvhan028