
Does vlmeval support multi-card inference and batch size > 1?

Open John-Ge opened this issue 1 year ago • 7 comments

Does vlmeval support multi-card inference and batch size > 1?

John-Ge avatar Dec 28 '23 11:12 John-Ge

Hi, @John-Ge ,

  1. For the sake of simplicity, VLMEvalKit does not support batch size > 1 inference for now.
  2. VLMEvalKit currently supports two types of multi-GPU inference: 1) DistributedDataParallel via torchrun, which runs N VLM instances on N GPUs; this requires your VLM to be small enough to run on a single GPU. 2) Models that are configured by default to use multiple GPUs (like IDEFICS_80B_INSTRUCT); when you launch with python, such a model will automatically run across all available GPUs. (A sketch of the first mode follows this list.)
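
A minimal sketch of the first mode, assuming torchrun-style data parallelism. This is not VLMEvalKit's actual code; `load_my_vlm` and `load_benchmark_samples` are hypothetical placeholders:

```python
# Minimal sketch (not VLMEvalKit's actual code) of the first mode: torchrun
# starts N processes, each process keeps one VLM replica on its own GPU and
# infers a disjoint, strided shard of the benchmark samples.
# Launch with: torchrun --nproc-per-node=N infer_sketch.py
import torch
import torch.distributed as dist

def main():
    dist.init_process_group(backend="nccl")
    rank, world_size = dist.get_rank(), dist.get_world_size()
    torch.cuda.set_device(rank)

    model = load_my_vlm().cuda()          # hypothetical single-GPU VLM loader
    samples = load_benchmark_samples()    # hypothetical benchmark loader

    results = {}
    for idx in range(rank, len(samples), world_size):  # rank-strided sharding
        results[idx] = model.generate(samples[idx])    # batch size 1 per call

    # Collect every rank's partial results before writing predictions on rank 0.
    gathered = [None] * world_size
    dist.all_gather_object(gathered, results)
    if rank == 0:
        merged = {k: v for part in gathered for k, v in part.items()}
        print(f"collected {len(merged)} predictions")
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

In VLMEvalKit itself, the corresponding launches look roughly like `torchrun --nproc-per-node=N run.py --data <benchmark> --model <vlm>` for the first mode versus plain `python run.py --data <benchmark> --model <vlm>` for the second.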

kennymckormick avatar Dec 29 '23 08:12 kennymckormick

Thanks for your reply! I would like to know what the usual setup for batch size > 1 inference looks like. Should we deploy the model through a serving framework like vLLM or TGI? Do we need to wait for them to support LLaVA?

John-Ge avatar Dec 29 '23 10:12 John-Ge

The authors of LLaVA have been working on a beta version of batch inference: https://github.com/haotian-liu/LLaVA/issues/754
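
For readers unfamiliar with what this entails, here is a generic sketch of batched generation with Hugging Face transformers (left padding so every prompt ends at the same position). This is not the LLaVA-specific code from the linked issue, which additionally batches the image tensors; the model id below is only a placeholder:

```python
# Generic sketch of batched text generation with left padding, so all prompts
# line up at the end of the input; LLaVA-style batch inference follows the same
# idea but also stacks the per-sample image tensors.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "lmsys/vicuna-7b-v1.5"  # placeholder LLM backbone
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.padding_side = "left"
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

prompts = ["Describe the image.", "What color is the sky in the image?"]
inputs = tokenizer(prompts, return_tensors="pt", padding=True).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)

# Strip the (padded) prompt tokens before decoding the generated continuations.
print(tokenizer.batch_decode(outputs[:, inputs["input_ids"].shape[1]:],
                             skip_special_tokens=True))
```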

darkpromise98 avatar Jan 19 '24 12:01 darkpromise98

Hi, @darkpromise98, we will try to include this feature in VLMEvalKit soon.

kennymckormick avatar Jan 20 '24 09:01 kennymckormick

> Hi, @darkpromise98, we will try to include this feature in VLMEvalKit soon.

That's great!

darkpromise98 avatar Jan 21 '24 02:01 darkpromise98

The issue at https://github.com/haotian-liu/LLaVA/issues/754#issuecomment-1907970439 builds a fast inference method for LLaVA. Would you add this functionality for every benchmark in this repo?

BTW, I find that sglang may not support a LoRA adapter on top of a base model. I train LLaVA with LoRA, so if possible, I hope you could support loading the base model, merging the LoRA weights, and deploying the merged model for evaluation.
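
For reference, the merge step described here can usually be done offline with PEFT before handing the checkpoint to any evaluation or serving tool. A sketch under the assumption that the LoRA adapter was trained with PEFT; all paths and ids below are placeholders:

```python
# Sketch of merging LoRA weights into the base model before evaluation,
# assuming a PEFT-format adapter; all paths/ids below are placeholders.
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "path/to/base-vlm-language-backbone"   # placeholder
lora_dir = "path/to/lora-adapter"                # placeholder

base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto")
merged = PeftModel.from_pretrained(base, lora_dir).merge_and_unload()

# The merged checkpoint is a plain model again, so downstream tools
# (VLMEvalKit, sglang, vLLM, ...) can load it without LoRA-aware code.
merged.save_pretrained("path/to/merged-checkpoint")
AutoTokenizer.from_pretrained(base_id).save_pretrained("path/to/merged-checkpoint")
```

Note that LLaVA's multimodal wrapper may need to be loaded through LLaVA's own builder rather than `AutoModelForCausalLM`; the snippet only illustrates the general PEFT merge pattern.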

John-Ge avatar Jan 25 '24 13:01 John-Ge

Hi, @John-Ge @darkpromise98, I have reviewed the request. I'm sorry, but I may not implement this feature on my own, for the following reasons:

  1. Currently, only a few VLMs support the batch_inference interface; adding it for LLaVA may lead to major changes in the inference pipeline of VLMEvalKit (a hypothetical sketch of such an interface follows this list).
  2. Inference with LLaVA is already relatively fast: with batch_size=1, llava-v1.5-13b runs at 3~4 samples per second on a single A100. Thus I think batch inference for LLaVA may not be a critical feature for VLMEvalKit.
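
To make the scope concrete, here is a hypothetical sketch of what a batch-capable wrapper interface could look like. This is not VLMEvalKit's actual API; the names `BaseVLMWrapper`, `batch_generate`, and `supports_batch` are illustrative only:

```python
# Hypothetical sketch of a batch-capable VLM wrapper interface (not the actual
# VLMEvalKit API): models that implement batch_generate receive chunks of
# samples, everything else falls back to one-by-one generate calls.
from typing import List

class BaseVLMWrapper:
    supports_batch = False

    def generate(self, image_path: str, prompt: str) -> str:
        raise NotImplementedError

    def batch_generate(self, image_paths: List[str], prompts: List[str]) -> List[str]:
        # Default fallback: sequential calls, so the evaluation loop can treat
        # every model uniformly regardless of batch support.
        return [self.generate(p, q) for p, q in zip(image_paths, prompts)]

def run_inference(model: BaseVLMWrapper, samples, batch_size: int = 1):
    # `samples` is assumed to be a list of dicts with "image" and "question" keys.
    preds = []
    step = batch_size if model.supports_batch else 1
    for i in range(0, len(samples), step):
        chunk = samples[i:i + step]
        preds.extend(model.batch_generate([s["image"] for s in chunk],
                                          [s["question"] for s in chunk]))
    return preds
```

The point of such a design would be that every model keeps working through the sequential fallback, so adding batch support for one VLM (e.g. LLaVA) would not force changes in the others.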

BTW, I'm willing to review and merge such a change into the VLMEvalKit main branch if someone is willing to create a PR (which might be relatively heavy) for it.

kennymckormick avatar Jan 27 '24 12:01 kennymckormick