
Does vlmeval support multi-card inference and batch size > 1?

Open John-Ge opened this issue 1 year ago • 7 comments

Does vlmeval support multi-card inference and batch size > 1?

John-Ge avatar Dec 28 '23 11:12 John-Ge

Hi, @John-Ge ,

  1. For the sake of simplicity, VLMEvalKit does not support batch size > 1 inference for now.
  2. VLMEvalKit currently supports two types of multi-GPU inference: 1) DistributedDataParallel via torchrun, which runs N VLM instances on N GPUs; this requires your VLM to be small enough to run on a single GPU. 2) Models that are configured by default to use multiple GPUs (like IDEFICS_80B_INSTRUCT); when you launch with python, such a model will automatically run across all available GPUs. (A sketch of the first mode follows this list.)
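
A minimal sketch of the first mode, assuming torchrun-style data parallelism. This is not VLMEvalKit's actual code; `load_my_vlm` and `load_benchmark_samples` are hypothetical placeholders:

```python
# Minimal sketch (not VLMEvalKit's actual code) of the first mode: torchrun
# starts N processes, each process keeps one VLM replica on its own GPU and
# infers a disjoint, strided shard of the benchmark samples.
# Launch with: torchrun --nproc-per-node=N infer_sketch.py
import torch
import torch.distributed as dist

def main():
    dist.init_process_group(backend="nccl")
    rank, world_size = dist.get_rank(), dist.get_world_size()
    torch.cuda.set_device(rank)

    model = load_my_vlm().cuda()          # hypothetical single-GPU VLM loader
    samples = load_benchmark_samples()    # hypothetical benchmark loader

    results = {}
    for idx in range(rank, len(samples), world_size):  # rank-strided sharding
        results[idx] = model.generate(samples[idx])    # batch size 1 per call

    # Collect every rank's partial results before writing predictions on rank 0.
    gathered = [None] * world_size
    dist.all_gather_object(gathered, results)
    if rank == 0:
        merged = {k: v for part in gathered for k, v in part.items()}
        print(f"collected {len(merged)} predictions")
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

In VLMEvalKit itself, the corresponding launches look roughly like `torchrun --nproc-per-node=N run.py --data <benchmark> --model <vlm>` for the first mode versus plain `python run.py --data <benchmark> --model <vlm>` for the second.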

kennymckormick avatar Dec 29 '23 08:12 kennymckormick

Thanks for your reply! I would like to know what the usual setup for batch size > 1 inference looks like. Should we deploy the model through a serving framework like vLLM or TGI? Do we need to wait for them to support LLaVA?

John-Ge avatar Dec 29 '23 10:12 John-Ge

The authors of LLaVA have been working on a beta version of batch inference: https://github.com/haotian-liu/LLaVA/issues/754
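
For readers unfamiliar with what this entails, here is a generic sketch of batched generation with Hugging Face transformers (left padding so every prompt ends at the same position). This is not the LLaVA-specific code from the linked issue, which additionally batches the image tensors; the model id below is only a placeholder:

```python
# Generic sketch of batched text generation with left padding, so all prompts
# line up at the end of the input; LLaVA-style batch inference follows the same
# idea but also stacks the per-sample image tensors.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "lmsys/vicuna-7b-v1.5"  # placeholder LLM backbone
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.padding_side = "left"
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

prompts = ["Describe the image.", "What color is the sky in the image?"]
inputs = tokenizer(prompts, return_tensors="pt", padding=True).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)

# Strip the (padded) prompt tokens before decoding the generated continuations.
print(tokenizer.batch_decode(outputs[:, inputs["input_ids"].shape[1]:],
                             skip_special_tokens=True))
```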

darkpromise98 avatar Jan 19 '24 12:01 darkpromise98

Hi, @darkpromise98, we will try to include this feature in VLMEvalKit soon.

kennymckormick avatar Jan 20 '24 09:01 kennymckormick

> Hi, @darkpromise98, we will try to include this feature in VLMEvalKit soon.

That's great!

darkpromise98 avatar Jan 21 '24 02:01 darkpromise98

The issue at https://github.com/haotian-liu/LLaVA/issues/754#issuecomment-1907970439 builds a fast inference method for LLaVA. Would you add this functionality for every benchmark in this repo?

BTW, I find that sglang may not support a LoRA adapter on top of a base model. I train LLaVA with LoRA, so if possible, I hope you could support loading the base model, merging the LoRA weights, and deploying the merged model for evaluation.
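
For reference, the merge step described here can usually be done offline with PEFT before handing the checkpoint to any evaluation or serving tool. A sketch under the assumption that the LoRA adapter was trained with PEFT; all paths and ids below are placeholders:

```python
# Sketch of merging LoRA weights into the base model before evaluation,
# assuming a PEFT-format adapter; all paths/ids below are placeholders.
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "path/to/base-vlm-language-backbone"   # placeholder
lora_dir = "path/to/lora-adapter"                # placeholder

base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto")
merged = PeftModel.from_pretrained(base, lora_dir).merge_and_unload()

# The merged checkpoint is a plain model again, so downstream tools
# (VLMEvalKit, sglang, vLLM, ...) can load it without LoRA-aware code.
merged.save_pretrained("path/to/merged-checkpoint")
AutoTokenizer.from_pretrained(base_id).save_pretrained("path/to/merged-checkpoint")
```

Note that LLaVA's multimodal wrapper may need to be loaded through LLaVA's own builder rather than `AutoModelForCausalLM`; the snippet only illustrates the general PEFT merge pattern.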

John-Ge avatar Jan 25 '24 13:01 John-Ge

Hi, @John-Ge @darkpromise98, I have reviewed the request. I'm sorry, but I may not implement this feature on my own, for the following reasons:

  1. Currently, only a few VLMs support the batch_inference interface; adding it for LLaVA may lead to major changes in the inference pipeline of VLMEvalKit (a hypothetical sketch of such an interface follows this list).
  2. Inference with LLaVA is already relatively fast: with batch_size=1, llava-v1.5-13b runs at 3~4 samples per second on a single A100. Thus I think batch inference for LLaVA may not be a critical feature for VLMEvalKit.
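
To make the scope concrete, here is a hypothetical sketch of what a batch-capable wrapper interface could look like. This is not VLMEvalKit's actual API; the names `BaseVLMWrapper`, `batch_generate`, and `supports_batch` are illustrative only:

```python
# Hypothetical sketch of a batch-capable VLM wrapper interface (not the actual
# VLMEvalKit API): models that implement batch_generate receive chunks of
# samples, everything else falls back to one-by-one generate calls.
from typing import List

class BaseVLMWrapper:
    supports_batch = False

    def generate(self, image_path: str, prompt: str) -> str:
        raise NotImplementedError

    def batch_generate(self, image_paths: List[str], prompts: List[str]) -> List[str]:
        # Default fallback: sequential calls, so the evaluation loop can treat
        # every model uniformly regardless of batch support.
        return [self.generate(p, q) for p, q in zip(image_paths, prompts)]

def run_inference(model: BaseVLMWrapper, samples, batch_size: int = 1):
    # `samples` is assumed to be a list of dicts with "image" and "question" keys.
    preds = []
    step = batch_size if model.supports_batch else 1
    for i in range(0, len(samples), step):
        chunk = samples[i:i + step]
        preds.extend(model.batch_generate([s["image"] for s in chunk],
                                          [s["question"] for s in chunk]))
    return preds
```

The point of such a design would be that every model keeps working through the sequential fallback, so adding batch support for one VLM (e.g. LLaVA) would not force changes in the others.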

BTW, I'm willing to review and merge such a change into the VLMEvalKit main branch if someone is willing to create a PR (which might be relatively heavy) for it.

kennymckormick avatar Jan 27 '24 12:01 kennymckormick