
[Feature] Support for LLaVA-NeXT Qwen1.5-110B, Qwen1.5-72B, LLaMA3-8B

Iven2132 opened this issue 1 year ago • 13 comments

Motivation

It outperforms existing open-source models like InternVL 1.5

https://llava-vl.github.io/blog/2024-05-10-llava-next-stronger-llms/#:~:text=Live%20Demo-,Benchmark%20Results,-Results%20with%20LMMs

Related resources

https://llava-vl.github.io/blog/2024-05-10-llava-next-stronger-llms/

Additional context

No response

Iven2132 avatar May 11 '24 06:05 Iven2132

Are LLaVA-NeXT Qwen1.5-110B, Qwen1.5-72B, and LLaMA3-8B supported now?

White-Friday avatar May 13 '24 04:05 White-Friday

Yes. They are supported.

lvhan028 avatar May 13 '24 05:05 lvhan028
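For context, a minimal sketch of running an already-supported LLaVA checkpoint with lmdeploy's VLM pipeline; the checkpoint name and image URL below are illustrative placeholders, not details from this thread:

from lmdeploy import pipeline
from lmdeploy.vl import load_image

# Placeholder: any LLaVA checkpoint lmdeploy already supports.
pipe = pipeline('liuhaotian/llava-v1.6-vicuna-7b')

# Fetch an example image and ask the model to describe it.
image = load_image('https://example.com/demo.jpg')  # placeholder URL
response = pipe(('describe this image', image))
print(response.text)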

It probably has problems when the model is large. #1563 explains the reason.

lvhan028 avatar May 13 '24 06:05 lvhan028

@lvhan028 So can I use LLaVA-NeXT Qwen1.5-72B and LLaMA3-8B?

Iven2132 avatar May 13 '24 10:05 Iven2132

Sorry, my bad. Supporting them probably requires changes like those in PR #1579. I didn't find the checkpoints. Could you share the Hugging Face repo_id?

lvhan028 avatar May 13 '24 11:05 lvhan028

lmms-lab/llama3-llava-next-8b: https://huggingface.co/lmms-lab/llama3-llava-next-8b

lmms-lab/llava-next-72b: https://huggingface.co/lmms-lab/llava-next-72b

lmms-lab/llava-next-110b: https://huggingface.co/lmms-lab/llava-next-110b

Please add a system prompt feature too.

Iven2132 avatar May 13 '24 14:05 Iven2132
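A hedged sketch of how a system prompt can be injected today through the chat template, assuming lmdeploy's ChatTemplateConfig and its meta_instruction field; 'vicuna' as the template name is an assumption for a Vicuna-based LLaVA checkpoint:

from lmdeploy import ChatTemplateConfig, pipeline
from lmdeploy.vl import load_image

# meta_instruction overrides the template's default system prompt.
pipe = pipeline(
    'liuhaotian/llava-v1.6-vicuna-7b',  # placeholder checkpoint
    chat_template_config=ChatTemplateConfig(
        model_name='vicuna',  # assumed template for a Vicuna-based LLaVA
        meta_instruction='You are a careful vision-language assistant.'))

image = load_image('https://example.com/demo.jpg')  # placeholder URL
print(pipe(('describe this image', image)).text)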

Do you know what the minimum GPU memory requirements would be to serve these VL models? Thanks!

babla9 avatar May 13 '24 19:05 babla9

A100s are best. If you want, you can run these on Modal; they are giving $30 in free credits.

Iven2132 avatar May 14 '24 04:05 Iven2132

Thanks. For LLaVA, do you know what the minimum configuration would be, e.g., 1 x A100 40 GB? Would V100s work (e.g., 2 x V100)? Is there any way I can serve quantized models on V100s via lmdeploy?

babla9 avatar May 15 '24 06:05 babla9
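On the quantization question: lmdeploy ships a 4-bit AWQ path (the quantized weights are produced by the lmdeploy lite auto_awq CLI), though kernel support varies by GPU architecture, so whether V100s work needs verifying. A sketch of serving a pre-quantized model, with a placeholder local path:

from lmdeploy import TurbomindEngineConfig, pipeline

# './llava-awq-4bit' is a placeholder for a directory produced by
# `lmdeploy lite auto_awq`; model_format='awq' tells the TurboMind
# backend to load the 4-bit weights.
pipe = pipeline('./llava-awq-4bit',
                backend_config=TurbomindEngineConfig(model_format='awq'))
print(pipe('hello').text)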

I'm not sure, but you can run the llava-next 8B model on a single A100.

Iven2132 avatar May 15 '24 07:05 Iven2132
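As a rough rule of thumb rather than an official figure: fp16 weights take about 2 bytes per parameter, before adding the vision tower, KV cache, and activations, so the 8B model's weights alone are around 16 GB and fit on one A100:

# Back-of-envelope fp16 weight memory (2 bytes per parameter).
# Serving needs extra headroom for the vision tower, KV cache,
# and activations, so treat these as lower bounds.
for name, billions in [('llama3-llava-next-8b', 8),
                       ('llava-next-72b', 72),
                       ('llava-next-110b', 110)]:
    print(f'{name}: ~{billions * 2} GB of fp16 weights')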

@lvhan028 Will lmdeploy support these LLaVA-NeXT models?

Iven2132 avatar May 16 '24 10:05 Iven2132

Regarding llava-next, the models in https://huggingface.co/collections/liuhaotian/llava-16-65b9e40155f60fd046a5ccf2 are already supported, except for https://huggingface.co/liuhaotian/llava-v1.6-mistral-7b

We will support the following in June:

lmms-lab/llama3-llava-next-8b: https://huggingface.co/lmms-lab/llama3-llava-next-8b
lmms-lab/llava-next-72b: https://huggingface.co/lmms-lab/llava-next-72b
lmms-lab/llava-next-110b: https://huggingface.co/lmms-lab/llava-next-110b

lvhan028 avatar May 16 '24 12:05 lvhan028

This issue is marked as stale because it has been marked as invalid or awaiting response for 7 days without any further response. It will be closed in 5 days if the stale label is not removed or if there is no further response.

github-actions[bot] avatar May 24 '24 02:05 github-actions[bot]

This issue is closed because it has been stale for 5 days. Please open a new issue if you have similar issues or you have any new updates now.

github-actions[bot] avatar May 29 '24 02:05 github-actions[bot]