xtuner
Multi-image input support for xtuner chat with llava-llama3?
- cmd:
xtuner chat LLM-Research/Meta-Llama-3-8B-Instruct \
  --visual-encoder ./clip-vit-large-patch14-336 \
  --llava ./LLM-Research/llava-llama-3-8b \
  --prompt-template llama3_chat \
  --image ./test001.png
- question: The trained multimodal model can only take one image per chat session, passed via --image at startup. Is there any way to supply multiple images and queries in a single session? For example, something like the following:
double enter to end input (EXIT: exit chat, RESET: reset history) >>> **image input**: xxx/test.jpg or None
double enter to end input (EXIT: exit chat, RESET: reset history) >>> **query:** describe these images.
xxxxxxxxxxxxxx
double enter to end input (EXIT: exit chat, RESET: reset history) >>>
xtuner chat is a simple command-line tool developed for analyzing training results. If you want to chat with multiple images, you can take advantage of inference tools such as ollama and lmdeploy.
https://huggingface.co/xtuner/llava-llama-3-8b-v1_1-gguf#chat-by-ollama
https://huggingface.co/xtuner/llava-llama-3-8b-v1_1-hf#chat-by-lmdeploy
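For anyone who wants the per-turn "image input / query" flow from the question, here is a rough sketch built on lmdeploy's pipeline, which accepts a (prompt, images) tuple. Treat it as an assumption-laden example, not something xtuner ships: parse_image_field, ask_with_images, and main are hypothetical helper names, the comma-separated input format is made up for illustration, and whether this particular model gives good answers on several images at once has not been verified here.

```python
from typing import List


def parse_image_field(raw: str) -> List[str]:
    """Split a comma-separated 'image input' line into paths.

    Hypothetical input format: 'a.jpg, b.png', or 'None'/empty for no images.
    """
    text = raw.strip()
    if not text or text.lower() == "none":
        return []
    return [part.strip() for part in text.split(",") if part.strip()]


def ask_with_images(pipe, query: str, image_paths: List[str]):
    """Send one query plus zero or more images to an lmdeploy-style pipeline."""
    if not image_paths:
        # Text-only turn: pass the prompt string directly.
        return pipe(query)
    # Imported lazily so the parsing helper works without lmdeploy installed.
    from lmdeploy.vl import load_image

    images = [load_image(path) for path in image_paths]
    # lmdeploy's VLM pipeline takes a (prompt, image-or-list-of-images) tuple.
    return pipe((query, images))


def main() -> None:
    """One interactive turn; requires lmdeploy and the model weights."""
    from lmdeploy import ChatTemplateConfig, pipeline

    pipe = pipeline(
        "xtuner/llava-llama-3-8b-v1_1-hf",
        chat_template_config=ChatTemplateConfig(model_name="llama3"),
    )
    paths = parse_image_field(input("image input (comma-separated paths, or None): "))
    query = input("query: ")
    print(ask_with_images(pipe, query, paths))
```

Call main() to try a single turn. The model name and chat template follow the lmdeploy link above; swap in your own converted weights if you trained a custom model.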
@ztfmars Hi, I started the chat with this command:
xtuner chat LLM-Research/Meta-Llama-3-8B-Instruct \
  --visual-encoder ./clip-vit-large-patch14-336 \
  --llava ./LLM-Research/llava-llama-3-8b \
  --prompt-template llama3_chat \
  --image ./test.jpg
After I type a question and press Enter twice, the following error occurs:
double enter to end input (EXIT: exit chat, RESET: reset history) >>> what is this photo about?
Traceback (most recent call last):
File "/home/huangjun/.conda/envs/xtuner-env/lib/python3.10/site-packages/xtuner/tools/chat.py", line 491, in
Have you run into the same problem?