xtuner: multi-image input support for xtuner chat with llava-llama3?

cmd:
xtuner chat LLM-Research/Meta-Llama-3-8B-Instruct \
  --visual-encoder ./clip-vit-large-patch14-336 \
  --llava ./LLM-Research/llava-llama-3-8b \
  --prompt-template llama3_chat \
  --image ./test001.png
question: the trained multimodal model can only take one image at a time. Is there any way to support multiple images and queries in one chat session? For example:

double enter to end input (EXIT: exit chat, RESET: reset history)  >>> **image input**:  xxx/test.jpg or None

double enter to end input (EXIT: exit chat, RESET: reset history) >>> **query:** describe these images.

xxxxxxxxxxxxxx

double enter to end input (EXIT: exit chat, RESET: reset history) >>>

May 07 '24 08:05 ztfmars

xtuner chat is a simple command-line tool developed for analyzing training results.

If you want to chat with multiple images, you can use inference tools such as ollama or lmdeploy.

https://huggingface.co/xtuner/llava-llama-3-8b-v1_1-gguf#chat-by-ollama
https://huggingface.co/xtuner/llava-llama-3-8b-v1_1-hf#chat-by-lmdeploy
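For readers following the lmdeploy route, here is a minimal sketch of a multi-image query, assuming lmdeploy's vision-language `pipeline` accepts a `(text, [images])` tuple and that the `llama3` chat template applies to this model, as the linked model card suggests. The helper names `build_multi_image_prompt` and `chat_with_images` are hypothetical and exist only for this illustration. Since the model was trained on single images, answer quality with several images may vary.

```python
from typing import List, Sequence, Tuple


def build_multi_image_prompt(query: str, images: Sequence[object]) -> Tuple[str, List[object]]:
    """Pair one text query with a list of images as a (text, images) tuple."""
    if not images:
        raise ValueError("expected at least one image")
    return query, list(images)


def chat_with_images(model_path: str, query: str, image_paths: Sequence[str]) -> str:
    """Hypothetical wrapper: run one query over several images with lmdeploy."""
    # Imported lazily so the helper above works without lmdeploy installed.
    from lmdeploy import ChatTemplateConfig, pipeline
    from lmdeploy.vl import load_image

    # Assumption: the llama3 chat template matches this LLaVA-Llama-3 checkpoint.
    pipe = pipeline(model_path, chat_template_config=ChatTemplateConfig(model_name="llama3"))
    # load_image takes a local path or a URL and returns a PIL image.
    images = [load_image(path) for path in image_paths]
    response = pipe(build_multi_image_prompt(query, images))
    return response.text
```

A call would look like `chat_with_images("xtuner/llava-llama-3-8b-v1_1-hf", "describe these images", ["./test001.png", "./test002.png"])`, where the second image path is a placeholder.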

May 07 '24 11:05 pppppM

@ztfmars Hi, when I run the command below and press Enter twice after typing my question, an error occurs:

xtuner chat LLM-Research/Meta-Llama-3-8B-Instruct \
  --visual-encoder ./clip-vit-large-patch14-336 \
  --llava ./LLM-Research/llava-llama-3-8b \
  --prompt-template llama3_chat \
  --image ./test.jpg

double enter to end input (EXIT: exit chat, RESET: reset history) >>> what is this photo about?

Traceback (most recent call last):
  File "/home/huangjun/.conda/envs/xtuner-env/lib/python3.10/site-packages/xtuner/tools/chat.py", line 491, in <module>
    main()
  File "/home/huangjun/.conda/envs/xtuner-env/lib/python3.10/site-packages/xtuner/tools/chat.py", line 469, in main
    generate_output = llm.generate(
  File "/home/huangjun/.conda/envs/xtuner-env/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/huangjun/.conda/envs/xtuner-env/lib/python3.10/site-packages/transformers/generation/utils.py", line 1758, in generate
    result = self._sample(
  File "/home/huangjun/.conda/envs/xtuner-env/lib/python3.10/site-packages/transformers/generation/utils.py", line 2390, in _sample
    model_kwargs = self._get_initial_cache_position(input_ids, model_kwargs)
  File "/home/huangjun/.conda/envs/xtuner-env/lib/python3.10/site-packages/transformers/generation/utils.py", line 1321, in _get_initial_cache_position
    past_length = model_kwargs["past_key_values"][0][0].shape[2]
TypeError: 'NoneType' object is not subscriptable

Have you run into the same problem?

May 28 '24 07:05 J0eky
