Why did the model not understand the previous conversations?

Iven2132 opened this issue · 6 comments

Checklist

  • [X] 1. I have searched related issues but cannot get the expected help.
  • [X] 2. The bug has not been fixed in the latest version.

Describe the bug

Why did the model not understand the previous conversations? I gave OpenGVLab/InternVL-Chat-V1-5 a history of previous conversation turns containing deliberately wrong answers, and when I asked "Do you think that in all the previous conversations we had, your answers were correct?", it generated a wrong response and said that no previous conversations were given.

What's the problem?

Reproduction

Message:

messages = [
    {'role': 'system', 'content': 'Your name is Alan'},
    {'role': 'user', 'content': [
        {'type': 'text', 'text': 'What is this place?'},
        {'type': 'image_url', 'image_url': {'url': 'https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg'}}
    ]},
    {'role': 'assistant', 'content': [{'type': 'text', 'text': "It's London"}]},
    {'role': 'user', 'content': [
        {'type': 'text', 'text': 'What is this place?'},
        {'type': 'image_url', 'image_url': {'url': 'https://cdn.britannica.com/68/170868-050-8DDE8263/Golden-Gate-Bridge-San-Francisco.jpg'}}
    ]},
    {'role': 'assistant', 'content': [{'type': 'text', 'text': "It's Delhi"}]},
    {'role': 'user', 'content': [
        {'type': 'text', 'text': 'Do you think that in all the previous conversations we had, your answers were correct? If not, where were these images taken?'},
        {'type': 'image_url', 'image_url': {'url': 'https://cdn.britannica.com/59/94459-050-DBA42467/Skyline-Chicago.jpg'}}
    ]}
]

Code:

MODEL_NAME = "OpenGVLab/InternVL-Chat-V1-5"

class Model:
    @modal.enter()
    def start_engine(self):
        from lmdeploy import serve, ChatTemplateConfig
        print(MODEL_NAME)
        # Launch the OpenAI-compatible api_server for the model.
        self.server = serve(MODEL_NAME,
                            chat_template_config=ChatTemplateConfig(
                                model_name='internvl-internlm2'),
                            server_name='0.0.0.0',
                            server_port=23333)

    @modal.method()
    async def generate(self, messages):
        from lmdeploy import client
        handle = client(api_server_url='http://0.0.0.0:23333')
        model_name = handle.available_models[0]
        print(model_name)
        # chat_completions_v1 yields responses; without streaming there is
        # a single completed response, so return the first item.
        outputs = handle.chat_completions_v1(
            model=model_name, messages=messages)
        for out in outputs:
            return out

I'm building the package from GitHub.

Iven2132 · May 07 '24 06:05

They have demo code (using InternVL-Chat) for multi-round, multi-image conversation, but the demo only passes images in the first round of the conversation.

# multi-round multi-image conversation
pixel_values1 = load_image('./examples/image1.jpg', max_num=6).to(torch.bfloat16).cuda()
pixel_values2 = load_image('./examples/image2.jpg', max_num=6).to(torch.bfloat16).cuda()
pixel_values = torch.cat((pixel_values1, pixel_values2), dim=0)

question = "Describe the two pictures in detail"
response, history = model.chat(tokenizer, pixel_values, question, generation_config, history=None, return_history=True)
print(question, response)

I don't know whether InternVL-Chat supports interleaved text-and-image chat, so I have opened an issue upstream asking about this. You can wait for their response.
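
For context, the interleaved variant in question would look roughly like the sketch below. It is untested, reuses the demo's load_image, model, tokenizer and generation_config, and assumes model.chat accepts a new pixel_values together with an existing history, which is exactly what the upstream issue asks:

# Hypothetical interleaved multi-round chat (untested sketch).
# Round 1: ask about the first image only.
pixel_values = load_image('./examples/image1.jpg', max_num=6).to(torch.bfloat16).cuda()
response, history = model.chat(tokenizer, pixel_values, 'What is this place?',
                               generation_config, history=None, return_history=True)

# Round 2: a NEW image plus the accumulated history. Whether the model
# ties this image to the new turn rather than the first is the open question.
pixel_values = load_image('./examples/image2.jpg', max_num=6).to(torch.bfloat16).cuda()
response, history = model.chat(tokenizer, pixel_values, 'And what about this place?',
                               generation_config, history=history, return_history=True)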

When I compared the preprocessing code with the InternVL demo, I found a bug in LMDeploy: this line should be return f'<img>{IMAGE_TOKEN * num_images}</img>\n' + prompt
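
To make the fix concrete, the corrected decoration step would behave roughly as below (function name and IMAGE_TOKEN value here are illustrative placeholders, not LMDeploy's actual internals):

IMAGE_TOKEN = '<IMAGE_TOKEN>'  # placeholder value for illustration

def decorate_prompt(prompt: str, num_images: int) -> str:
    # Wrap all image placeholders for the turn in a single <img>...</img>
    # pair and put them before the user text.
    return f'<img>{IMAGE_TOKEN * num_images}</img>\n' + prompt

print(decorate_prompt('What is this place?', 2))
# <img><IMAGE_TOKEN><IMAGE_TOKEN></img>
# What is this place?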

irexyc · May 08 '24 06:05

Yeah, I'm waiting. Just one other thing I forgot to mention: the system prompt is also not working.

Iven2132 · May 08 '24 14:05

Hi @irexyc, does lmdeploy support InternVL-Chat-V1-5? I don't know why it doesn't even understand the system message. I set the system message "Your name is Jerry", asked "What's your name?", and it says "My name is AI".

Iven2132 · May 09 '24 16:05

Have you tried this case with the huggingface transformers to perform the inference?
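
For example, a minimal text-only check along these lines (a sketch based on the InternVL model card's chat interface; passing pixel_values=None for a pure-text turn and overriding the model's built-in system message are assumptions that may not hold for this checkpoint):

import torch
from transformers import AutoModel, AutoTokenizer

# Sketch of a system-message check with plain transformers.
# pixel_values=None for a text-only turn and the system_message
# override are assumptions, not verified against InternVL-Chat-V1-5.
path = 'OpenGVLab/InternVL-Chat-V1-5'
model = AutoModel.from_pretrained(
    path, torch_dtype=torch.bfloat16, trust_remote_code=True).cuda().eval()
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)

model.system_message = 'Your name is Alan'  # assumed override point
generation_config = dict(max_new_tokens=64, do_sample=False)
response = model.chat(tokenizer, None, "what's your name?", generation_config)
print(response)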

lvhan028 · May 13 '24 11:05

@lvhan028 Yes, it works with transformers. Can you please try it? I think it's an lmdeploy issue.

Iven2132 · May 14 '24 06:05

The prompt looks fine with a system role included in the messages.

# server
lmdeploy serve api_server /nvme/shared/InternVL-Chat-V1-5 --log-level INFO
# server log
# 2024-05-15 07:54:24,168 - lmdeploy - INFO - prompt="<|im_start|>system\nYour name is Alan<|im_end|>\n<|im_start|>user\nwhat's your name?<|im_end|>\n<|im_start|>assistant\n", gen_config=EngineGenerationConfig(n=1, max_new_tokens=32744, top_p=1.0, top_k=40, temperature=0.7, repetition_penalty=1.0, ignore_eos=False, random_seed=4579452733963401247, stop_words=[92542, 92540], bad_words=None, min_new_tokens=None, skip_special_tokens=True, logprobs=None), prompt_token_id=[1, 92543, 9081, 364, 7910, 963, 505, 25716, 92542, 364, 92543, 1008, 364, 12706, 725, 829, 963, 345, 92542, 364, 92543, 525, 11353, 364], adapter_name=None.


# client
from lmdeploy import client
handle = client(api_server_url='http://0.0.0.0:23333')
model_name = handle.available_models[0]
messages = [
    {'role': 'system', 'content': 'Your name is Alan'}, 
    {'role': 'user', 'content': [
        {'type': 'text', 'text': "what's your name?"}
    ]}
]
outputs = handle.chat_completions_v1(
    model=model_name, messages=messages)
for out in outputs:
    print(out)

# client result
# {'id': '1', 'object': 'chat.completion', 'created': 1715759664, 'model': 'internvl-internlm2', 'choices': [{'index': 0, 'message': {'role': 'assistant', 'content': 'My name is Alan.'}, 'logprobs': None, 'finish_reason': 'stop'}], 'usage': {'prompt_tokens': 24, 'total_tokens': 30, 'completion_tokens': 6}}
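
If you want to double-check the rendered prompt without starting a server, something like the following should work (a sketch that uses lmdeploy's internal MODELS registry, which is not a stable public API and may differ between versions):

# Sketch: render messages through the chat template offline.
from lmdeploy.model import MODELS

template = MODELS.get('internvl-internlm2')()
messages = [
    {'role': 'system', 'content': 'Your name is Alan'},
    {'role': 'user', 'content': "what's your name?"},
]
print(template.messages2prompt(messages))
# Should match the prompt shown in the server log above.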

Could you provide the code and results from using transformers?

irexyc · May 15 '24 08:05