Fix continue_final_message for image-text-to-text chat templates
What does this PR do?
The `content` field of a chat message for an image-text-to-text model is a list of typed dicts rather than a plain string, which `tokenization_utils_base` does not currently account for when `continue_final_message` is set to `True`.
Split from the image-text-to-text PR.
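For context, the failure comes from the step that trims the rendered prompt back to the final message, which assumes `chat[-1]["content"]` is a string. A minimal sketch of the list-aware handling needed (helper names here are illustrative, not the actual diff):

```python
def _final_message_text(content):
    # "content" is either a plain string or, for image-text-to-text chats,
    # a list of typed dicts; take the trailing "text" block in that case.
    if isinstance(content, (list, tuple)):
        content = [block["text"] for block in content if block.get("type") == "text"][-1]
    return content.strip()


def trim_to_final_message(rendered_chat, messages):
    # Cut the rendered prompt right after the final message's text so that
    # generation continues it instead of opening a new assistant turn.
    final_text = _final_message_text(messages[-1]["content"])
    return rendered_chat[: rendered_chat.rindex(final_text) + len(final_text)]
```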
To reproduce the error:
```python
from transformers import LlavaProcessor, LlavaForConditionalGeneration
import torch
from PIL import Image
import requests

processor = LlavaProcessor.from_pretrained("llava-hf/llava-interleave-qwen-0.5b-hf")
model = LlavaForConditionalGeneration.from_pretrained(
    "llava-hf/llava-interleave-qwen-0.5b-hf", torch_dtype=torch.float16, low_cpu_mem_usage=True
)
model.to("cuda:0")

# Define a chat history and use `apply_chat_template` to get a correctly formatted prompt.
# Each value in "content" has to be a list of dicts with types ("text", "image").
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg",
            },
            {"type": "text", "text": "Describe this image."},
        ],
    },
    {
        "role": "assistant",
        "content": [
            {"type": "text", "text": "There is a dog and"},
        ],
    },
]
prompt = processor.apply_chat_template(messages, continue_final_message=True)

# Load the image referenced in the chat and pass it to the processor alongside the prompt
image = Image.open(
    requests.get("https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg", stream=True).raw
)
inputs = processor(images=image, text=prompt, return_tensors="pt").to("cuda:0").to(torch.float16)

# Autoregressively complete the prompt
output = model.generate(**inputs, max_new_tokens=100)
print(processor.decode(output[0], skip_special_tokens=True))
```
@zucchini-nlp @ArthurZucker
Added one test for the llava processor :). I could add one for every VLM processor that uses a chat template, but since they all rely on the same underlying `apply_chat_template`, I didn't think it was worth the extra diffs. Wdyt?
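For reference, a minimal sketch of what such a test can assert (the test name is illustrative, not the one added in the PR): with `continue_final_message=True`, the rendered prompt should end with the assistant's partial text rather than an end-of-turn token.

```python
from transformers import LlavaProcessor


def test_continue_final_message_with_list_content():
    processor = LlavaProcessor.from_pretrained("llava-hf/llava-interleave-qwen-0.5b-hf")
    messages = [
        {"role": "user", "content": [{"type": "image"}, {"type": "text", "text": "Describe this image."}]},
        {"role": "assistant", "content": [{"type": "text", "text": "There is a dog and"}]},
    ]
    prompt = processor.apply_chat_template(messages, continue_final_message=True)
    # The prompt must stop mid-message so generation picks up where it left off
    assert prompt.endswith("There is a dog and")
```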