
Batch Inference with Llama 3.2 Generate Function: Only the First Result is Correct

smile-struggler opened this issue 1 year ago · 3 comments

Before submitting a bug, please make sure the issue hasn't been already addressed by searching through the FAQs and existing/past issues

Describe the bug

When running batch inference, I replicate the same sample (an image plus a question) multiple times, following the official example, with do_sample=False. However, only the answer for the first sample is correct; the answers for all the other samples are meaningless and identical to one another.

In the case of single-modal (text-only) batch inference, everything works as expected.

Please let me know how to solve this issue. Thank you very much!

Minimal reproducible example

import torch
from PIL import Image
from transformers import MllamaForConditionalGeneration, AutoProcessor

model_id = "/checkpoint/Llama-3.2-11B-Vision-Instruct"

model = MllamaForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_id, padding_side='left')

# Local path to the test image
image_path = "llama3.2/rabbit.jpg"
image = Image.open(image_path)
image = image.resize((560, 560))

messages = [
    {
        "role": "user", 
        "content": [
            {
                "type": "image",
            },
            {"type": "text", "text": "If I had to write a haiku for this one, it would be: "}
        ]
    }
]

# Build a batch by repeating the same prompt and image 10 times
texts = [
    processor.apply_chat_template(messages, add_generation_prompt=True)
    for _ in range(10)
]

images = [image for _ in range(10)]

inputs = processor(images, texts, return_tensors="pt", padding=True).to(model.device)

output = model.generate(**inputs, max_new_tokens=100, do_sample=False)
prompt_len = inputs.input_ids.shape[-1]
generated_ids = output[:, prompt_len:]
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)

print(generated_text)

Output

["Here is a haiku for the image:\n\nA rabbit in a blue coat\nStands on a dirt path, so sweet\nSpringtime's gentle delight", 'I\'d love to see it! Go ahead and share your haiku about the beloved rabbit from the classic children\'s tale.\n\n(And if you\'d like, I can try writing one too, based on my understanding of the character and the story of "The Tale of Peter Rabbit" by\xa0—\xa0or\xa0—\xa0…)\n\n(Also, I\'ll be sure to keep my response in mind: haikus are traditionally short, so I\'ll keep it brief and sweet!)', 'I\'d love to see it! Go ahead and share your haiku about the beloved rabbit from the classic children\'s tale.\n\n(And if you\'d like, I can try writing one too, based on my understanding of the character and the story of "The Tale of Peter Rabbit" by\xa0—\xa0or\xa0—\xa0…)\n\n(Also, I\'ll be sure to keep my response in mind: haikus are traditionally short, so I\'ll keep it brief and sweet!)', 'I\'d love to see it! Go ahead and share your haiku about the beloved rabbit from the classic children\'s tale.\n\n(And if you\'d like, I can try writing one too, based on my understanding of the character and the story of "The Tale of Peter Rabbit" by\xa0—\xa0or\xa0—\xa0…)\n\n(Also, I\'ll be sure to keep my response in mind: haikus are traditionally short, so I\'ll keep it brief and sweet!)', 'I\'d love to see it! Go ahead and share your haiku about the beloved rabbit from the classic children\'s tale.\n\n(And if you\'d like, I can try writing one too, based on my understanding of the character and the story of "The Tale of Peter Rabbit" by\xa0—\xa0or\xa0—\xa0…)\n\n(Also, I\'ll be sure to keep my response in mind: haikus are traditionally short, so I\'ll keep it brief and sweet!)', 'I\'d love to see it! Go ahead and share your haiku about the beloved rabbit from the classic children\'s tale.\n\n(And if you\'d like, I can try writing one too, based on my understanding of the character and the story of "The Tale of Peter Rabbit" by\xa0—\xa0or\xa0—\xa0…)\n\n(Also, I\'ll be sure to keep my response in mind: haikus are traditionally short, so I\'ll keep it brief and sweet!)', 'I\'d love to see it! Go ahead and share your haiku about the beloved rabbit from the classic children\'s tale.\n\n(And if you\'d like, I can try writing one too, based on my understanding of the character and the story of "The Tale of Peter Rabbit" by\xa0—\xa0or\xa0—\xa0…)\n\n(Also, I\'ll be sure to keep my response in mind: haikus are traditionally short, so I\'ll keep it brief and sweet!)', 'I\'d love to see it! Go ahead and share your haiku about the beloved rabbit from the classic children\'s tale.\n\n(And if you\'d like, I can try writing one too, based on my understanding of the character and the story of "The Tale of Peter Rabbit" by\xa0—\xa0or\xa0—\xa0…)\n\n(Also, I\'ll be sure to keep my response in mind: haikus are traditionally short, so I\'ll keep it brief and sweet!)', 'I\'d love to see it! Go ahead and share your haiku about the beloved rabbit from the classic children\'s tale.\n\n(And if you\'d like, I can try writing one too, based on my understanding of the character and the story of "The Tale of Peter Rabbit" by\xa0—\xa0or\xa0—\xa0…)\n\n(Also, I\'ll be sure to keep my response in mind: haikus are traditionally short, so I\'ll keep it brief and sweet!)', 'I\'d love to see it! 
Go ahead and share your haiku about the beloved rabbit from the classic children\'s tale.\n\n(And if you\'d like, I can try writing one too, based on my understanding of the character and the story of "The Tale of Peter Rabbit" by\xa0—\xa0or\xa0—\xa0…)\n\n(Also, I\'ll be sure to keep my response in mind: haikus are traditionally short, so I\'ll keep it brief and sweet!)']
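
A quick shape check on the batched inputs (not part of the original output, just a suggested sanity check; the exact keys produced by the processor can vary across transformers versions):

# Sanity-check sketch: confirm the processor produced batch-sized tensors
# for both the text side and the image side before calling generate().
for name, tensor in inputs.items():
    print(name, tuple(tensor.shape))
# input_ids / attention_mask should have 10 rows, and the image-related
# tensors (e.g. pixel_values, cross_attention_mask) should be batched too.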

Runtime Environment

  • Model: Llama-3.2-11B-Vision-Instruct
  • Using via huggingface?: yes
  • OS: Linux
  • GPU VRAM: 81920 MB (80 GB)
  • Number of GPUs: 1
  • GPU Make: Nvidia
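
Since single-sample inference works as described above, a per-sample loop is a possible workaround while batched multimodal generation misbehaves. This is only a sketch (not from the original report), reusing model, processor, image, and messages from the example above:

# Workaround sketch: run each image/question pair separately instead of batching.
single_text = processor.apply_chat_template(messages, add_generation_prompt=True)

answers = []
for _ in range(10):
    single_inputs = processor(image, single_text, return_tensors="pt").to(model.device)
    out = model.generate(**single_inputs, max_new_tokens=100, do_sample=False)
    prompt_len = single_inputs.input_ids.shape[-1]
    answers.append(processor.decode(out[0, prompt_len:], skip_special_tokens=True))

print(answers)  # with do_sample=False, every entry should match the first batched answer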

smile-struggler · Dec 15 '24

I also need batch inference. Any update on this?

Pedrexus · Dec 20 '24

I need it too, but it seems that batched multimodal inference is not recommended for the moment and is still a WIP, from my understanding. Some similar issues:

pjmalandrino · Jan 19 '25

Same issue. The output contains many special tokens such as <|finetune_right_pad_id|> and <|start_header_id|>.

JiazunChen · Apr 13 '25
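
In case it helps with the stray special tokens: they typically show up when the full padded sequences are decoded. A small sketch (assuming the inputs and output variables from the reproducible example above) that slices off the prompt portion and drops special tokens during decoding:

# Decode only the newly generated tokens; skip_special_tokens removes pad/header tokens.
prompt_len = inputs.input_ids.shape[-1]
generated_ids = output[:, prompt_len:]
print(processor.batch_decode(generated_ids, skip_special_tokens=True))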