Batch inference
Hi -- has anyone had success with batch inference for Qwen-VL? Other related issues in this repo didn't end up working for me (e.g. https://github.com/QwenLM/Qwen-VL/issues/51), and neither did the usage documented in the transformers repo (https://github.com/huggingface/transformers/issues/26061), due to API differences in the generate() method. I also had no luck setting the eos_token on the tokenizer or adding a custom token.
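Concretely, by "setting the eos_token" I mean attempts along these lines (a rough sketch -- I'm not sure either assignment is actually supported by the Qwen tokenizer, which is part of the problem):

# rough sketch of the token setup I tried; both lines are assumptions about the Qwen tokenizer
tokenizer.eos_token = '<|endoftext|>'      # assumes '<|endoftext|>' is the end-of-text token
tokenizer.pad_token_id = tokenizer.eod_id  # assumes the tokenizer exposes eod_id, as in Qwen's finetune script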
The closest I've got is the following, which is adapted mostly from https://github.com/huggingface/transformers/issues/26061#issuecomment-1771132768 and attempts a batch of 2 with the same query:
import torch

# model and tokenizer defined above per repo README
query = tokenizer.from_list_format([
    {'image': 'im1.jpg'},
    {'text': 'describe the image'},
])

# left-pad so generation continues from the end of each prompt
tokenizer.padding_side = 'left'

# batch of 2 identical queries, so the sequences are already the same length
inputs = tokenizer([query, query], return_tensors='pt').to(model.device)

with torch.no_grad():
    generated_ids = model.generate(input_ids=inputs.input_ids)

generated_texts = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
generated_texts
However, generated_texts ends up looking pretty broken compared to the output of a single model.chat(tokenizer, query=query) call.
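For comparison, the single-query path that does work for me is just the chat interface from the repo README:

# single-query baseline from the repo README -- this produces a sensible description
query = tokenizer.from_list_format([
    {'image': 'im1.jpg'},
    {'text': 'describe the image'},
])
response, history = model.chat(tokenizer, query=query, history=None)
print(response)

Looping over that call per image works, but it defeats the point of batching, so any pointers on getting a batched generate() call to match its output would be appreciated.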