Batch inference
Hi -- has anyone had success with batch inference for Qwen-VL? Other related issues in this repo didn't end up working for me (e.g. https://github.com/QwenLM/Qwen-VL/issues/51), and neither did the usage documented in the transformers repo (https://github.com/huggingface/transformers/issues/26061), due to API differences in the generate() method. I also had no luck setting the eos_token on the tokenizer or adding a custom token.
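Concretely, by "setting the eos_token" I mean attempts along these lines (a rough sketch -- I'm not sure either assignment is actually supported by the Qwen tokenizer, which is part of the problem):

# rough sketch of the token setup I tried; both lines are assumptions about the Qwen tokenizer
tokenizer.eos_token = '<|endoftext|>'      # assumes '<|endoftext|>' is the end-of-text token
tokenizer.pad_token_id = tokenizer.eod_id  # assumes the tokenizer exposes eod_id, as in Qwen's finetune script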
The closest I've got is the following, which is adapted mostly from https://github.com/huggingface/transformers/issues/26061#issuecomment-1771132768 and attempts a batch of 2 with the same query:
import torch

# model and tokenizer defined above per repo README
query = tokenizer.from_list_format([
    {'image': 'im1.jpg'},
    {'text': 'describe the image'},
])

# left-pad so generation continues from the end of each prompt
tokenizer.padding_side = 'left'

# batch of 2 identical queries, so the sequences are already the same length
inputs = tokenizer([query, query], return_tensors='pt').to(model.device)

with torch.no_grad():
    generated_ids = model.generate(input_ids=inputs.input_ids)

generated_texts = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
generated_texts
However, generated_texts ends up looking pretty broken compared to the output of a single model.chat(tokenizer, query=query) call.
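For comparison, the single-query path that does work for me is just the chat interface from the repo README:

# single-query baseline from the repo README -- this produces a sensible description
query = tokenizer.from_list_format([
    {'image': 'im1.jpg'},
    {'text': 'describe the image'},
])
response, history = model.chat(tokenizer, query=query, history=None)
print(response)

Looping over that call per image works, but it defeats the point of batching, so any pointers on getting a batched generate() call to match its output would be appreciated.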