LLaVA-NeXT
How to reuse past_key_values
I am encountering an error when attempting to reuse the past_key_values for generating text based on image-text pairs.
from llava.model.builder import load_pretrained_model

pretrained = "lmms-lab/llama3-llava-next-8b"
model_name = "llava_llama3"
device_map = "auto"

tokenizer, model, image_processor, max_length = load_pretrained_model(
    pretrained,
    None,
    model_name,
    device_map=device_map,
    attn_implementation=None,
)
Initial call to generate and obtain past_key_values:
generated = model.generate(
    input_ids,
    images=image_tensor,
    image_sizes=image_sizes,
    do_sample=False,
    temperature=0,
    max_new_tokens=1,
    return_dict_in_generate=True,
)
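For reference, with return_dict_in_generate=True the call returns a generate output object (dict-style access works too), and the cache it holds looks sane to me. A quick check, assuming the legacy tuple cache format of one (key, value) pair per layer:

# Sanity check; assumes the legacy tuple cache format, i.e. one
# (key, value) pair per layer with shape (batch, heads, seq_len, head_dim).
past = generated["past_key_values"]
print(len(past))         # number of layers
print(past[0][0].shape)  # cache length is dimension -2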
Attempt to call the model again using the obtained past_key_values:
generated = model.generate(
    input_ids,
    images=image_tensor,
    image_sizes=image_sizes,
    do_sample=False,
    temperature=0,
    use_cache=True,
    past_key_values=generated["past_key_values"],
    max_new_tokens=256,
    return_dict_in_generate=True,
)
An error is thrown during the second call:

File "/transformers/models/llama/modeling_llama.py", line 206, in apply_rotary_pos_emb
    q_embed = (q * cos) + (rotate_half(q) * sin)
RuntimeError: The size of tensor a (0) must match the size of tensor b (2200) at non-singleton dimension 2
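My current guess is that the second call re-feeds the original input_ids, whose positions are already fully covered by the cache once the image tokens have been expanded, so apply_rotary_pos_emb ends up rotating zero new query positions against a 2200-position cache. For a text-only model I would expect the usual continuation pattern to be feeding the full sequence so far back in together with the cache, roughly as sketched below (sequences and past_key_values are fields of the generate output; whether this lines up with LLaVA's multimodal path, which routes through inputs_embeds, is exactly what I am unsure about):

# Hedged sketch of the text-only continuation pattern; whether it carries
# over to LLaVA's image-token expansion is the open question.
first = model.generate(
    input_ids,
    images=image_tensor,
    image_sizes=image_sizes,
    do_sample=False,
    temperature=0,
    max_new_tokens=1,
    return_dict_in_generate=True,
)

second = model.generate(
    first.sequences,                        # full sequence so far, not the original prompt
    images=image_tensor,
    image_sizes=image_sizes,
    past_key_values=first.past_key_values,  # reuse the cache from the first call
    use_cache=True,
    do_sample=False,
    temperature=0,
    max_new_tokens=256,
    return_dict_in_generate=True,
)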
Any insights or guidance on how to properly reuse the past_key_values in this context would be greatly appreciated.