LLaVA-NeXT

How to reuse past_key_values

Open zhuohangu opened this issue 1 year ago • 0 comments

I am encountering an error when attempting to reuse the past_key_values for generating text based on image-text pairs.

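from llava.model.builder import load_pretrained_model

pretrained = "lmms-lab/llama3-llava-next-8b"
model_name = "llava_llama3"
device_map = "auto"  # assumed value; not shown in the original report
tokenizer, model, image_processor, max_length = load_pretrained_model(
    pretrained,
    None,
    model_name,
    device_map=device_map,
    attn_implementation=None,
)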

Initial call to generate and obtain past_key_values:

generated = model.generate(
    input_ids,
    images=image_tensor,
    image_sizes=image_sizes,
    do_sample=False,
    temperature=0,
    max_new_tokens=1,
    return_dict_in_generate=True,
)

Attempt to call the model again using the obtained past_key_values:

generated = model.generate(
    input_ids,
    images=image_tensor,
    image_sizes=image_sizes,
    do_sample=False,
    temperature=0,
    use_cache=True,
    past_key_values=generated["past_key_values"],
    max_new_tokens=256,
    return_dict_in_generate=True,
)

An error is thrown during the second call:

File "/transformers/models/llama/modeling_llama.py", line 206, in apply_rotary_pos_emb
    q_embed = (q * cos) + (rotate_half(q) * sin)
RuntimeError: The size of tensor a (0) must match the size of tensor b (2200) at non-singleton dimension 2
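One possible reading of the shapes in this error, not confirmed anywhere in this thread: if the generation code drops every input position already covered by the cache, then passing the same full prompt a second time leaves nothing to process, which would explain a query of length 0 next to rotary embeddings of length 2200. The toy sketch below only illustrates that arithmetic. It uses plain Python lists instead of tensors, `crop_to_uncached` is a hypothetical helper rather than a transformers or LLaVA function, and the assumption that the cache covers all 2200 positions is a guess based on the error message.

```python
# Toy illustration only: plain Python lists stand in for tensors.
def crop_to_uncached(positions, cached_len):
    """Hypothetical helper: keep only positions the cache has not seen."""
    return positions[cached_len:]


prompt_len = 2200  # sequence length reported in the error message
prompt_positions = list(range(prompt_len))

# Assumption: the cache from the first call already covers the whole prompt.
cached_len = prompt_len

# Second call as in the issue: the same full prompt is passed again,
# so nothing is left after cropping.
uncached = crop_to_uncached(prompt_positions, cached_len)
print(len(uncached))  # 0

# If the input extends past the cached length by one position,
# exactly one position remains to be processed.
extended = prompt_positions + [prompt_len]
print(len(crop_to_uncached(extended, cached_len)))  # 1
```

If this reading is right, the second call would need inputs that extend beyond what the cache already holds, but that has not been verified against the LLaVA-NeXT code.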

Any insights or guidance on how to properly reuse the past_key_values in this context would be greatly appreciated.

zhuohangu, Aug 03 '24 07:08