
OVModelForVisionCausalLM

Open · eaidova opened this pull request 5 months ago · 2 comments

What does this PR do?

Enables conversion and inference for multimodal LLMs such as LLaVA, LLaVA-Next, and Falcon-VL. Example usage:

from PIL import Image
import requests
from optimum.intel.openvino import OVModelForVisualCausalLM
from transformers import AutoProcessor

# Load the OpenVINO model and its matching processor.
model_id = "llava-hf/llava-v1.6-mistral-7b-hf"
model = OVModelForVisualCausalLM.from_pretrained(model_id)
image_file = "http://images.cocodataset.org/val2017/000000039769.jpg"
processor = AutoProcessor.from_pretrained(model_id)

# Build a multimodal chat prompt: one text part plus one image slot.
conversation = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "What are these?"},
            {"type": "image"},
        ],
    },
]
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)

# Fetch the image and prepare the model inputs.
raw_image = Image.open(requests.get(image_file, stream=True).raw)
inputs = processor(images=raw_image, text=prompt, return_tensors="pt")

# Generate and decode the answer.
output = model.generate(**inputs, max_new_tokens=20, do_sample=False)
print(processor.decode(output[0], skip_special_tokens=True))
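The conversation structure passed to `apply_chat_template` above follows the transformers chat format: each message is a dict with a `role` and a list of typed content parts. A minimal sketch of that schema, using a hypothetical `is_valid_message` helper (not part of optimum-intel or transformers), can make the expected shape explicit:

```python
def is_valid_message(msg):
    """Check one chat message: a dict with a role and a list of typed parts.

    Hypothetical validation helper for illustration only; the accepted
    roles and part types here are assumptions based on the example above.
    """
    if msg.get("role") not in {"user", "assistant", "system"}:
        return False
    parts = msg.get("content")
    if not isinstance(parts, list):
        return False
    for part in parts:
        # Text parts carry a "text" field; image parts are placeholders
        # that the processor later pairs with the actual image tensor.
        if part.get("type") not in {"text", "image"}:
            return False
        if part["type"] == "text" and "text" not in part:
            return False
    return True

conversation = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "What are these?"},
            {"type": "image"},
        ],
    },
]
print(all(is_valid_message(m) for m in conversation))  # True
```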

Before submitting

  • [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • [ ] Did you make sure to update the documentation with your changes?
  • [ ] Did you write any new necessary tests?

eaidova · Aug 29 '24 05:08