
Ovis2.5-2B Inference speed

AlexisMDP opened this issue 4 months ago · 4 comments

My goal is to get an image description as quickly as possible. How can I speed up inference? Did I miss something, or am I passing unnecessary parameters? I'm getting an inference time of 4.5 seconds per frame, which is way too long. I'm using RunPod with an RTX 4090 (24 GB).

"""
Minimal example for Ovis2.5-2B - Model loading and inference
"""

import torch
from transformers import AutoModelForCausalLM
from PIL import Image

# ===== MODEL LOADING =====
def load_model():
    model = AutoModelForCausalLM.from_pretrained(
        "AIDC-AI/Ovis2.5-2B",
        torch_dtype=torch.bfloat16,
        trust_remote_code=True
    ).cuda()
    model.eval()
    return model

# ===== INFERENCE =====
def run_inference(model, image_path, prompt="Describe this image"):
    # Load image
    image = Image.open(image_path).convert("RGB")
    
    # Prepare messages according to Ovis format
    messages = [{
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": prompt},
        ],
    }]
    
    # Preprocessing
    input_ids, pixel_values, grid_thws = model.preprocess_inputs(
        messages=messages,
        add_generation_prompt=True,
        max_pixels=896*896,
        enable_thinking=False
    )
    
    # Move to GPU
    input_ids = input_ids.cuda()
    pixel_values = pixel_values.cuda() if pixel_values is not None else None
    grid_thws = grid_thws.cuda() if grid_thws is not None else None
    
    # Generation
    with torch.no_grad():
        outputs = model.generate(
            inputs=input_ids,
            pixel_values=pixel_values,
            grid_thws=grid_thws,
            max_new_tokens=256,
            enable_thinking=False
        )
    
    # Decode response
    response = model.text_tokenizer.decode(outputs[0], skip_special_tokens=True)
    return response

# ===== USAGE =====
if __name__ == "__main__":
    # Load model
    model = load_model()
    
    # Run inference
    result = run_inference(model, "your_image.jpg", "Describe this image")
    print(result)

AlexisMDP · Aug 18 '25

You can try using vLLM for inference acceleration; see here: https://github.com/AIDC-AI/Ovis?tab=readme-ov-file#install
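
For reference, a minimal offline-inference sketch with vLLM might look like this. It is untested for Ovis2.5: it assumes your vLLM build already recognizes the Ovis2_5 architecture, and it uses vLLM's generic LLM.chat() multimodal API rather than anything Ovis-specific. The model name is taken from the snippet above; the image URL is a placeholder.

from vllm import LLM, SamplingParams

# Hypothetical sketch, not an official Ovis recipe: requires a vLLM build
# in which the Ovis2_5 architecture is registered.
llm = LLM(
    model="AIDC-AI/Ovis2.5-2B",
    trust_remote_code=True,
    limit_mm_per_prompt={"image": 1},  # one image per prompt
)

sampling_params = SamplingParams(temperature=0.0, max_tokens=256)

# OpenAI-style chat message; LLM.chat() applies the model's chat template,
# so no manual image-placeholder handling is needed.
messages = [{
    "role": "user",
    "content": [
        # placeholder URL; local files can be passed as base64 data URLs
        {"type": "image_url", "image_url": {"url": "https://example.com/your_image.jpg"}},
        {"type": "text", "text": "Describe this image"},
    ],
}]

outputs = llm.chat(messages, sampling_params=sampling_params)
print(outputs[0].outputs[0].text)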

JumpingRain · Aug 20 '25

It seems like vLLM doesn't support Ovis2.5 yet?

I got this error with version 0.10.1.1: "Value error, Model architectures ['Ovis2_5'] are not supported for now."

NormanWhc · Aug 24 '25

I haven't tried vLLM myself, but yes, I don't think it's supported yet. It should be soon: https://github.com/vllm-project/vllm/pull/23084

AlexisMDP · Aug 24 '25

> It seems like vLLM doesn't support Ovis2.5 yet?
>
> I got this error with version 0.10.1.1: "Value error, Model architectures ['Ovis2_5'] are not supported for now."

You can build vLLM from source to get support.
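
A typical source build looks roughly like this (standard vLLM install steps, not specific to Ovis; check vLLM's installation docs for the current instructions, and note the compile can take a while):

git clone https://github.com/vllm-project/vllm.git
cd vllm
pip install -e .  # builds from the current main branch, which carries Ovis2.5 support once the PR above is merged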

Magmanat · Aug 30 '25