MiniCPM-V MiniCPM-V 2.6 produces only hallucinations when doing OCR

Hey everyone, I have been trying to find the recommended settings for this model - temp, top k and top p but I cannot find it anywhere and this model is producing 100% hallucinations with every setting tried, the previous minicpm-v was much better.

Jun 24 '25 07:06 evcharger

Thanks for using，can you share your test images and test code?

Jun 30 '25 02:06 YuzaChongyi

Could you please provide further information based on our response? If there's no progress, we may close this issue in a few days.

Aug 14 '25 06:08 tc-mb

Hello @tc-mb @YuzaChongyi. I faced a similar issue when I was trying to test MiniCPM-V 2.6 (minicpm-v:8b) from the Ollama. All images I tried led to hallucinations that are not related. I've tried Python code, open webui and direct CURL call to ollama and all led to the same result. Other models (llava, gemma) returns the realistic result. Maybe this is related to Ollama configuration - I don't know, sorry.

Here is an example of the code

from ollama import chat
from PIL import Image
import base64

def image_to_base64(image_path: str) -> str:
    """Convert image to base64 string"""
    with Image.open(image_path) as img:
        if img.mode in ('RGBA', 'LA', 'P'):
            img = img.convert('RGB')
        
        from io import BytesIO
        buffered = BytesIO()
        img.save(buffered, format="JPEG")
        img_str = base64.b64encode(buffered.getvalue()).decode('utf-8')
        return img_str

response = chat(
    # model='llava:13b', <- just to compare if needed
    model='minicpm-v',
    messages=[
        {
            'role': 'user',
            'content': 'Make a short factual description of the attached image.',
            'images': [image_to_base64(str('./your_image_2.jpg'))]
        }
    ]
)

print(response['message']['content'])

This is an output that I've tried (same image, multiple invocations)

> python ollama_minimal.py
The photograph shows a close-up view of a collection of items, including various objects and containers on what appears to be a table or countertop. The colors in this photo are muted with shades of brown, green, beige, gray, white, blue, red, purple, tan, orange, dark red, olive-green, yellow-brown, pinkish-red, light blue, gold-yellow, lilac-blue, peach-pink, black, and ivory-white hues. These colors create a warm and inviting atmosphere.

> python ollama_minimal.py
This picture features an array of fruit, including apples and oranges placed on what appears to be a wooden surface or cutting board. The focus is particularly sharp in the center where one can see clear details such as individual lines on the skin of the fruits suggesting texture and depth. There's also natural lighting that enhances visibility without creating harsh shadows.

> python ollama_minimal.py
The picture shows an orange traffic cone with its cap missing, placed on what appears to be asphalt road surface. The background is blurred but seems consistent with outdoor conditions typical for street or construction work. There are no discernible features that suggest movement in this still shot. It's a clear day as indicated by the brightness and lack of shadows from any objects besides the cone itself which suggests overhead lighting, likely sunlight.

This is an image I've tried (actually I tried multiple ones, but it was always hallucinating). It's an IPhone box, so the output is definitely wrong

https://github.com/user-attachments/assets/552216b4-391a-41c2-8c3f-3fe4e65978de

I am running it on mac ARM cpu with 18gb unified memory if it's important.

Sep 05 '25 20:09 klakpin

@klakpin Ollama official wrote the wrong code, this is the problem I found and the PR that fixed it. https://github.com/ollama/ollama/pull/12168

Sep 08 '25 03:09 tc-mb

This PR has been merged, I can close this issue. If you have any questions, feel free to ask us again.

Sep 18 '25 05:09 tc-mb