phidata Vision Model Hallucinates : Gpt4o, Llava.

Vision Model Hallucinates : Gpt4o, Llava.

Open MANISH007700 opened this issue 3 months ago • 7 comments

Been using multiple vision model, gpt-4o, llava and other model. None of them are performing as expected.

Image used - IMAGE

Code

terminal --
ollama pull llava

editor --
from phi.agent import Agent
from phi.model.ollama import Ollama

image_agent = Agent(
    name="Image Understanding Agent",
    role="Analyze and understand images",
    agent_id="image-agent",
    model=Ollama(id="llava"),  # Using llava model which supports image analysis
    markdown=True,
)

image_agent.print_response(
    [
        {"type": "text", "text": "What's in this image, describe in 1 sentence"},
        {
            "type": "image_url",
            "image_url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
        },
    ]
)

Output Response

┃ To analyze an image, I would first observe the overall composition and elements present in the photo. This includes examining the subject matter, lighting, colors,  ┃
┃ textures, shapes, and patterns.                                                                                                                                      ┃
┃                                                                                                                                                                      ┃
┃ Next, I would consider any potential distractions or objects that may take away from the main focus of the photo. For example, if there is a busy background, it     ┃
┃ might be beneficial to crop or adjust the image to draw more attention to the subject.                                                                               ┃
┃                                                                                                                                                                      ┃
┃ After this initial assessment, I would look for any potential opportunities to enhance the image further. This could include adjusting the exposure, contrast, and   ┃
┃ saturation to make the photo pop more or using filters or editing techniques to add a unique touch.                                                                  ┃
┃                                                                                                                                                                      ┃
┃ Overall, analyzing an image involves paying attention to all aspects of the composition, subject matter, and technical elements in order to create a visually        ┃
┃ appealing and impactful photograph.

Same issue was also found here - Issue-1348

Please have a look @manthanguptaa @ysolanky Thanks

Nov 22 '24 18:11 MANISH007700

phidata phidata copied to clipboard

Vision Model Hallucinates : Gpt4o, Llava.

phidata
phidata copied to clipboard