phidata
Vision Model Hallucinates: GPT-4o, LLaVA
I have been using multiple vision models: gpt-4o, llava, and others. None of them performs as expected.
Image used - IMAGE
Code
terminal --
ollama pull llava
editor --
from phi.agent import Agent
from phi.model.ollama import Ollama
image_agent = Agent(
    name="Image Understanding Agent",
    role="Analyze and understand images",
    agent_id="image-agent",
    model=Ollama(id="llava"),  # Using llava model which supports image analysis
    markdown=True,
)
image_agent.print_response(
    [
        {"type": "text", "text": "What's in this image, describe in 1 sentence"},
        {
            "type": "image_url",
            "image_url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
        },
    ]
)
Output Response
┃ To analyze an image, I would first observe the overall composition and elements present in the photo. This includes examining the subject matter, lighting, colors, ┃
┃ textures, shapes, and patterns. ┃
┃ ┃
┃ Next, I would consider any potential distractions or objects that may take away from the main focus of the photo. For example, if there is a busy background, it ┃
┃ might be beneficial to crop or adjust the image to draw more attention to the subject. ┃
┃ ┃
┃ After this initial assessment, I would look for any potential opportunities to enhance the image further. This could include adjusting the exposure, contrast, and ┃
┃ saturation to make the photo pop more or using filters or editing techniques to add a unique touch. ┃
┃ ┃
┃ Overall, analyzing an image involves paying attention to all aspects of the composition, subject matter, and technical elements in order to create a visually ┃
┃ appealing and impactful photograph.
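The response above never mentions anything in the picture, which suggests the model may not have received the image at all. One way to check this outside phidata is to send the image straight to Ollama. The following is only a minimal sketch: it assumes an Ollama server at the default http://localhost:11434 and uses its /api/generate endpoint, which takes base64-encoded images in an "images" list. The helper names build_payload and ask_ollama and the file name boardwalk.jpg are hypothetical.

```python
import base64
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(image_bytes: bytes, prompt: str, model: str = "llava") -> dict:
    """Build a non-streaming /api/generate request body carrying one base64 image."""
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


def ask_ollama(payload: dict, url: str = OLLAMA_URL) -> str:
    """POST the payload to Ollama and return the generated text."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]


# Usage (needs a running Ollama server and a local copy of the image;
# "boardwalk.jpg" is a hypothetical file name):
# with open("boardwalk.jpg", "rb") as f:
#     payload = build_payload(f.read(), "What's in this image, describe in 1 sentence")
#     print(ask_ollama(payload))
```

If this direct call describes the boardwalk correctly, the problem is more likely in how the image reaches the model through the agent than in llava itself.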
The same issue was also reported here: Issue-1348
Please have a look, @manthanguptaa @ysolanky. Thanks.