Llama3.2-vision Run Error

Open mruckman1 opened this issue 1 year ago • 5 comments

What is the issue?

  1. Updated Ollama this morning.
  2. Ran ollama run x/llama3.2-vision on a MacBook.
  3. Got the output below:

pulling manifest
pulling 652e85aa1e14... 100% ▕████████████████▏ 6.0 GB
pulling 622429e8d318... 100% ▕████████████████▏ 1.9 GB
pulling 962e0f69a367... 100% ▕████████████████▏ 163 B
pulling dc49c86b8ebb... 100% ▕████████████████▏ 30 B
pulling 6a50468ba2a8... 100% ▕████████████████▏ 498 B
verifying sha256 digest
writing manifest
success
Error: llama runner process has terminated: error:Missing required key: clip.has_text_encoder

Expected: Ollama downloads and runs the model without error.

OS

macOS

GPU

Apple

CPU

Apple

Ollama version

0.3.14

mruckman1 avatar Oct 21 '24 16:10 mruckman1

Vision support was merged recently (https://github.com/ollama/ollama/pull/6963), 0.3.14 doesn't include it.

rick-github avatar Oct 21 '24 17:10 rick-github

What does "vision support" mean? Does it enable "submitting multiple images for inference" or "video inference"? Or is it just support for this particular model?

AFAIK, video and multiple images are still an open issue (#3184).

silasalves avatar Oct 21 '24 20:10 silasalves

Vision support for llama3.2. llama3.2 doesn't do video, and doesn't work reliably with multiple images.

rick-github avatar Oct 21 '24 20:10 rick-github

Does this mean that llama3.2-vision can't be used in the current version of Ollama?

I'm also getting the same error when attempting to run the model

pavan-otthi123 avatar Oct 22 '24 04:10 pavan-otthi123

Version 0.4.0 will support llama3.2-vision.

rick-github avatar Oct 22 '24 07:10 rick-github

Thank you for the hard work. Could we also bring this change to the llama.cpp repo? How can we convert the model from HF to GGUF with the llama vision structure?

Animaxx avatar Oct 22 '24 16:10 Animaxx

@rick-github thanks for the clarification! Also, any plans for making it run on the GPU? Llama3.2 runs on my GPU (GTX1660Ti), but llama3.2-vision runs on CPU only.

silasalves avatar Oct 22 '24 17:10 silasalves

> @rick-github thanks for the clarification! Also, any plans for making it run on the GPU? Llama3.2 runs on my GPU (GTX1660Ti), but llama3.2-vision runs on CPU only.

It can run on the GPU, but it needs more RAM than the text-only versions, so it has likely exceeded the limit of your GPU.

jessegross avatar Oct 22 '24 17:10 jessegross

It should run on GPU if it fits:

$ ollama ps
NAME                            ID              SIZE    PROCESSOR       UNTIL   
x/llama3.2-vision:latest        25e973636a29    11 GB   100% GPU        Forever

If you can provide server logs perhaps we can see why it's not working for you.

rick-github avatar Oct 22 '24 17:10 rick-github

@jessegross Thanks for pointing that out. That sounds correct, my GPU is quite old and has only 4GB RAM.

@rick-github Thanks for the support, this is my server.log https://gist.github.com/silasalves/f2bdfc195618f19ecd557b945cab32b9

I think this is the important part?

time=2024-10-22T14:22:10.644-04:00 level=INFO source=llama-server.go:72 msg="system memory" total="31.9 GiB" free="13.6 GiB" free_swap="19.0 GiB"
time=2024-10-22T14:22:10.649-04:00 level=INFO source=memory.go:346 msg="offload to cuda" projector.weights="1.8 GiB" projector.graph="2.8 GiB" layers.requested=-1 layers.model=41 layers.offload=0 layers.split="" memory.available="[4.1 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.9 GiB" memory.required.partial="0 B" memory.required.kv="320.0 MiB" memory.required.allocations="[0 B]" memory.weights.total="5.2 GiB" memory.weights.repeating="4.8 GiB" memory.weights.nonrepeating="411.0 MiB" memory.graph.full="213.3 MiB" memory.graph.partial="213.3 MiB"

silasalves avatar Oct 22 '24 18:10 silasalves

Yep, too big for your card.
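
The comparison behind that verdict can be read straight from the log line above: memory.required.full is 5.9 GiB while memory.available is 4.1 GiB. Below is a minimal Python sketch that pulls those two fields out of a log line and compares them. The helper names (parse_size, field, fits_on_gpu) are hypothetical, it assumes a single GPU (one entry in memory.available), and it is only a rough check, not a reproduction of Ollama's actual offload logic.

```python
import re

# Multipliers for the binary size units that appear in the log line.
UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3}


def parse_size(text):
    """Convert a size such as '5.9 GiB' or '0 B' to a number of bytes."""
    match = re.fullmatch(r"([\d.]+)\s*(B|KiB|MiB|GiB)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    value, unit = match.groups()
    return float(value) * UNITS[unit]


def field(log_line, key):
    """Return the quoted value of key="..." from a log line, without list brackets."""
    match = re.search(re.escape(key) + r'="\[?([^"\]]*)\]?"', log_line)
    if match is None:
        raise KeyError(key)
    return match.group(1)


def fits_on_gpu(log_line):
    """Rough check (assumption: one GPU): does the full model fit in available VRAM?"""
    available = parse_size(field(log_line, "memory.available"))
    required = parse_size(field(log_line, "memory.required.full"))
    return required <= available
```

For the log line in this comment, fits_on_gpu returns False, which matches layers.offload=0 in the same line.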

rick-github avatar Oct 22 '24 18:10 rick-github

@Animaxx unfortunately backporting it to work with llama.cpp would be tricky because the image preparsing step is written in golang, and not c++.

I'm going to go ahead and close the issue since things are working as expected. You just need to use the pre-release to make it work.

pdevine avatar Oct 23 '24 01:10 pdevine

I've read that Ollama 0.4 should support vision tasks, but I also understood that 0.3.14 should be able to load the x/llama3.2-vision model. Is that correct?

If so, I am getting the same error as mentioned above on a 90 GB M2 MacBook using 0.3.14: Error: llama runner process has terminated: error:Missing required key: clip.has_text_encoder

ludos1978 avatar Oct 25 '24 07:10 ludos1978

0.3.14 cannot load x/llama3.2-vision.
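
A quick way to check whether a given install is new enough is to compare its version string against 0.4.0. Below is a minimal Python sketch. The function names are hypothetical, the /api/version endpoint and its {"version": ...} response are taken from a later comment in this thread, and release candidates such as 0.4.0-rc8 are treated as supported because the thread says the pre-release works.

```python
import json
import re
import urllib.request

# Per this thread, 0.4.0 is the first version that supports llama3.2-vision.
MIN_VISION_VERSION = (0, 4, 0)


def parse_version(version):
    """Extract (major, minor, patch) from strings such as '0.3.14' or '0.4.0-rc8'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def supports_llama32_vision(version):
    """True if the version is at least 0.4.0; suffixes such as '-rc8' are ignored."""
    return parse_version(version) >= MIN_VISION_VERSION


def fetch_server_version(base_url="http://localhost:11434"):
    """Ask a running Ollama server for its version (assumes the default local address)."""
    with urllib.request.urlopen(f"{base_url}/api/version") as response:
        return json.load(response)["version"]
```

With a server running, supports_llama32_vision(fetch_server_version()) gives the answer for the local install.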

rick-github avatar Oct 25 '24 14:10 rick-github

@pdevine Is it possible to use REST API like this on the latest?

curl -X POST http://127.0.0.1:11434/api/chat \
-H "Content-Type: application/json" \
-d '{
  "model": "x/llama3.2-vision",
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "What’s in this image?"},
        {
          "type": "image_url",
          "image_url": {
            "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
          }
        }
      ]
    }
  ]
}'

eulercat avatar Oct 26 '24 01:10 eulercat

@eulercat we don't support pulling images w/ image_url. You'll have to base64 encode your image, so it looks like:

curl http://localhost:11434/api/chat -d '{
  "model": "x/llama3.2-vision",
  "messages": [
    {
      "role": "user",
      "content": "what is in this image?",
      "images": ["iVBORw0KGgoAAAANSUhEUgAAAG0AAABmCAYAAADBPx+VAAAACXBIWXMAAAsTAAALEwEAmpwYAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAA3VSURBVHgB7Z27r0zdG8fX743i1bi1ikMoFMQloXRpKFFIqI7LH4BEQ+NWIkjQuSWCRIEoULk0gsK1kCBI0IhrQVT7tz/7zZo888yz1r7MnDl7z5xvsjkzs2fP3uu71nNfa7lkAsm7d++Sffv2JbNmzUqcc8m0adOSzZs3Z+/XES4ZckAWJEGWPiCxjsQNLWmQsWjRIpMseaxcuTKpG/7HP27I8P79e7dq1ars/yL4/v27S0ejqwv+cUOGEGGpKHR37tzJCEpHV9tnT58+dXXCJDdECBE2Ojrqjh071hpNECjx4cMHVycM1Uhbv359B2F79+51586daxN/+pyRkRFXKyRDAqxEp4yMlDDzXG1NPnnyJKkThoK0VFd1ELZu3TrzXKxKfW7dMBQ6bcuWLW2v0VlHjx41z717927ba22U9APcw7Nnz1oGEPeL3m3p2mTAYYnFmMOMXybPPXv2bNIPpFZr1NHn4HMw0KRBjg9NuRw95s8PEcz/6DZELQd/09C9QGq5RsmSRybqkwHGjh07OsJSsYYm3ijPpyHzoiacg35MLdDSIS/O1yM778jOTwYUkKNHWUzUWaOsylE00MyI0fcnOwIdjvtNdW/HZwNLGg+sR1kMepSNJXmIwxBZiG8tDTpEZzKg0GItNsosY8USkxDhD0Rinuiko2gfL/RbiD2LZAjU9zKQJj8RDR0vJBR1/Phx9+PHj9Z7REF4nTZkxzX4LCXHrV271qXkBAPGfP/atWvu/PnzHe4C97F48eIsRLZ9+3a3f/9+87dwP1JxaF7/3r17ba+5l4EcaVo0lj3SBq5kGTJSQmLWMjgYNei2GPT1MuMqGTDEFHzeQSP2wi/jGnkmPJ/nhccs44jvDAxpVcxnq0F6eT8h4ni/iIWpR5lPyA6ETkNXoSukvpJAD3AsXLiwpZs49+fPn5ke4j10TqYvegSfn0OnafC+Tv9ooA/JPkgQysqQNBzagXY55nO/oa1F7qvIPWkRL12WRpMWUvpVDYmxAPehxWSe8ZEXL20sadYIozfmNch4QJPAfeJgW3rNsnzphBKNJM2KKODo1rVOMRYik5ETy3ix4qWNI81qAAirizgMIc+yhTytx0JWZuNI03qsrgWlGtwjoS9XwgUhWGyhUaRZZQNNIEwCiXD16tXcAHUs79co0vSD8rrJCIW98pzvxpAWyyo3HYwqS0+H0BjStClcZJT5coMm6D2LOF8TolGJtK9fvyZpyiC5ePFi9nc/oJU4eiEP0jVoAnHa9wyJycITMP78+eMeP37sXrx44d6+fdt6f82aNdkx1pg9e3Zb5W+RSRE+n+VjksQWifvVaTKFhn5O8my63K8Qabdv33b379/PiAP//vuvW7BggZszZ072/+TJk91YgkafPn166zXB1rQHFvouAWHq9z3SEevSUerqCn2/dDCeta2jxYbr69evk4MHDyY7d+7MjhMnTiTPnz9Pfv/+nfQT2ggpO2dMF8cghuoM7Ygj5iWCqRlGFml0QC/ftGmTmzt3rmsaKDsgBSPh0/8yPeLLBihLkOKJc0jp8H8vUzcxIA1k6QJ/c78tWEyj5P3o4u9+jywNPdJi5rAH9x0KHcl4Hg570eQp3+vHXGyrmEeigzQsQsjavXt38ujRo44LQuDDhw+TW7duRS1HGgMxhNXHgflaNTOsHyKvHK5Ijo2jbFjJBQK9YwFd6RVMzfgRBmEfP37suBBm/p49e1qjEP2mwTViNRo0VJWH1deMXcNK08uUjVUu7s/zRaL+oLNxz1bpANco4npUgX4G2eFbpDFyQoQxojBCpEGSytmOH8qrH5Q9vuzD6ofQylkCUmh8DBAr+q8JCyVNtWQIidKQE9wNtLSQnS
4jDSsxNHogzFuQBw4cyM61UKVsjfr3ooBkPSqqQHesUPWVtzi9/vQi1T+rJj7WiTz4Pt/l3LxUkr5P2VYZaZ4URpsE+st/dujQoaBBYokbrz/8TJNQYLSonrPS9kUaSkPeZyj1AWSj+d+VBoy1pIWVNed8P0Ll/ee5HdGRhrHhR5GGN0r4LGZBaj8oFDJitBTJzIZgFcmU0Y8ytWMZMzJOaXUSrUs5RxKnrxmbb5YXO9VGUhtpXldhEUogFr3IzIsvlpmdosVcGVGXFWp2oU9kLFL3dEkSz6NHEY1sjSRdIuDFWEhd8KxFqsRi1uM/nz9/zpxnwlESONdg6dKlbsaMGS4EHFHtjFIDHwKOo46l4TxSuxgDzi+rE2jg+BaFruOX4HXa0Nnf1lwAPufZeF8/r6zD97WK2qFnGjBxTw5qNGPxT+5T/r7/7RawFC3j4vTp09koCxkeHjqbHJqArmH5UrFKKksnxrK7FuRIs8STfBZv+luugXZ2pR/pP9Ois4z+TiMzUUkUjD0iEi1fzX8GmXyuxUBRcaUfykV0YZnlJGKQpOiGB76x5GeWkWWJc3mOrK6S7xdND+W5N6XyaRgtWJFe13GkaZnKOsYqGdOVVVbGupsyA/l7emTLHi7vwTdirNEt0qxnzAvBFcnQF16xh/TMpUuXHDowhlA9vQVraQhkudRdzOnK+04ZSP3DUhVSP61YsaLtd/ks7ZgtPcXqPqEafHkdqa84X6aCeL7YWlv6edGFHb+ZFICPlljHhg0bKuk0CSvVznWsotRu433alNdFrqG45ejoaPCaUkWERpLXjzFL2Rpllp7PJU2a/v7Ab8N05/9t27Z16KUqoFGsxnI9EosS2niSYg9SpU6B4JgTrvVW1flt1sT+0ADIJU2maXzcUTraGCRaL1Wp9rUMk16PMom8QhruxzvZIegJjFU7LLCePfS8uaQdPny4jTTL0dbee5mYokQsXTIWNY46kuMbnt8Kmec+LGWtOVIl9cT1rCB0V8WqkjAsRwta93TbwNYoGKsUSChN44lgBNCoHLHzquYKrU6qZ8lolCIN0Rh6cP0Q3U6I6IXILYOQI513hJaSKAorFpuHXJNfVlpRtmYBk1Su1obZr5dnKAO+L10Hrj3WZW+E3qh6IszE37F6EB+68mGpvKm4eb9bFrlzrok7fvr0Kfv727dvWRmdVTJHw0qiiCUSZ6wCK+7XL/AcsgNyL74DQQ730sv78Su7+t/A36MdY0sW5o40ahslXr58aZ5HtZB8GH64m9EmMZ7FpYw4T6QnrZfgenrhFxaSiSGXtPnz57e9TkNZLvTjeqhr734CNtrK41L40sUQckmj1lGKQ0rC37x544r8eNXRpnVE3ZZY7zXo8NomiO0ZUCj2uHz58rbXoZ6gc0uA+F6ZeKS/jhRDUq8MKrTho9fEkihMmhxtBI1DxKFY9XLpVcSkfoi8JGnToZO5sU5aiDQIW716ddt7ZLYtMQlhECdBGXZZMWldY5BHm5xgAroWj4C0hbYkSc/jBmggIrXJWlZM6pSETsEPGqZOndr2uuuR5rF169a2HoHPdurUKZM4CO1WTPqaDaAd+GFGKdIQkxAn9RuEWcTRyN2KSUgiSgF5aWzPTeA/lN5rZubMmR2bE4SIC4nJoltgAV/dVefZm72AtctUCJU2CMJ327hxY9t7EHbkyJFseq+EJSY16RPo3Dkq1kkr7+q0bNmyDuLQcZBEPYmHVdOBiJyIlrRDq41YPWfXOxUysi5fvtyaj+2BpcnsUV/oSoEMOk2CQGlr4ckhBwaetBhjCwH0ZHtJROPJkyc7UjcYLDjmrH7ADTEBXFfOYmB0k9oYBOjJ8b4aOYSe7QkKcYhFlq3QYLQhSidNmtS2RATwy8YOM3EQJsUjKiaWZ+vZToUQgzhkHXudb/PW5YMHD9yZM2faPsMwoc7RciYJXbGuBqJ1UIGKKLv915jsvgtJxCZDubdXr165mzdvtr1Hz5LONA8jrU
wKPqsmVesKa49S3Q4WxmRPUEYdTjgiUcfUwLx589ySJUva3oMkP6IYddq6HMS4o55xBJBUeRjzfa4Zdeg56QZ43LhxoyPo7Lf1kNt7oO8wWAbNwaYjIv5lhyS7kRf96dvm5Jah8vfvX3flyhX35cuX6HfzFHOToS1H4BenCaHvO8pr8iDuwoUL7tevX+b5ZdbBair0xkFIlFDlW4ZknEClsp/TzXyAKVOmmHWFVSbDNw1l1+4f90U6IY/q4V27dpnE9bJ+v87QEydjqx/UamVVPRG+mwkNTYN+9tjkwzEx+atCm/X9WvWtDtAb68Wy9LXa1UmvCDDIpPkyOQ5ZwSzJ4jMrvFcr0rSjOUh+GcT4LSg5ugkW1Io0/SCDQBojh0hPlaJdah+tkVYrnTZowP8iq1F1TgMBBauufyB33x1v+NWFYmT5KmppgHC+NkAgbmRkpD3yn9QIseXymoTQFGQmIOKTxiZIWpvAatenVqRVXf2nTrAWMsPnKrMZHz6bJq5jvce6QK8J1cQNgKxlJapMPdZSR64/UivS9NztpkVEdKcrs5alhhWP9NeqlfWopzhZScI6QxseegZRGeg5a8C3Re1Mfl1ScP36ddcUaMuv24iOJtz7sbUjTS4qBvKmstYJoUauiuD3k5qhyr7QdUHMeCgLa1Ear9NquemdXgmum4fvJ6w1lqsuDhNrg1qSpleJK7K3TF0Q2jSd94uSZ60kK1e3qyVpQK6PVWXp2/FC3mp6jBhKKOiY2h3gtUV64TWM6wDETRPLDfSakXmH3w8g9Jlug8ZtTt4kVF0kLUYYmCCtD/DrQ5YhMGbA9L3ucdjh0y8kOHW5gU/VEEmJTcL4Pz/f7mgoAbYkAAAAAElFTkSuQmCC"]
    }
  ]
}'

You can find more information in the Ollama API documentation.
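
The same request can also be built programmatically, which avoids pasting a long base64 string by hand. Below is a minimal Python sketch using only the standard library. The request shape (model, messages, images) and the localhost:11434 address mirror the curl example above, the "stream": false field matches the later examples in this thread, and the function names build_chat_payload and chat_with_image are hypothetical.

```python
import base64
import json
import urllib.request


def build_chat_payload(image_bytes, prompt, model="x/llama3.2-vision"):
    """Build an /api/chat request body with the image embedded as base64 text."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [
            {"role": "user", "content": prompt, "images": [encoded]},
        ],
        "stream": False,  # ask for a single JSON response instead of a stream
    }


def chat_with_image(image_bytes, prompt, base_url="http://localhost:11434"):
    """POST the payload to a local Ollama server and return the decoded JSON reply."""
    body = json.dumps(build_chat_payload(image_bytes, prompt)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/api/chat",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

To use it, read the image with open(path, "rb").read() and pass the bytes to chat_with_image. The reply text should then be under ["message"]["content"], as in the responses shown in this thread.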

pdevine avatar Oct 28 '24 22:10 pdevine

@ludos1978 you'll need 0.4.0 for it to work. Unfortunately we're still working through some issues w/ the release candidates.

pdevine avatar Oct 28 '24 22:10 pdevine

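If the image is large, passing it inline with -d will exceed the shell's maximum argument length. Instead, build the request body in a subshell and pipe it to curl on stdin with -d @-: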

(echo '{
         "model":"x/llama3.2-vision",
         "messages":[
           { "role":"user",
             "content":"describe this image",
             "images":["' ;
               curl -s https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg | base64 -w0 ; echo '"
             ]
           }
         ],
         "stream":false
       }') | curl -s localhost:11434/api/chat -d @- | jq
{
  "model": "x/llama3.2-vision",
  "created_at": "2024-10-28T23:14:35.376161501Z",
  "message": {
    "role": "assistant",
    "content": "The image depicts a serene and peaceful scene, with a wooden boardwalk winding its way through a lush grassy field. The boardwalk is made of light-colored wood and features a simple design, with no visible railings or obstacles to obstruct the view.\n\nAs the boardwalk stretches out into the distance, it disappears from sight, inviting the viewer to imagine where it might lead. The surrounding grass is tall and green, swaying gently in the breeze, while trees dot the horizon, adding depth and texture to the landscape.\n\nAbove, a brilliant blue sky with white clouds provides a stunning backdrop, casting dappled shadows across the boardwalk and creating a sense of warmth and tranquility. Overall, the image exudes a sense of calmness and serenity, inviting the viewer to step into its peaceful world."
  },
  "done_reason": "stop",
  "done": true,
  "total_duration": 3744887728,
  "load_duration": 34980268,
  "prompt_eval_count": 13,
  "prompt_eval_duration": 45000000,
  "eval_count": 164,
  "eval_duration": 3302000000
}

rick-github avatar Oct 28 '24 23:10 rick-github

> @Animaxx unfortunately backporting it to work with llama.cpp would be tricky because the image preparsing step is written in golang, and not c++.
>
> I'm going to go ahead and close the issue since things are working as expected. You just need to use the pre-release to make it work.

But with some effort, I believe it would be possible to use a Go binding to C++, as was done for whisper.cpp: https://github.com/ggerganov/whisper.cpp/tree/master/bindings/go

To our surprise, it calls the same libraries that llama.cpp uses: the core that does the tensor computations is GGML, a library written in C++.

jhowilbur avatar Nov 02 '24 02:11 jhowilbur

I am getting the same error on an M3 MacBook with 64 GB, with Ollama 0.4.0-rc8.

delenius avatar Nov 05 '24 16:11 delenius

Server logs will help in debugging.

$ curl localhost:11434/api/version
{"version":"0.4.0-rc8"}
$ (echo '{
         "model":"x/llama3.2-vision",
         "messages":[
           { "role":"user",
             "content":"describe this image",
             "images":["' ;
               curl -s https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg | base64 -w0 ; echo '"
             ]
           }
         ],
         "stream":false
       }') | curl -s localhost:11434/api/chat -d @- | jq
{
  "model": "x/llama3.2-vision",
  "created_at": "2024-11-05T16:15:16.856668179Z",
  "message": {
    "role": "assistant",
    "content": "The image depicts a serene and peaceful scene, with a wooden boardwalk winding its way through a lush grassy field. The purpose of the image is to showcase the beauty of nature and the tranquility that can be found in such settings.\n\n* A wooden boardwalk:\n\t+ Winding its way through a grassy field\n\t+ Made of light-colored wood planks\n\t+ Surrounded by tall blades of grass on either side\n* Tall grass:\n\t+ Swaying gently in the breeze\n\t+ Varying shades of green, from light to dark\n\t+ Creating a sense of depth and texture in the image\n* Trees in the background:\n\t+ Scattered throughout the field\n\t+ Providing shade and shelter for wildlife\n\t+ Adding to the overall sense of serenity and calmness\n\nThe image effectively captures the beauty and tranquility of nature, inviting the viewer to step into the peaceful atmosphere. The use of natural colors and textures adds to the sense of realism, making the scene feel more immersive and engaging."
  },
  "done_reason": "stop",
  "done": true,
  "total_duration": 79628322199,
  "load_duration": 70623694007,
  "prompt_eval_count": 14,
  "prompt_eval_duration": 2349000000,
  "eval_count": 212,
  "eval_duration": 6235000000
}
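
The duration fields in these responses are in nanoseconds, so the numbers are easier to read after conversion. Below is a minimal Python sketch that turns a response into seconds and tokens per second. The function name summarize_timing is hypothetical, and the field names are the ones shown in the JSON above.

```python
NANOSECONDS_PER_SECOND = 1_000_000_000


def summarize_timing(response):
    """Convert the nanosecond duration fields of an /api/chat response to seconds."""
    eval_seconds = response["eval_duration"] / NANOSECONDS_PER_SECOND
    return {
        "total_seconds": response["total_duration"] / NANOSECONDS_PER_SECOND,
        "load_seconds": response["load_duration"] / NANOSECONDS_PER_SECOND,
        "eval_seconds": eval_seconds,
        # Generation speed: output tokens divided by the time spent generating them.
        "tokens_per_second": response["eval_count"] / eval_seconds,
    }
```

Applied to the response above, this shows that roughly 70.6 of the 79.6 seconds went to loading the model, and that generation itself took about 6.2 seconds.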

rick-github avatar Nov 05 '24 16:11 rick-github