
Issue With Running Phi-3 Vision on NPU

Harsha0056 opened this issue 8 months ago · 4 comments

I installed the NPU drivers as per the instructions, converted the Phi-3 Vision model to its OpenVINO version, set the device to NPU, and attempted to run the model. Initially, in Task Manager, I observed the model utilizing the NPU, reaching 100% usage. However, it immediately threw the following error:

RuntimeError: Check 'prompt_ids.get_size() >= tokenized_history.size()' failed at C:\Jenkins\workspace\private-ci\ie\build-windows-vs2022\b\repos\openvino.genai\src\cpp\src\visual_language\pipeline.cpp:201: Prompt IDs size is less than tokenized history size.
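
For reference, the failing path boils down to roughly the following (a minimal sketch based on the notebook's API usage; the model directory and image path are placeholders, not the exact values):

import numpy as np
import openvino as ov
import openvino_genai as ov_genai
from PIL import Image

# Placeholder: the actual directory comes from the notebook's conversion step
model_dir = "Phi-3-vision-128k-instruct-ov"

pipe = ov_genai.VLMPipeline(model_dir, "NPU")

# Load an image and wrap it as an OpenVINO tensor, as the notebook does
image = Image.open("sample.jpg").convert("RGB")
rgbs = [ov.Tensor(np.array(image))]

config = ov_genai.GenerationConfig()
config.max_new_tokens = 100

# This is the call that raises the "Prompt IDs size is less than
# tokenized history size" error on NPU
result = pipe.generate(prompt="Describe this image.", images=rgbs, generation_config=config)
print(result)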

Expected behavior
Run Phi-3 Vision on NPU.

Laptop Specs

  • Processor: Intel(R) Core(TM) Ultra 7 165U, 2.10 GHz
  • RAM: 16GB
  • NPU: Intel(R) AI Boost

Screenshots

[Three screenshots attached.]

Harsha0056 · Apr 02 '25 08:04

EDIT: OK, I see, it looks like you are using "HWiNFO64". May I ask which tool you use to visualize the NPU load, as shown in your screenshot?

Are you using this notebook: https://github.com/openvinotoolkit/openvino_notebooks/blob/latest/notebooks/phi-3-vision/phi-3-vision.ipynb ?

Which of the two models have you selected in the drop-down menu: "microsoft/Phi-3.5-vision-instruct" or "microsoft/Phi-3-vision-128k-instruct"?

Have you enabled the checkbox "Compress model" to apply model compression?

Which version of the NPU driver do you have installed? Have you rebooted your system afterwards? Which version of this notebooks repository do you use? Have you created a new Python virtual environment?
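
If useful, you can also check what OpenVINO itself reports from Python (a small sketch using standard OpenVINO properties):

import openvino as ov

print(ov.get_version())            # OpenVINO runtime version

core = ov.Core()
print(core.available_devices)      # should list "NPU" if the driver is installed

# FULL_DEVICE_NAME is a standard property available for any listed device
if "NPU" in core.available_devices:
    print(core.get_property("NPU", "FULL_DEVICE_NAME"))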

I just cloned the current version of this repo, created a new virtual environment, and started the Jupyter notebook - downloading and compression will take a while. I am using the first model from the dropdown, "microsoft/Phi-3.5-vision-instruct" (default value), and kept the checkbox for model compression enabled (default value).

Keep monitoring the system memory consumption during compression and conversion. Are you sure it finished successfully? (My system has 64GB of memory, and usage almost reaches the maximum.)
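
If you want to watch memory from a second Python session while the conversion runs, a small sketch (assuming psutil is installed):

import time
import psutil

# Print overall memory usage every 5 seconds; stop with Ctrl+C
while True:
    mem = psutil.virtual_memory()
    print(f"used: {mem.used / 2**30:.1f} GiB / {mem.total / 2**30:.1f} GiB ({mem.percent}%)")
    time.sleep(5)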

brmarkus · Apr 02 '25 09:04


Hi @brmarkus

  1. Yes, I was using HWInfo.
  2. I was using Phi-3 Vision 128K from the same notebook.
  3. I selected the checkbox for weight compression.
  4. I installed the NPU driver version 32.0.100.3714 about a month ago. Since it worked for the Phi-3 LLM on NPU, I didn't update to the newer drivers.
  5. I created a new Python environment, cloned the notebooks again, and also deleted the existing Phi-3 Vision model from the cache.
  6. The compression was successful, as I tested it on both the CPU and iGPU.
  7. I encountered this issue when setting the device to NPU.

Harsha0056 · Apr 02 '25 10:04

OK, thank you. I started with "microsoft/Phi-3.5-vision-instruct" and tested CPU and GPU successfully. But when using NPU I get a different exception, at "result = pipe.generate(prompt=prompt, images=rgbs, generation_config=config, streamer=streamer)":

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[23], line 14
     11 prompt = "What is unusual on this picture? Give me full explanation."
     12 print("Answer:")
---> 14 result = pipe.generate(prompt=prompt, images=rgbs, generation_config=config, streamer=streamer)

RuntimeError: Exception from src\inference\src\cpp\infer_request.cpp:245:
Exception from src\plugins\intel_npu\src\plugin\npuw\just_sync_infer_request.cpp:659:
Failed to compile. No more devices are left!

Let me now try "microsoft/Phi-3-vision-128k-instruct" as well - it will take a while.

UPDATE: I have now also downloaded, compressed, and converted "microsoft/Phi-3-vision-128k-instruct". Inference works with CPU and GPU.

However, when using NPU, in the step "pipe = ov_genai.VLMPipeline(model_dir, device.value)" I now get an exception:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[16], line 4
      1 import openvino_genai as ov_genai
----> 4 pipe = ov_genai.VLMPipeline(model_dir, device.value)

RuntimeError: Exception from src\inference\src\cpp\core.cpp:129:
Exception from src\inference\src\dev\plugin.cpp:58:
Check 'unregistered_parameters.str().empty()' failed at src\core\src\model.cpp:60:
Model references undeclared parameters: opset1::Parameter past_key_values.9.value () -> (f16[1,32,96,1151])

This also differs from the exception you showed.

My environment:

  • MS-Win-11-Pro
  • Intel Core Ultra 7 155H
  • NPU driver version 32.0.100.3714 (17.01.2025); I haven't checked whether there is a newer version available...
  • 64GB system memory
  • Python v3.12.4

Someone from the OpenVINO Notebooks team needs to take a closer look.

UPDATE: After testing CPU and GPU successfully, I shut down the Jupyter server and started it again, ran all cells again (manually), selecting "Phi-3-vision-128k-instruct" - and was able to run until the last step. Then I get the same exception as you:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[11], line 14
     11 prompt = "What is unusual on this picture? Give me full explanation."
     12 print("Answer:")
---> 14 result = pipe.generate(prompt=prompt, images=rgbs, generation_config=config, streamer=streamer)

RuntimeError: Exception from src\inference\src\cpp\infer_request.cpp:245:
Check '*roi_begin <= *max_dim' failed at src\inference\src\dev\make_tensor.cpp:33

brmarkus · Apr 02 '25 10:04


I updated to the latest NPU driver in a new environment and cloned the latest repo as well, but I am still getting the same error.

[Two screenshots attached.]

Harsha0056 · Jun 03 '25 08:06

Similar problem here. I am using the latest NPU driver (32.0.100.4082) on an Ultra 5 235U, running with GenAI:

Exception in thread Thread-2 (generate):
Traceback (most recent call last):
  File "C:\Users\xxx\AppData\Local\Programs\Python\Python311\Lib\threading.py", line 1045, in _bootstrap_inner
    self.run()
  File "C:\Users\xxx\AppData\Local\Programs\Python\Python311\Lib\threading.py", line 982, in run
    self._target(*self._args, **self._kwargs)
  File "D:\Study\Qwen3\app.py", line 67, in generate
    pipe.generate(prompt, config, streamer_callback)
RuntimeError: Exception from src\inference\src\cpp\infer_request.cpp:245:
Check '*roi_begin <= *max_dim' failed at src\inference\src\dev\make_tensor.cpp:34
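
For context, the pattern from the traceback is roughly this (a reconstruction, not my actual app.py; model path, prompt, and config values are placeholders):

import threading
import openvino_genai as ov_genai

pipe = ov_genai.LLMPipeline("qwen3-ov-model", "NPU")  # placeholder path

config = ov_genai.GenerationConfig()
config.max_new_tokens = 256

def streamer_callback(subword: str) -> bool:
    print(subword, end="", flush=True)
    return False  # False = keep generating

# generate() runs on a worker thread, matching the Thread-2 traceback above
worker = threading.Thread(target=pipe.generate, args=("Hello", config, streamer_callback))
worker.start()
worker.join()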

wunianqing · Jul 17 '25 11:07

If anyone is facing the 'No more devices are left' issue, try using the newest NPU driver version (32.0.100.4239) and ensure that openvino, openvino-tokenizers, and openvino-genai are all at version 2025.2.0. This worked for me, and inference works fine with the converted Phi-3.5-mini-instruct.
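
To confirm the environment actually matches, a quick check like this can help (a sketch; it only reads installed package metadata and lists visible devices):

# After: pip install "openvino==2025.2.0" "openvino-tokenizers==2025.2.0" "openvino-genai==2025.2.0"
from importlib.metadata import version

for pkg in ("openvino", "openvino-tokenizers", "openvino-genai"):
    print(pkg, version(pkg))  # all three should report 2025.2.0

import openvino as ov
print(ov.Core().available_devices)  # "NPU" should appear with the new driver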

denisb-native · Aug 28 '25 09:08