
Issue With Running Phi-3 Vision on NPU

Harsha0056 opened this issue 8 months ago · 4 comments

I installed the NPU drivers as per the instructions, converted the Phi-3 Vision model to its OpenVINO version, set the device to NPU, and attempted to run the model. Initially, in Task Manager, I observed the model utilizing the NPU, reaching 100% usage. However, it immediately threw the following error:

RuntimeError: Check 'prompt_ids.get_size() >= tokenized_history.size()' failed at C:\Jenkins\workspace\private-ci\ie\build-windows-vs2022\b\repos\openvino.genai\src\cpp\src\visual_language\pipeline.cpp:201: Prompt IDs size is less than tokenized history size.
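
For reference, the failing path boils down to roughly the following (a minimal sketch based on the notebook's API usage; the model directory and image path are placeholders, not the exact values):

import numpy as np
import openvino as ov
import openvino_genai as ov_genai
from PIL import Image

# Placeholder: the actual directory comes from the notebook's conversion step
model_dir = "Phi-3-vision-128k-instruct-ov"

pipe = ov_genai.VLMPipeline(model_dir, "NPU")

# Load an image and wrap it as an OpenVINO tensor, as the notebook does
image = Image.open("sample.jpg").convert("RGB")
rgbs = [ov.Tensor(np.array(image))]

config = ov_genai.GenerationConfig()
config.max_new_tokens = 100

# This is the call that raises the "Prompt IDs size is less than
# tokenized history size" error on NPU
result = pipe.generate(prompt="Describe this image.", images=rgbs, generation_config=config)
print(result)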

Expected behavior
Run Phi-3 Vision on NPU.

Laptop Specs

  • Processor: Intel(R) Core(TM) Ultra 7 165U, 2.10 GHz
  • RAM: 16GB
  • NPU: Intel(R) AI Boost

Screenshots

[Three screenshots attached.]

Harsha0056 · Apr 02 '25 08:04

EDIT: OK, I see, it looks like you are using "HWiNFO64". May I ask which tool you use to visualize the NPU load, as shown in your screenshot?

Are you using this notebook: https://github.com/openvinotoolkit/openvino_notebooks/blob/latest/notebooks/phi-3-vision/phi-3-vision.ipynb ?

Which of the two models have you selected in the drop-down menu: "microsoft/Phi-3.5-vision-instruct" or "microsoft/Phi-3-vision-128k-instruct"?

Have you enabled the checkbox "Compress model" to apply model compression?

Which version of the NPU driver do you have installed? Have you rebooted your system afterwards? Which version of this notebooks repository do you use? Have you created a new Python virtual environment?
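
If useful, you can also check what OpenVINO itself reports from Python (a small sketch using standard OpenVINO properties):

import openvino as ov

print(ov.get_version())            # OpenVINO runtime version

core = ov.Core()
print(core.available_devices)      # should list "NPU" if the driver is installed

# FULL_DEVICE_NAME is a standard property available for any listed device
if "NPU" in core.available_devices:
    print(core.get_property("NPU", "FULL_DEVICE_NAME"))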

I just cloned the current version of this repo, created a new virtual environment, and started the Jupyter notebook - downloading and compression will take a while. I am using the first model from the dropdown, "microsoft/Phi-3.5-vision-instruct" (default value), and kept the checkbox for model compression enabled (default value).

Keep monitoring the system memory consumption during compression and conversion. Are you sure it finished successfully? (My system has 64GB of memory, and usage almost reaches the maximum.)
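
If you want to watch memory from a second Python session while the conversion runs, a small sketch (assuming psutil is installed):

import time
import psutil

# Print overall memory usage every 5 seconds; stop with Ctrl+C
while True:
    mem = psutil.virtual_memory()
    print(f"used: {mem.used / 2**30:.1f} GiB / {mem.total / 2**30:.1f} GiB ({mem.percent}%)")
    time.sleep(5)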

brmarkus · Apr 02 '25 09:04


Hi @brmarkus

  1. Yes, I was using HWInfo.
  2. I was using Phi-3 Vision 128K from the same notebook.
  3. I selected the checkbox for weight compression.
  4. I installed the NPU driver version 32.0.100.3714 about a month ago. Since it worked for the Phi-3 LLM on NPU, I didn't update to the newer drivers.
  5. I created a new Python environment, cloned the notebooks again, and also deleted the existing Phi-3 Vision model from the cache.
  6. The compression was successful, as I tested it on both the CPU and iGPU.
  7. I encountered this issue when setting the device to NPU.

Harsha0056 · Apr 02 '25 10:04

OK, thank you. I started with "microsoft/Phi-3.5-vision-instruct" and tested CPU and GPU successfully. But when using NPU I get a different exception, at "result = pipe.generate(prompt=prompt, images=rgbs, generation_config=config, streamer=streamer)":

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[23], line 14
     11 prompt = "What is unusual on this picture? Give me full explanation."
     12 print("Answer:")
---> 14 result = pipe.generate(prompt=prompt, images=rgbs, generation_config=config, streamer=streamer)

RuntimeError: Exception from src\inference\src\cpp\infer_request.cpp:245:
Exception from src\plugins\intel_npu\src\plugin\npuw\just_sync_infer_request.cpp:659:
Failed to compile. No more devices are left!

Let me now try "microsoft/Phi-3-vision-128k-instruct" as well - it will take a while.

UPDATE: I have now also downloaded, compressed, and converted "microsoft/Phi-3-vision-128k-instruct". Inference works with CPU and GPU.

However, when using NPU, in the step "pipe = ov_genai.VLMPipeline(model_dir, device.value)" I now get an exception:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[16], line 4
      1 import openvino_genai as ov_genai
----> 4 pipe = ov_genai.VLMPipeline(model_dir, device.value)

RuntimeError: Exception from src\inference\src\cpp\core.cpp:129:
Exception from src\inference\src\dev\plugin.cpp:58:
Check 'unregistered_parameters.str().empty()' failed at src\core\src\model.cpp:60:
Model references undeclared parameters: opset1::Parameter past_key_values.9.value () -> (f16[1,32,96,1151])

This also differs from the exception you showed.

My environment:

  • MS-Win-11-Pro
  • Intel Core Ultra 7 155H
  • NPU driver version 32.0.100.3714 (17.01.2025); I haven't checked whether there is a newer version available...
  • 64GB system memory
  • Python v3.12.4

Someone from the OpenVINO Notebooks team needs to take a closer look.

UPDATE: After testing CPU and GPU successfully, I shut down the Jupyter server and started it again, ran all cells again (manually), selecting "Phi-3-vision-128k-instruct" - and was able to run until the last step. Then I get the same exception as you:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[11], line 14
     11 prompt = "What is unusual on this picture? Give me full explanation."
     12 print("Answer:")
---> 14 result = pipe.generate(prompt=prompt, images=rgbs, generation_config=config, streamer=streamer)

RuntimeError: Exception from src\inference\src\cpp\infer_request.cpp:245:
Check '*roi_begin <= *max_dim' failed at src\inference\src\dev\make_tensor.cpp:33

brmarkus · Apr 02 '25 10:04


I updated to the latest NPU driver in a new environment and cloned the latest repo as well, but I am still getting the same error.

[Two screenshots attached.]

Harsha0056 · Jun 03 '25 08:06

Similar problem here. I am using the latest NPU driver (32.0.100.4082) on an Ultra 5 235U, running with GenAI:

Exception in thread Thread-2 (generate):
Traceback (most recent call last):
  File "C:\Users\xxx\AppData\Local\Programs\Python\Python311\Lib\threading.py", line 1045, in _bootstrap_inner
    self.run()
  File "C:\Users\xxx\AppData\Local\Programs\Python\Python311\Lib\threading.py", line 982, in run
    self._target(*self._args, **self._kwargs)
  File "D:\Study\Qwen3\app.py", line 67, in generate
    pipe.generate(prompt, config, streamer_callback)
RuntimeError: Exception from src\inference\src\cpp\infer_request.cpp:245:
Check '*roi_begin <= *max_dim' failed at src\inference\src\dev\make_tensor.cpp:34
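
For context, the pattern from the traceback is roughly this (a reconstruction, not my actual app.py; model path, prompt, and config values are placeholders):

import threading
import openvino_genai as ov_genai

pipe = ov_genai.LLMPipeline("qwen3-ov-model", "NPU")  # placeholder path

config = ov_genai.GenerationConfig()
config.max_new_tokens = 256

def streamer_callback(subword: str) -> bool:
    print(subword, end="", flush=True)
    return False  # False = keep generating

# generate() runs on a worker thread, matching the Thread-2 traceback above
worker = threading.Thread(target=pipe.generate, args=("Hello", config, streamer_callback))
worker.start()
worker.join()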

wunianqing · Jul 17 '25 11:07

If anyone is facing the 'No more devices are left' issue, try using the newest NPU driver version (32.0.100.4239) and ensure that openvino, openvino-tokenizers, and openvino-genai are all at version 2025.2.0. This worked for me, and inference works fine with the converted Phi-3.5-mini-instruct.
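
To confirm the environment actually matches, a quick check like this can help (a sketch; it only reads installed package metadata and lists visible devices):

# After: pip install "openvino==2025.2.0" "openvino-tokenizers==2025.2.0" "openvino-genai==2025.2.0"
from importlib.metadata import version

for pkg in ("openvino", "openvino-tokenizers", "openvino-genai"):
    print(pkg, version(pkg))  # all three should report 2025.2.0

import openvino as ov
print(ov.Core().available_devices)  # "NPU" should appear with the new driver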

denisb-native · Aug 28 '25 09:08