`OSError: [WinError 126]` when importing `torch` with IPEX-LLM on Windows with Intel Arc GPU
System Information:
- OS: Windows 11
- GPU: Intel(R) Arc(TM) Graphics
- GPU Driver Version: 32.0.101.6913
Environment:
- Python: 3.11 (via Conda)
- IPEX-LLM Installation Method: `pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/`
- PyTorch Version Installed: `torch-2.1.0a0+cxx11.abi`
- Visual Studio: VS 2022 with "Desktop development with C++" workload installed (including C++/CLI support components).
Description
Following the official "Install IPEX-LLM on Windows with Intel GPU" guide, the environment setup appears successful, installing the torch-2.1.0a0+cxx11.abi package as expected.
However, any attempt to import torch after this installation fails with the following error:
```
OSError: [WinError 126] The specified module could not be found. Error loading "C:\...\Lib\site-packages\torch\lib\backend_with_compiler.dll" or one of its dependencies.
```
This prevents any use of ipex-llm on the GPU.
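For anyone debugging the same failure: a minimal sketch that tries to load the failing DLL directly with `ctypes`, which at least distinguishes "the DLL itself is missing" from "one of its dependencies is missing". The path is from my environment and would need adjusting.

```python
import ctypes
import os

# Path from my environment -- adjust to your own conda env.
dll = (r"C:\Users\DoubleFx\anaconda3\envs\llm-npu"
       r"\Lib\site-packages\torch\lib\backend_with_compiler.dll")

print("File exists on disk:", os.path.exists(dll))
try:
    # WinDLL raises the same OSError [WinError 126] whether the DLL itself
    # or one of its transitive dependencies cannot be found.
    ctypes.WinDLL(dll)
    print("DLL loads fine in isolation.")
except OSError as err:
    print("Load failed:", err)
```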
Steps to Reproduce
- On a supported Windows 11 machine with an Intel Arc GPU and up-to-date drivers, ensure Visual Studio 2022 with the C++ workload is installed.
- Create a clean Conda environment with Python 3.11.
- In the Conda environment, run `pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/`.
- Attempt to run any Python script that begins with `import torch`.
- The `OSError: [WinError 126]` occurs.
Troubleshooting Performed
- Confirmed the issue is identical to the one described in oobabooga/text-generation-webui#6253.
- Programmatically added the MSVC build tools directory (`C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\[version]\bin\Hostx64\x64`) to the system PATH before the import, with no change in result (see the sketch below).
- Ensured all relevant Visual Studio C++ components, including C++/CLI support, are installed.
- The error seems to indicate a missing system-level DLL dependency or a problem with how `backend_with_compiler.dll` was compiled.
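For reference, the PATH workaround looked roughly like this (the MSVC toolset version shown is a placeholder for whichever version is installed locally):

```python
import os

# Hypothetical MSVC toolset version for illustration -- substitute the
# directory name found under ...\VC\Tools\MSVC\ on your machine.
msvc_ver = "14.40.33807"
msvc_bin = (r"C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools"
            rf"\VC\Tools\MSVC\{msvc_ver}\bin\Hostx64\x64")

# Make the MSVC runtime DLLs findable before torch is imported.
os.environ["PATH"] = msvc_bin + os.pathsep + os.environ["PATH"]
os.add_dll_directory(msvc_bin)  # Windows-only, Python 3.8+

import torch  # still failed with WinError 126 in our case
```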
This appears to be a blocking issue for using IPEX-LLM on certain Windows + Intel Arc configurations.
Hi @doublefx , could you please provide some environment check info with this script? Besides, does it work fine if you just install `torch==2.1.0a0`?
Hi, thank you for the suggestion. I've performed the tests you requested.
1. Standalone Torch Test:
You asked if installing torch==2.1.0a0 works on its own. The package is present in my environment, but attempting to import it fails with the same error as before:
```
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\DoubleFx\anaconda3\envs\llm-npu\Lib\site-packages\torch\__init__.py", line 139, in <module>
    raise err
OSError: [WinError 126] The specified module could not be found. Error loading "C:\Users\DoubleFx\anaconda3\envs\llm-npu\Lib\site-packages\torch\lib\backend_with_compiler.dll" or one of its dependencies.
```
2. Environment Check:
I ran the `python\llm\scripts\env-check.bat` script. It failed to find `xpu-smi`. Here is the system information it produced:
```
OS Name: Microsoft Windows 11 Pro
OS Version: 10.0.26100 N/A Build 26100
System Manufacturer: Micro Computer (HK) Tech Limited
System Model: AtomMan X Series
System Type: x64-based PC
Processor(s): [01]: Intel64 Family 6 Model 170 Stepping 4 GenuineIntel ~2300 Mhz
BIOS Version: American Megatrends International, LLC. 1.00, 30/05/2024
```
3. Driver Version Discovery: I found that my currently installed NPU driver version is 32.0.100.4082.
Your npu_quickstart.md documentation highly recommends driver version 32.0.100.3104.
Question:
Given that the torch import fails on its own and my NPU driver is newer than the recommended version, is it likely that this driver mismatch is the root cause of the WinError 126? Would the correct next step be for me to uninstall my current driver and install the recommended 32.0.100.3104 version?
Thank you for your help.
Hi @doublefx , could you please try `conda install libuv` to see if it helps? Just in case you missed this step when creating your conda env.
Hi, thank you again for your help. The suggestion to install libuv was correct and it resolved the WinError 126 when importing torch.
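For anyone landing here later: a quick sketch, assuming the standard conda layout on Windows, to verify that `libuv` actually made it into the environment.

```python
import os
import sys

# conda-installed native DLLs such as uv.dll normally land in
# <env>\Library\bin on Windows; sys.prefix is the active env root.
uv_dll = os.path.join(sys.prefix, "Library", "bin", "uv.dll")
print("Checking:", uv_dll)
print("libuv present:", os.path.exists(uv_dll))
```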
However, we have encountered a new problem. After successfully installing libuv, I performed a clean reinstallation of ipex-llm:
```
pip uninstall torch ipex-llm -y
pip install --pre --upgrade --no-cache-dir ipex-llm[npu]
```
The installation completed, but it installed the CPU-only version of PyTorch (torch-2.1.2+cpu).
When I run the env-check.bat script, it now shows IPEX is not installed properly and xpu-smi is not recognized, which makes sense if we only have the CPU version of the libraries.
This seems to confirm that the pip installer is not able to find the correct NPU-enabled version of torch for my system.
To summarize:
- The base `torch` import error is fixed.
- `pip install ipex-llm[npu]` now installs successfully, but only fetches the CPU packages.
- My NPU driver is 32.0.100.4082.
- The documentation recommends 32.0.100.3104.
Is there a specific --index-url I should be using? Or does this confirm that the driver version mismatch is preventing pip from finding the correct NPU packages?
Hi,
Following up on our previous conversation, we've made significant progress based on your advice but have unfortunately hit a final wall on the Windows platform.
High-Level Summary: The good news is that we have resolved all Python errors. The test script now runs successfully and generates correct text output. The bad news is that the NPU utilization remains at 0%, even after we aligned the entire environment (packages and drivers) to the official recommendations.
Key Steps and Findings:
- Installation Success: Your advice was crucial. We discovered the `--extra-index-url` and `[xpu]` flags in the documentation, which allowed us to install the correct, XPU-enabled versions of `torch` (2.1.0a0+cxx11.abi) and `ipex` (2.1.10+xpu). We also have `libuv` installed in the conda environment. `torch.xpu.is_available()` now correctly returns `True`.
- Driver Downgrade: After confirming the software was installed correctly, we observed that the NPU utilization was still 0% with the system's default driver (32.0.100.4082). Following the documentation's strong recommendation, we have now successfully downgraded the NPU driver to the specified 32.0.100.3104 and rebooted the system.
- Final Result: Even with the recommended driver (32.0.100.3104) and the correct libraries, the result is the same. The script runs and produces the expected text, but the NPU utilization graph in Task Manager remains at 0%.
Conclusion: We believe we have uncovered a deeper bug. We have meticulously followed every step, and the environment should now be perfectly configured, yet the NPU hardware is not being engaged for computation. The process appears to be falling back to the GPU.
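As a quick sanity check alongside Task Manager, we can also print where the model's weights actually live (assuming `model` is loaded as in the script below):

```python
# Assumes `model` is already loaded and moved as in the test script below.
# Prints e.g. 'xpu:0' if the weights sit on the Intel GPU, or 'cpu' if a
# silent fallback happened.
print(next(model.parameters()).device)
```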
For your reference, here is the exact script we are using for testing:
```python
import os
os.environ['IPEX_LLM_NPU_MTL'] = '1'
import torch
from ipex_llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer
# --- Verify Correct Installation ---
print(f"PyTorch version: {torch.__version__}")
xpu_available = hasattr(torch, 'xpu') and torch.xpu.is_available()
print(f"Is XPU available? {xpu_available}")
if not xpu_available:
print("Warning: XPU device not found. The model will run on CPU.")
device = 'xpu' if xpu_available else 'cpu'
# --- Load Model and Tokenizer ---
model_name = "Qwen/Qwen1.5-7B-Chat"
print(f"\nLoading model: {model_name}...")
model = AutoModelForCausalLM.from_pretrained(
model_name,
load_in_4bit=True,
trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
# --- Move to XPU device ---
print(f"Moving model to device: {device}...")
model = model.to(device)
print("Model moved successfully.")
# --- Run Inference ---
# Build a chat prompt using the model's template
messages = [
{"role": "user", "content": "Explique le rôle du NPU dans l'IA PC."}
]
prompt = tokenizer.apply_chat_template(
messages,
tokenize=False,
add_generation_prompt=True
)
print("\nGenerating response...")
# Move inputs to the same device as the model
inputs = tokenizer(prompt, return_tensors="pt").to(device)
# Generate response
outputs = model.generate(**inputs, max_new_tokens=256)
# Slice the output to only decode the new tokens
output_str = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print("\n--- Model Output ---")
print(output_str)
```
At this point, we are abandoning the Windows effort as it seems blocked. We will be attempting to replicate the setup on WSL next.
Thank you for your help so far.
Hi,
Following up on our extensive troubleshooting, we have now encountered a hard crash within the Level Zero driver API, which seems to be the root cause of the NPU issues on Windows for this machine.
Final Conclusion:
After attempting two entirely different installation methods (`[xpu]` with extra index URL vs. `[npu]`) and two different driver versions (the system default ...4082 and the recommended ...3104), we can confirm that the NPU cannot be successfully used.
Our final attempt involved running the official qwen.py example script with its default Qwen/Qwen2.5-7B-Instruct model. This also failed with the same low-level driver crash. This is conclusive evidence of a bug outside of our control.
Here is the final, definitive crash log from running the official example:
```
RuntimeError: Exception from src/inference/src/cpp/core.cpp:107:
Exception from src/inference/src/dev/plugin.cpp:53:
Exception from src/plugins/intel_npu/src/plugin/src/plugin.cpp:697:
Exception from src/plugins/intel_npu/src/plugin/src/compiled_model.cpp:62:
Exception from src/plugins/intel_npu/src/compiler/src/zero_compiler_in_driver.cpp:853:
L0 pfnCreate2 result: ZE_RESULT_ERROR_INVALID_ARGUMENT, code 0x78000004
```
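Since the crash originates in the OpenVINO NPU plugin, one further diagnostic (a sketch, assuming the `openvino` runtime pulled in by `ipex-llm[npu]` is importable) is to list the devices OpenVINO actually enumerates; if "NPU" is absent, the Level Zero driver is not exposing the NPU at all, independent of ipex-llm:

```python
from openvino.runtime import Core

core = Core()
# Expected to include 'NPU' on a working Meteor Lake setup; its absence
# points at the driver rather than at ipex-llm.
print("Available OpenVINO devices:", core.available_devices)
```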
This is as far as we can go with debugging on Windows. We are now abandoning the Windows effort as it seems blocked by this bug.
Thank you for your assistance.
env-check:
```
conda activate llm-npu && .\env-check.bat ; start-sleep -milliseconds 10
Python 3.11.13
python: can't open file 'C:\\dev\\sources\\AI\\IPEX-LLM\\check.py': [Errno 2] No such file or directory
-----------------------------------------------------------------
System Information
Host Name: DESKTOP-UG8O6R2
OS Name: Microsoft Windows 11 Pro
OS Version: 10.0.26100 N/A Build 26100
OS Manufacturer: Microsoft Corporation
OS Configuration: Standalone Workstation
OS Build Type: Multiprocessor Free
Registered Owner: N/A
Registered Organization: N/A
Product ID: 00330-52914-15365-AAOEM
Original Install Date: 08/01/2025, 11:56:33
System Boot Time: 09/07/2025, 17:23:06
System Manufacturer: Micro Computer (HK) Tech Limited
System Model: AtomMan X Series
System Type: x64-based PC
Processor(s): 1 Processor(s) Installed.
[01]: Intel64 Family 6 Model 170 Stepping 4 GenuineIntel ~2300 Mhz
BIOS Version: American Megatrends International, LLC. 1.00, 30/05/2024
Windows Directory: C:\WINDOWS
System Directory: C:\WINDOWS\system32
Boot Device: \Device\HarddiskVolume1
System Locale: en-us;English (United States)
Input Locale: en-gb;English (United Kingdom)
Time Zone: (UTC+00:00) Dublin, Edinburgh, Lisbon, London
Total Physical Memory: 97,810 MB
Available Physical Memory: 62,526 MB
Virtual Memory: Max Size: 103,954 MB
Virtual Memory: Available: 64,583 MB
Virtual Memory: In Use: 39,371 MB
Page File Location(s): C:\pagefile.sys
Domain: WORKGROUP
Logon Server: \\DESKTOP-UG8O6R2
Hotfix(s): 4 Hotfix(s) Installed.
[01]: KB5056579
[02]: KB5062553
[03]: KB5062862
[04]: KB5063666
Network Card(s): 3 NIC(s) Installed.
[01]: Realtek PCIe 5GbE Family Controller
Connection Name: Ethernet
Status: Media disconnected
[02]: Realtek PCIe 5GbE Family Controller
Connection Name: Ethernet 2
Status: Media disconnected
[03]: Bluetooth Device (Personal Area Network)
Connection Name: Bluetooth Network Connection
Status: Media disconnected
Virtualization-based security: Status: Running
Required Security Properties:
Base Virtualization Support
Available Security Properties:
Base Virtualization Support
Secure Boot
DMA Protection
UEFI Code Readonly
SMM Security Mitigations 1.0
Mode Based Execution Control
APIC Virtualization
Services Configured:
Hypervisor enforced Code Integrity
Services Running:
Hypervisor enforced Code Integrity
Hypervisor-Enforced Paging Translation
App Control for Business policy: Enforced
App Control for Business user mode policy: Off
Security Features Enabled:
Hyper-V Requirements: A hypervisor has been detected. Features required for Hyper-V will not be displayed.
-----------------------------------------------------------------
'xpu-smi' is not recognized as an internal or external command,
operable program or batch file.
xpu-smi is not installed properly.
```
Hi @doublefx , I think I need to make some clarification.
`ipex-llm[xpu]` and `ipex-llm[npu]` are different packages which target different hardware.
If you want to run on GPU, you should use `pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/`; if you want to run on NPU, you should use `pip install --pre --upgrade ipex-llm[npu]`.
They have some potential conflicts, so you cannot install both of them in a single conda env; instead, install them in two separate conda envs such as llm-xpu / llm-npu.
The example script you used above runs on the GPU, so it is normal that you did not see any NPU usage.
If you only want to run on NPU, please refer to this npu quickstart and this qwen npu example script.
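A quick way to tell which flavor is currently installed in an env (just a sketch):

```python
import torch

# The XPU build reports a version like '2.1.0a0+cxx11.abi' and exposes a
# working torch.xpu; the [npu] flavor ships a plain CPU wheel like '2.1.2+cpu'.
print("torch version:", torch.__version__)
print("XPU available:", hasattr(torch, "xpu") and torch.xpu.is_available())
```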
Hi @rnwang04 ,
Following up on our extensive troubleshooting, we are still encountering a hard crash within the Level Zero driver API, which appears to be preventing NPU utilization on this machine.
Summary of Troubleshooting Steps and Findings:
We have performed a full environment reset and re-installation to ensure a clean slate, following the maintainer's latest advice for ipex-llm[npu] installation.
- Environment Reset & Re-installation:
  - Deactivated and removed the existing `llm-npu` conda environment.
  - Created a new, clean `llm-npu` environment with `python=3.11`.
  - Explicitly installed `torch==2.1.2` and `torchvision==0.16.2` (as suggested by the Multimodal example README for some models).
  - Installed `ipex-llm[npu]` using `pip install --pre --upgrade ipex-llm[npu]`.
- Environment Check (`env-check.bat` output):

  ```
  Python 3.11.13
  -----------------------------------------------------------------
  transformers=4.40.0
  -----------------------------------------------------------------
  torch=2.1.2+cpu
  -----------------------------------------------------------------
  Name: ipex-llm
  Version: 2.3.0b20250626
  Summary: Large Language Model Develop Toolkit
  Home-page: https://github.com/intel-analytics/ipex-llm
  Author: BigDL Authors
  Author-email: [email protected]
  License: Apache License, Version 2.0
  Location: C:\Users\DoubleFx\anaconda3\envs\llm-npu\Lib\site-packages
  Requires:
  Required-by:
  -----------------------------------------------------------------
  IPEX is not installed properly.
  -----------------------------------------------------------------
  Traceback (most recent call last):
    File "C:\dev\sources\AI\IPEX-LLM\python\llm\scripts\check.py", line 172, in <module>
      main()
    File "C:\dev\sources\AI\IPEX-LLM\python\llm\scripts\check.py", line 164, in main
      check_memory()
    File "C:\dev\sources\AI\IPEX-LLM\python\llm\scripts\check.py", line 60, in check_memory
      physical_mem = subprocess.run('wmic computersystem get totalphysicalmemory', capture_output=True, text=True).stdout
    File "C:\Users\DoubleFx\anaconda3\envs\llm-npu\Lib\subprocess.py", line 548, in run
      with Popen(*popenargs, **kwargs) as process:
    File "C:\Users\DoubleFx\anaconda3\envs\llm-npu\Lib\subprocess.py", line 1026, in __init__
      self._execute_child(args, executable, preexec_fn, close_fds,
    File "C:\Users\DoubleFx\anaconda3\envs\llm-npu\Lib\subprocess.py", line 1538, in _execute_child
      hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
  FileNotFoundError: [WinError 2] The system cannot find the file specified
  ```

  Note: As of Windows 10 21H1+ and Windows 11, Microsoft deprecated and removed `wmic.exe` from the system by default, which likely causes this `FileNotFoundError` (see the psutil sketch after this list). The remainder of the output (System Information, "'xpu-smi' is not recognized...", and "xpu-smi is not installed properly.") is identical to the env-check output posted earlier.
- Final Test Run (Official Example):
  - Command Used: `set IPEX_LLM_NPU_MTL=1 && conda activate llm-npu && python python/llm/example/NPU/HF-Transformers-AutoModels/LLM/qwen.py --save-directory ./official_example_qwen2.5_cache --prompt "Explique le rôle du NPU dans l'IA PC." --n-predict 256`
  - Result: The script downloaded and converted the model (`Qwen/Qwen2.5-7B-Instruct`) successfully, but then crashed with a low-level driver error during compilation for the NPU.
  - Crash Log:

    ```
    RuntimeError: Exception from src/inference/src/cpp/core.cpp:107:
    Exception from src/inference/src/dev/plugin.cpp:53:
    Exception from src/plugins/intel_npu/src/plugin/src/plugin.cpp:697:
    Exception from src/plugins/intel_npu/src/plugin/src/compiled_model.cpp:62:
    Exception from src/plugins/intel_npu/src/compiler/src/zero_compiler_in_driver.cpp:853:
    L0 pfnCreate2 result: ZE_RESULT_ERROR_INVALID_ARGUMENT, code 0x78000004
    ```
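As an aside, the failing memory probe in `check.py` could sidestep the removed `wmic.exe` with `psutil` (a sketch, not the script's actual code):

```python
import psutil  # third-party: pip install psutil

# Equivalent of `wmic computersystem get totalphysicalmemory`, which fails
# on recent Windows 11 builds that no longer ship wmic.exe by default.
total_mb = psutil.virtual_memory().total // (1024 * 1024)
print(f"Total Physical Memory: {total_mb:,} MB")
```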
Conclusion: Despite meticulous adherence to installation instructions, environment resets, and testing with official examples and recommended models, the NPU remains inaccessible and causes a low-level driver crash during model compilation.
We again performed a full environment reset and re-installation to ensure a clean slate, strictly adhering to the latest advice for `ipex-llm[npu]` installation (without explicitly installing `torch` or `torchvision`).
We then ran the official qwen.py example script from the ipex-llm repository (python/llm/example/NPU/HF-Transformers-AutoModels/LLM/qwen.py) with its default, verified model (Qwen/Qwen2.5-7B-Instruct).
- Command Used: `set IPEX_LLM_NPU_MTL=1 && conda activate llm-npu && python python/llm/example/NPU/HF-Transformers-AutoModels/LLM/qwen.py --save-directory ./official_example_qwen2.5_cache --prompt "Explique le rôle du NPU dans l'IA PC." --n-predict 256`
- Result: Exactly the same issue; we also noticed that the CPU build of torch was used instead.
Hi @doublefx , I want to confirm: have you already changed the NPU driver to 32.0.100.3104, and do you still get the above errors?
Yes, I downgraded my NPU driver to 32.0.100.3104
Hi @doublefx , let me just share some personal experience here. First, for the installation part, I simply ran:
```
conda create -n llm-npu python=3.11
conda activate llm-npu
pip install --pre --upgrade ipex-llm[npu]
```
Then I changed the NPU driver to 32.0.100.3104.
The first time I ran the command below, I met a similar error to yours:
```
(llm-npu) D:\ruonan\ipex-llm\python\llm\example\NPU\HF-Transformers-AutoModels\LLM>python qwen.py --repo-id-or-model-path D:\llm-models\Qwen2-7B-Instruct --save-directory .\npu-qwen2
decode start compiling
decode end compiling
Model saved to .\npu-qwen2\decoder_layer_0.xml
decode start compiling
decode end compiling
Model saved to .\npu-qwen2\decoder_layer_1.xml
prefill start compiling
prefill end compiling
Model saved to .\npu-qwen2\decoder_layer_prefill.xml
Traceback (most recent call last):
File "D:\ruonan\ipex-llm\python\llm\example\NPU\HF-Transformers-AutoModels\LLM\qwen.py", line 60, in <module>
model = AutoModelForCausalLM.from_pretrained(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\arda\miniforge3\envs\llm-npu\Lib\unittest\mock.py", line 1378, in patched
return func(*newargs, **newkeywargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\arda\miniforge3\envs\llm-npu\Lib\site-packages\ipex_llm\transformers\npu_model.py", line 246, in from_pretrained
model = cls.optimize_npu_model(*args, **optimize_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\arda\miniforge3\envs\llm-npu\Lib\site-packages\ipex_llm\transformers\npu_model.py", line 325, in optimize_npu_model
optimize_llm_single_process(
File "C:\Users\arda\miniforge3\envs\llm-npu\Lib\site-packages\ipex_llm\transformers\npu_models\convert.py", line 460, in optimize_llm_single_process
convert_llm(model,
File "C:\Users\arda\miniforge3\envs\llm-npu\Lib\site-packages\ipex_llm\transformers\npu_pipeline_model\convert_pipeline.py", line 218, in convert_llm
convert_llm_for_deploy(model,
File "C:\Users\arda\miniforge3\envs\llm-npu\Lib\site-packages\ipex_llm\transformers\npu_pipeline_model\convert_pipeline.py", line 498, in convert_llm_for_deploy
convert_qwen_layer(model, 0, n_splits_linear, n_splits_down_proj,
File "C:\Users\arda\miniforge3\envs\llm-npu\Lib\site-packages\ipex_llm\transformers\npu_pipeline_model\qwen.py", line 186, in convert_qwen_layer
rest_blob_path = update_names_of_IR_and_export_blob(single_decoder,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\arda\miniforge3\envs\llm-npu\Lib\site-packages\ipex_llm\transformers\npu_pipeline_model\common.py", line 60, in update_names_of_IR_and_export_blob
compiledModel = core.compile_model(model, device_name="NPU")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\arda\miniforge3\envs\llm-npu\Lib\site-packages\intel_npu_acceleration_library\backend\..\external\openvino\runtime\ie_api.py", line 543, in compile_model
super().compile_model(model, device_name, {} if config is None else config),
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Exception from src/inference/src/cpp/core.cpp:107:
Exception from src/inference/src/dev/plugin.cpp:53:
Exception from src/plugins/intel_npu/src/plugin/src/plugin.cpp:697:
Exception from src/plugins/intel_npu/src/plugin/src/compiled_model.cpp:62:
Exception from src/plugins/intel_npu/src/compiler/src/zero_compiler_in_driver.cpp:853:
L0 pfnCreate2 result: ZE_RESULT_ERROR_INVALID_ARGUMENT, code 0x78000004
```
Then I deleted this npu-qwen2 directory and the cache directory, and added `set IPEX_LLM_NPU_MTL=1`; this time the example ran successfully:
```
(llm-npu) D:\ruonan\ipex-llm\python\llm\example\NPU\HF-Transformers-AutoModels\LLM>set IPEX_LLM_NPU_MTL=1
(llm-npu) D:\ruonan\ipex-llm\python\llm\example\NPU\HF-Transformers-AutoModels\LLM>python qwen.py --repo-id-or-model-path D:\llm-models\Qwen2-7B-Instruct --save-directory .\npu-qwen2
decode start compiling
decode end compiling
Model saved to .\npu-qwen2\decoder_layer_0.xml
decode start compiling
decode end compiling
Model saved to .\npu-qwen2\decoder_layer_1.xml
prefill start compiling
prefill end compiling
Model saved to .\npu-qwen2\decoder_layer_prefill.xml
start compiling
Model saved to .\npu-qwen2\lm_head.xml
start compiling
C:\Users\arda\miniforge3\envs\llm-npu\Lib\site-packages\ipex_llm\transformers\npu_model.py:49: UserWarning: Model is already saved at .\npu-qwen2
warnings.warn(f"Model is already saved at {self.save_directory}")
2025-07-10 20:55:13,266 - INFO - Converted model has already saved to .\npu-qwen2.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
--------------------------------------------------------------------------------
done
finish to load
-------------------- Input --------------------
input length: 22
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
AI是什么?<|im_end|>
<|im_start|>assistant
-------------------- Output --------------------
system
You are a helpful assistant.
user
AI是什么?
assistant
AI,即人工智能(Artificial Intelligence),是一种计算机科学领域,旨在创建智能机器和软件程序,这些可以执行通常需要人类智慧的任务,如
```
You may have one last try after removing your local directory. If you still meet such an error, sadly I am not sure what caused the problem.
Hi @rnwang04, Thanks a lot for sharing your experience 🙏
I actually followed the exact same steps, including:
- Creating the `llm-npu` environment with Python 3.11
- Installing `ipex-llm[npu]` with `--pre`
- Using the same NPU driver, 32.0.100.3104
- Setting `IPEX_LLM_NPU_MTL=1`
- Removing both the npu-qwen2 output directory and the cache before re-running
Still, I’m hitting the same `ZE_RESULT_ERROR_INVALID_ARGUMENT` error at the `core.compile_model(model, device_name="NPU")` step. 😢
Is there anything else I might be missing?
Really appreciate your help 🙏
Just two reminders: maybe you can check whether you have enough disk space on C:\, and check whether `IPEX_LLM_NPU_MTL=1` really takes effect, for example, if you are running in a Miniforge Prompt.
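A tiny sketch to verify both points from inside Python:

```python
import os
import shutil

# `set IPEX_LLM_NPU_MTL=1` only affects the same cmd.exe session, so confirm
# the variable is actually visible to the Python process doing the compile.
print("IPEX_LLM_NPU_MTL =", os.environ.get("IPEX_LLM_NPU_MTL"))

# Free space on C:\ in GiB -- model conversion needs plenty of scratch space.
free_gib = shutil.disk_usage("C:\\").free / 1024**3
print(f"Free on C:\\ : {free_gib:.1f} GiB")
```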
Other than that, I can't think of any other possible reasons. 😢