`OSError: [WinError 126]` when importing `torch` with IPEX-LLM on Windows with Intel Arc GPU

Open doublefx opened this issue 6 months ago • 14 comments

System Information:

  • OS: Windows 11
  • GPU: Intel(R) Arc(TM) Graphics
  • GPU Driver Version: 32.0.101.6913

Environment:

  • Python: 3.11 (via Conda)
  • IPEX-LLM Installation Method: pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
  • PyTorch Version Installed: torch-2.1.0a0+cxx11.abi
  • Visual Studio: VS 2022 with "Desktop development with C++" workload installed (including C++/CLI support components).

Description

Following the official "Install IPEX-LLM on Windows with Intel GPU" guide, the environment setup appears successful, installing the torch-2.1.0a0+cxx11.abi package as expected.

However, any attempt to import torch after this installation fails with the following error:

OSError: [WinError 126] The specified module could not be found. Error loading "C:\...\Lib\site-packages\torch\lib\backend_with_compiler.dll" or one of its dependencies.

This prevents any use of ipex-llm on the GPU.

Steps to Reproduce

  1. On a supported Windows 11 machine with an Intel Arc GPU and up-to-date drivers, ensure Visual Studio 2022 with the C++ workload is installed.
  2. Create a clean Conda environment with Python 3.11.
  3. In the Conda environment, run pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/.
  4. Attempt to run any Python script that begins with import torch.
  5. The OSError: [WinError 126] occurs.

Troubleshooting Performed

  • Confirmed the issue is identical to the one described in oobabooga/text-generation-webui#6253.
  • Programmatically added the MSVC build tools directory (C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\[version]\bin\Hostx64\x64) to the system PATH before the import, with no change in result.
  • Ensured all relevant Visual Studio C++ components, including C++/CLI support, are installed.
  • The error seems to indicate a missing system-level DLL dependency or a problem with how the backend_with_compiler.dll was compiled.
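
One way to narrow down a WinError 126 of this kind is to try loading each library in the torch\lib directory on its own and record which files fail. The following is a minimal sketch, not part of IPEX-LLM or PyTorch: find_unloadable_libs is a hypothetical helper name, and the directory to pass is whatever your environment's torch\lib path is. A failure only names the file that could not be loaded, not which of its dependencies is missing.

```python
import ctypes
from pathlib import Path


def find_unloadable_libs(lib_dir, pattern="*.dll"):
    """Try to load every library in lib_dir; return {file name: error text}.

    Hypothetical diagnostic helper: a failure names the file that could not
    be loaded, but not which of its dependencies is missing.
    """
    failures = {}
    for lib_path in sorted(Path(lib_dir).glob(pattern)):
        try:
            ctypes.CDLL(str(lib_path))
        except OSError as err:
            failures[lib_path.name] = str(err)
    return failures


if __name__ == "__main__":
    import sys

    # Pass the torch\lib directory of the affected environment as an argument.
    if len(sys.argv) > 1:
        for name, error in find_unloadable_libs(sys.argv[1]).items():
            print(f"{name}: {error}")
```

If only a handful of files fail, their names can hint at which runtime component is absent from the environment.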

This appears to be a blocking issue for using IPEX-LLM on certain Windows + Intel Arc configurations.

doublefx avatar Jul 08 '25 09:07 doublefx

Hi @doublefx, could you please provide some environment check info using this script? Also, does it work if you install just torch==2.1.0a0?

lalalapotter avatar Jul 09 '25 02:07 lalalapotter

Hi, thank you for the suggestion. I've performed the tests you requested.

1. Standalone Torch Test: You asked if installing torch==2.1.0a0 works on its own. The package is present in my environment, but attempting to import it fails with the same error as before:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\DoubleFx\anaconda3\envs\llm-npu\Lib\site-packages\torch\__init__.py", line 139, in <module>
    raise err
OSError: [WinError 126] The specified module could not be found. Error loading "C:\Users\DoubleFx\anaconda3\envs\llm-npu\Lib\site-packages\torch\lib\backend_with_compiler.dll" or one of its dependencies.

2. Environment Check: I ran the python\llm\scripts\env-check.bat script. It failed to find xpu-smi. Here is the system information it produced:

OS Name:                       Microsoft Windows 11 Pro
OS Version:                    10.0.26100 N/A Build 26100
System Manufacturer:           Micro Computer (HK) Tech Limited
System Model:                  AtomMan X Series
System Type:                   x64-based PC
Processor(s):                  [01]: Intel64 Family 6 Model 170 Stepping 4 GenuineIntel ~2300 Mhz
BIOS Version:                  American Megatrends International, LLC. 1.00, 30/05/2024

3. Driver Version Discovery: I found that my currently installed NPU driver version is 32.0.100.4082.

Your npu_quickstart.md documentation highly recommends driver version 32.0.100.3104.

Question: Given that the torch import fails on its own and my NPU driver is newer than the recommended version, is it likely that this driver mismatch is the root cause of the WinError 126? Would the correct next step be for me to uninstall my current driver and install the recommended 32.0.100.3104 version?
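
When checking whether an installed driver is older or newer than a recommended one, it is safer to compare the dotted fields as integers than to compare the raw strings. A minimal sketch using the two versions quoted in this thread; the helper names are hypothetical, and the comparison says nothing about whether a newer driver is actually compatible.

```python
def parse_driver_version(version):
    """Turn a dotted driver version such as '32.0.100.4082' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def compare_driver_versions(installed, recommended):
    """Return 'older', 'same', or 'newer' for installed relative to recommended."""
    a = parse_driver_version(installed)
    b = parse_driver_version(recommended)
    if a == b:
        return "same"
    return "newer" if a > b else "older"


# Versions quoted in this thread: installed vs. recommended.
print(compare_driver_versions("32.0.100.4082", "32.0.100.3104"))  # prints: newer
```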

Thank you for your help.

doublefx avatar Jul 09 '25 11:07 doublefx

Hi @doublefx, could you please try conda install libuv to see if it helps? Just in case you missed this step when creating your conda env.

rnwang04 avatar Jul 09 '25 13:07 rnwang04

Hi, thank you again for your help. The suggestion to install libuv was correct and it resolved the WinError 126 when importing torch.

However, we have encountered a new problem. After successfully installing libuv, I performed a clean reinstallation of ipex-llm:

  1. pip uninstall torch ipex-llm -y
  2. pip install --pre --upgrade --no-cache-dir ipex-llm[npu]

The installation completed, but it installed the CPU-only version of PyTorch (torch-2.1.2+cpu).

When I run the env-check.bat script, it now shows IPEX is not installed properly and xpu-smi is not recognized, which makes sense if we only have the CPU version of the libraries.

This seems to confirm that the pip installer is not able to find the correct NPU-enabled version of torch for my system.
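
A quick way to see which torch build pip actually installed is to read the local version label after the "+" in the version string. The sketch below is only an illustration with hypothetical helper names; the two sample strings are the ones reported in this thread, and the label alone does not say whether a given build is the expected one for a particular ipex-llm extra.

```python
from importlib import metadata


def local_build_tag(version):
    """Return the local version label after '+', or '' if there is none."""
    _, _, tag = version.partition("+")
    return tag


def installed_torch_build():
    """Return (version, local tag) for the installed torch, or None if absent."""
    try:
        version = metadata.version("torch")
    except metadata.PackageNotFoundError:
        return None
    return version, local_build_tag(version)


# Version strings reported in this thread.
for sample in ("2.1.2+cpu", "2.1.0a0+cxx11.abi"):
    print(sample, "->", local_build_tag(sample))
print("installed:", installed_torch_build())
```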

To summarize:

  • The base torch import error is fixed.
  • pip install ipex-llm[npu] now installs successfully, but only fetches the CPU packages.
  • My NPU driver is 32.0.100.4082.
  • The documentation recommends 32.0.100.3104.

Is there a specific --index-url I should be using? Or does this confirm that the driver version mismatch is preventing pip from finding the correct NPU packages?

doublefx avatar Jul 09 '25 13:07 doublefx

Hi,

Following up on our previous conversation, we've made significant progress based on your advice but have unfortunately hit a final wall on the Windows platform.

High-Level Summary: The good news is that we have resolved all Python errors. The test script now runs successfully and generates correct text output. The bad news is that the NPU utilization remains at 0%, even after we aligned the entire environment (packages and drivers) to the official recommendations.

Key Steps and Findings:

  1. Installation Success: Your advice was crucial. We discovered the --extra-index-url and [xpu] flag in the documentation, which allowed us to install the correct, XPU-enabled versions of torch (2.1.0a0+cxx11.abi) and ipex (2.1.10+xpu). We also have libuv installed in the conda environment. torch.xpu.is_available() now correctly returns True.

  2. Driver Downgrade: After confirming the software was installed correctly, we observed that the NPU utilization was still 0% with the system's default driver (32.0.100.4082). Following the documentation's strong recommendation, we have now successfully downgraded the NPU driver to the specified 32.0.100.3104 and rebooted the system.

  3. Final Result: Even with the recommended driver (32.0.100.3104) and the correct libraries, the result is the same. The script runs and produces the expected text, but the NPU utilization graph in Task Manager remains at 0%.

Conclusion: We believe we have uncovered a deeper bug. We have meticulously followed every step, and the environment should now be perfectly configured, yet the NPU hardware is not being engaged for computation. The process appears to be falling back to the GPU.

For your reference, here is the exact script we are using for testing:

import os
os.environ['IPEX_LLM_NPU_MTL'] = '1'

import torch
from ipex_llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

# --- Verify Correct Installation ---
print(f"PyTorch version: {torch.__version__}")
xpu_available = hasattr(torch, 'xpu') and torch.xpu.is_available()
print(f"Is XPU available? {xpu_available}")
if not xpu_available:
    print("Warning: XPU device not found. The model will run on CPU.")
    
device = 'xpu' if xpu_available else 'cpu'

# --- Load Model and Tokenizer ---
model_name = "Qwen/Qwen1.5-7B-Chat"
print(f"\nLoading model: {model_name}...")

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    load_in_4bit=True,
    trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# --- Move to XPU device ---
print(f"Moving model to device: {device}...")
model = model.to(device)
print("Model moved successfully.")

# --- Run Inference ---
# Build a chat prompt using the model's template
messages = [
    {"role": "user", "content": "Explique le rôle du NPU dans l'IA PC."}
]
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

print("\nGenerating response...")
# Move inputs to the same device as the model
inputs = tokenizer(prompt, return_tensors="pt").to(device)

# Generate response
outputs = model.generate(**inputs, max_new_tokens=256)
# Slice the output to only decode the new tokens
output_str = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)

print("\n--- Model Output ---")
print(output_str)

At this point, we are abandoning the Windows effort as it seems blocked. We will be attempting to replicate the setup on WSL next.

Thank you for your help so far.

doublefx avatar Jul 09 '25 17:07 doublefx

Hi,

Following up on our extensive troubleshooting, we have now encountered a hard crash within the Level Zero driver API, which seems to be the root cause of the NPU issues on Windows for this machine.

Final Conclusion: After attempting two entirely different installation methods ([xpu] with extra index URL vs. [npu]) and two different driver versions (the system default ...4082 and the recommended ...3104), we can confirm that the NPU cannot be successfully used.

Our final attempt involved running the official qwen.py example script with its default Qwen/Qwen2.5-7B-Instruct model. This also failed with the same low-level driver crash. This is conclusive evidence of a bug outside of our control.

Here is the final, definitive crash log from running the official example:

RuntimeError: Exception from src/inference/src/cpp/core.cpp:107:
Exception from src/inference/src/dev/plugin.cpp:53:
Exception from src/plugins/intel_npu/src/plugin/src/plugin.cpp:697:
Exception from src/plugins/intel_npu/src/plugin/src/compiled_model.cpp:62:
Exception from src/plugins/intel_npu/src/compiler/src/zero_compiler_in_driver.cpp:853:
L0 pfnCreate2 result: ZE_RESULT_ERROR_INVALID_ARGUMENT, code 0x78000004
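
For comparing crash logs across runs, it can help to reduce the nested exception chain to its last source location and the Level Zero result. The sketch below assumes the "Exception from" lines are listed outermost first, so the last one is the innermost; the function name and regular expressions are illustrative only and are not part of any Intel tooling.

```python
import re

LOCATION_RE = re.compile(r"Exception from (?P<file>\S+):(?P<line>\d+):")
RESULT_RE = re.compile(
    r"result: (?P<name>ZE_RESULT_\w+), code (?P<code>0x[0-9a-fA-F]+)"
)


def summarize_npu_crash(log_text):
    """Extract the last 'Exception from' location and the Level Zero result."""
    locations = [
        (m["file"], int(m["line"])) for m in LOCATION_RE.finditer(log_text)
    ]
    result = RESULT_RE.search(log_text)
    return {
        "innermost": locations[-1] if locations else None,
        "result_name": result["name"] if result else None,
        "result_code": int(result["code"], 16) if result else None,
    }


# The crash log from this thread, copied verbatim.
CRASH_LOG = """\
RuntimeError: Exception from src/inference/src/cpp/core.cpp:107:
Exception from src/inference/src/dev/plugin.cpp:53:
Exception from src/plugins/intel_npu/src/plugin/src/plugin.cpp:697:
Exception from src/plugins/intel_npu/src/plugin/src/compiled_model.cpp:62:
Exception from src/plugins/intel_npu/src/compiler/src/zero_compiler_in_driver.cpp:853:
L0 pfnCreate2 result: ZE_RESULT_ERROR_INVALID_ARGUMENT, code 0x78000004
"""

print(summarize_npu_crash(CRASH_LOG))
```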

This is as far as we can go with debugging on Windows. We are now abandoning the Windows effort as it seems blocked by this bug.

Thank you for your assistance.

env-check:

conda activate llm-npu && .\env-check.bat ; start-sleep -milliseconds 10
Python 3.11.13
python: can't open file 'C:\\dev\\sources\\AI\\IPEX-LLM\\check.py': [Errno 2] No such file or directory
-----------------------------------------------------------------
System Information

Host Name:                     DESKTOP-UG8O6R2
OS Name:                       Microsoft Windows 11 Pro
OS Version:                    10.0.26100 N/A Build 26100
OS Manufacturer:               Microsoft Corporation
OS Configuration:              Standalone Workstation
OS Build Type:                 Multiprocessor Free
Registered Owner:              N/A
Registered Organization:       N/A
Product ID:                    00330-52914-15365-AAOEM
Original Install Date:         08/01/2025, 11:56:33
System Boot Time:              09/07/2025, 17:23:06
System Manufacturer:           Micro Computer (HK) Tech Limited
System Model:                  AtomMan X Series
System Type:                   x64-based PC
Processor(s):                  1 Processor(s) Installed.
                               [01]: Intel64 Family 6 Model 170 Stepping 4 GenuineIntel ~2300 Mhz
BIOS Version:                  American Megatrends International, LLC. 1.00, 30/05/2024
Windows Directory:             C:\WINDOWS
System Directory:              C:\WINDOWS\system32
Boot Device:                   \Device\HarddiskVolume1
System Locale:                 en-us;English (United States)
Input Locale:                  en-gb;English (United Kingdom)
Time Zone:                     (UTC+00:00) Dublin, Edinburgh, Lisbon, London
Total Physical Memory:         97,810 MB
Available Physical Memory:     62,526 MB
Virtual Memory: Max Size:      103,954 MB
Virtual Memory: Available:     64,583 MB
Virtual Memory: In Use:        39,371 MB
Page File Location(s):         C:\pagefile.sys
Domain:                        WORKGROUP
Logon Server:                  \\DESKTOP-UG8O6R2
Hotfix(s):                     4 Hotfix(s) Installed.
                               [01]: KB5056579
                               [02]: KB5062553
                               [03]: KB5062862
                               [04]: KB5063666
Network Card(s):               3 NIC(s) Installed.
                               [01]: Realtek PCIe 5GbE Family Controller
                                     Connection Name: Ethernet
                                     Status:          Media disconnected
                               [02]: Realtek PCIe 5GbE Family Controller
                                     Connection Name: Ethernet 2
                                     Status:          Media disconnected
                               [03]: Bluetooth Device (Personal Area Network)
                                     Connection Name: Bluetooth Network Connection
                                     Status:          Media disconnected
Virtualization-based security: Status: Running
                               Required Security Properties:
                                     Base Virtualization Support
                               Available Security Properties:
                                     Base Virtualization Support
                                     Secure Boot
                                     DMA Protection
                                     UEFI Code Readonly
                                     SMM Security Mitigations 1.0
                                     Mode Based Execution Control
                                     APIC Virtualization
                               Services Configured:
                                     Hypervisor enforced Code Integrity
                               Services Running:
                                     Hypervisor enforced Code Integrity
                                     Hypervisor-Enforced Paging Translation
                               App Control for Business policy: Enforced
                               App Control for Business user mode policy: Off
                               Security Features Enabled:
Hyper-V Requirements:          A hypervisor has been detected. Features required for Hyper-V will not be displayed.
-----------------------------------------------------------------
'xpu-smi' is not recognized as an internal or external command,
operable program or batch file.
xpu-smi is not installed properly.

doublefx avatar Jul 09 '25 18:07 doublefx

Hi @doublefx, I think I need to clarify something. ipex-llm[xpu] and ipex-llm[npu] are different packages that target different hardware. To run on the GPU, use pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/; to run on the NPU, use pip install --pre --upgrade ipex-llm[npu]. The two have potential conflicts, so you cannot install both in a single conda env; install them in two separate conda envs instead, such as llm-xpu and llm-npu. The example script you used above runs on the GPU, so it is expected that you did not see any NPU usage. If you only want to run on the NPU, please refer to this npu quickstart and this qwen npu example script.
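
To check whether a conda env has ended up with pieces of both installs, one rough approach is to look for a module that each install is known to bring in. This is a sketch under assumptions: intel_extension_for_pytorch is the IPEX package that the [xpu] install pulled in, intel_npu_acceleration_library is the package that shows up in NPU tracebacks in this thread, and detect_targets is a hypothetical helper, not an ipex-llm API.

```python
from importlib import util

# Marker modules are assumptions for illustration, not an official list:
# - intel_extension_for_pytorch: IPEX, pulled in by the [xpu] install
# - intel_npu_acceleration_library: appears in NPU tracebacks in this thread
MARKER_MODULES = {
    "xpu": "intel_extension_for_pytorch",
    "npu": "intel_npu_acceleration_library",
}


def detect_targets(find_spec=util.find_spec):
    """Return which marker modules are importable in the current environment."""
    return {
        target: find_spec(module) is not None
        for target, module in MARKER_MODULES.items()
    }


found = detect_targets()
print(found)
if all(found.values()):
    print("Both GPU and NPU packages found; consider separate conda envs.")
```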

rnwang04 avatar Jul 10 '25 02:07 rnwang04

Hi @rnwang04 ,

Following up on our extensive troubleshooting, we are still encountering a hard crash within the Level Zero driver API, which appears to be preventing NPU utilization on this machine.

Summary of Troubleshooting Steps and Findings:

We have performed a full environment reset and re-installation to ensure a clean slate, following the maintainer's latest advice for ipex-llm[npu] installation.

  1. Environment Reset & Re-installation:

    • Deactivated and removed the existing llm-npu conda environment.
    • Created a new, clean llm-npu environment with python=3.11.
    • Explicitly installed torch==2.1.2 and torchvision==0.16.2 (as suggested by Multimodal example README for some models).
    • Installed ipex-llm[npu] using pip install --pre --upgrade ipex-llm[npu].
  2. Environment Check (env-check.bat output):

    Python 3.11.13
    -----------------------------------------------------------------
    transformers=4.40.0
    -----------------------------------------------------------------
    torch=2.1.2+cpu
    -----------------------------------------------------------------
    Name: ipex-llm
    Version: 2.3.0b20250626
    Summary: Large Language Model Develop Toolkit
    Home-page: https://github.com/intel-analytics/ipex-llm
    Author: BigDL Authors
    Author-email: [email protected]
    License: Apache License, Version 2.0
    Location: C:\Users\DoubleFx\anaconda3\envs\llm-npu\Lib\site-packages
    Requires:
    Required-by:
    -----------------------------------------------------------------
    IPEX is not installed properly. 
    -----------------------------------------------------------------
    Traceback (most recent call last):
      File "C:\dev\sources\AI\IPEX-LLM\python\llm\scripts\check.py", line 172, in <module>
        main()
      File "C:\dev\sources\AI\IPEX-LLM\python\llm\scripts\check.py", line 164, in main
        check_memory()
      File "C:\dev\sources\AI\IPEX-LLM\python\llm\scripts\check.py", line 60, in check_memory
        physical_mem = subprocess.run('wmic computersystem get totalphysicalmemory', capture_output=True, text=True).stdout
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "C:\Users\DoubleFx\anaconda3\envs\llm-npu\Lib\subprocess.py", line 548, in run
        with Popen(*popenargs, **kwargs) as process:
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "C:\Users\DoubleFx\anaconda3\envs\llm-npu\Lib\subprocess.py", line 1026, in __init__
        self._execute_child(args, executable, preexec_fn, close_fds,
      File "C:\Users\DoubleFx\anaconda3\envs\llm-npu\Lib\subprocess.py", line 1538, in _execute_child
        hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    FileNotFoundError: [WinError 2] The system cannot find the file specified
    

    Note: WMIC has been deprecated since Windows 10 21H1 and is no longer installed by default on recent Windows 11 releases, which likely causes this FileNotFoundError.

    -----------------------------------------------------------------
    System Information
    Loading VSM Information ...
    Host Name:                     DESKTOP-UG8O6R2
    OS Name:                       Microsoft Windows 11 Pro
    OS Version:                    10.0.26100 N/A Build 26100
    OS Manufacturer:               Microsoft Corporation
    OS Configuration:              Standalone Workstation
    OS Build Type:                 Multiprocessor Free
    Registered Owner:              N/A
    Registered Organization:       N/A
    Product ID:                    00330-52914-15365-AAOEM
    Original Install Date:         08/01/2025, 11:56:33
    System Boot Time:              09/07/2025, 17:23:06
    System Manufacturer:           Micro Computer (HK) Tech Limited
    System Model:                  AtomMan X Series
    System Type:                   x64-based PC
    Processor(s):                  1 Processor(s) Installed.
                                   [01]: Intel64 Family 6 Model 170 Stepping 4 GenuineIntel ~2300 Mhz
    BIOS Version:                  American Megatrends International, LLC. 1.00, 30/05/2024
    Windows Directory:             C:\WINDOWS
    System Directory:              C:\WINDOWS\system32
    Boot Device:                   \Device\HarddiskVolume1
    System Locale:                 en-us;English (United States)
    Input Locale:                  en-gb;English (United Kingdom)
    Time Zone:                     (UTC+00:00) Dublin, Edinburgh, Lisbon, London
    Total Physical Memory:         97,810 MB
    Available Physical Memory:     58,578 MB
    Virtual Memory: Max Size:      103,954 MB
    Virtual Memory: Available:     60,543 MB
    Virtual Memory: In Use:        43,411 MB
    Page File Location(s):         C:\pagefile.sys
    Domain:                        WORKGROUP
    Logon Server:                  \\DESKTOP-UG8O6R2
    Hotfix(s):                     4 Hotfix(s) Installed.
                                   [01]: KB5056579
                                   [02]: KB5062553
                                   [03]: KB5062862
                                   [04]: KB5063666
    Network Card(s):               3 NIC(s) Installed.
                                   [01]: Realtek PCIe 5GbE Family Controller
                                         Connection Name: Ethernet
                                         Status:          Media disconnected
                                   [02]: Realtek PCIe 5GbE Family Controller
                                         Connection Name: Ethernet 2
                                         Status:          Media disconnected
                                   [03]: Bluetooth Device (Personal Area Network)
                                         Connection Name: Bluetooth Network Connection
                                         Status:          Media disconnected
    Virtualization-based security: Status: Running
                                   Required Security Properties:
                                         Base Virtualization Support
                                   Available Security Properties:
                                         Base Virtualization Support
                                         Secure Boot
                                         DMA Protection
                                         UEFI Code Readonly
                                         SMM Security Mitigations 1.0
                                         Mode Based Execution Control
                                         APIC Virtualization
                                   Services Configured:
                                         Hypervisor enforced Code Integrity
                                   Services Running:
                                         Hypervisor enforced Code Integrity
                                         Hypervisor-Enforced Paging Translation
                                   App Control for Business policy: Enforced
                                   App Control for Business user mode policy: Off
                                   Security Features Enabled:
    Hyper-V Requirements:          A hypervisor has been detected. Features required for Hyper-V will not be displayed.
    -----------------------------------------------------------------
    'xpu-smi' is not recognized as an internal or external command,
    operable program or batch file.
    xpu-smi is not installed properly.
    
  3. Final Test Run (Official Example):

    • Command Used:
      set IPEX_LLM_NPU_MTL=1 && conda activate llm-npu && python python/llm/example/NPU/HF-Transformers-AutoModels/LLM/qwen.py --save-directory ./official_example_qwen2.5_cache --prompt "Explique le rôle du NPU dans l'IA PC." --n-predict 256
      
    • Result: The script downloaded and converted the model (Qwen/Qwen2.5-7B-Instruct) successfully, but then crashed with a low-level driver error during compilation for the NPU.
    • Crash Log:
      RuntimeError: Exception from src/inference/src/cpp/core.cpp:107:
      Exception from src/inference/src/dev/plugin.cpp:53:
      Exception from src/plugins/intel_npu/src/plugin/src/plugin.cpp:697:
      Exception from src/plugins/intel_npu/src/plugin/src/compiled_model.cpp:62:
      Exception from src/plugins/intel_npu/src/compiler/src/zero_compiler_in_driver.cpp:853:
      L0 pfnCreate2 result: ZE_RESULT_ERROR_INVALID_ARGUMENT, code 0x78000004
      

Conclusion: Despite meticulous adherence to installation instructions, environment resets, and testing with official examples and recommended models, the NPU remains inaccessible and causes a low-level driver crash during model compilation.
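
On the FileNotFoundError in the environment check above: check.py calls wmic to read the total physical memory, and that fails when wmic.exe is not present. The following is a sketch of one possible fallback, not the actual check.py code. It assumes PowerShell is on PATH and that the TotalPhysicalMemory property of Win32_ComputerSystem is an acceptable substitute; the helper names are hypothetical.

```python
import shutil
import subprocess


def total_memory_command(which=shutil.which):
    """Pick a command that prints total physical memory in bytes, or None."""
    if which("wmic"):
        return ["wmic", "computersystem", "get", "totalphysicalmemory"]
    if which("powershell"):
        return [
            "powershell", "-NoProfile", "-Command",
            "(Get-CimInstance Win32_ComputerSystem).TotalPhysicalMemory",
        ]
    return None


def query_total_memory():
    """Run the chosen command and return its raw stdout, or None if unavailable."""
    command = total_memory_command()
    if command is None:
        return None
    return subprocess.run(command, capture_output=True, text=True).stdout


print(total_memory_command())
```

Passing the command as a list and checking shutil.which first avoids the unhandled exception when neither tool is installed.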

doublefx avatar Jul 10 '25 07:07 doublefx

We again performed a full environment reset and re-installation to ensure a clean slate, strictly following the latest advice for the ipex-llm[npu] installation (without explicitly installing torch or torchvision).

We then ran the official qwen.py example script from the ipex-llm repository (python/llm/example/NPU/HF-Transformers-AutoModels/LLM/qwen.py) with its default, verified model (Qwen/Qwen2.5-7B-Instruct).

  • Command Used:
    set IPEX_LLM_NPU_MTL=1 && conda activate llm-npu && python python/llm/example/NPU/HF-Transformers-AutoModels/LLM/qwen.py --save-directory ./official_example_qwen2.5_cache --prompt "Explique le rôle du NPU dans l'IA PC." --n-predict 256
    
  • Result: Exactly the same error. The only other observation is that the CPU build of torch was used.

doublefx avatar Jul 10 '25 08:07 doublefx

Hi @doublefx, I want to confirm: have you already changed the NPU driver to 32.0.100.3104 and still get the errors above?

rnwang04 avatar Jul 10 '25 09:07 rnwang04

Yes, I downgraded my NPU driver to 32.0.100.3104

doublefx avatar Jul 10 '25 10:07 doublefx

Hi @doublefx, just sharing some personal experience here. First, for the installation, I just ran:

conda create -n llm-npu python=3.11
conda activate llm-npu
pip install --pre --upgrade ipex-llm[npu]

Then I changed the NPU driver to 32.0.100.3104.

The first time I ran the command below, I hit an error similar to yours:

(llm-npu) D:\ruonan\ipex-llm\python\llm\example\NPU\HF-Transformers-AutoModels\LLM>python qwen.py --repo-id-or-model-path D:\llm-models\Qwen2-7B-Instruct --save-directory .\npu-qwen2

decode start compiling
decode end compiling
Model saved to .\npu-qwen2\decoder_layer_0.xml
decode start compiling
decode end compiling
Model saved to .\npu-qwen2\decoder_layer_1.xml
prefill start compiling
prefill end compiling
Model saved to .\npu-qwen2\decoder_layer_prefill.xml
Traceback (most recent call last):
  File "D:\ruonan\ipex-llm\python\llm\example\NPU\HF-Transformers-AutoModels\LLM\qwen.py", line 60, in <module>
    model = AutoModelForCausalLM.from_pretrained(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\arda\miniforge3\envs\llm-npu\Lib\unittest\mock.py", line 1378, in patched
    return func(*newargs, **newkeywargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\arda\miniforge3\envs\llm-npu\Lib\site-packages\ipex_llm\transformers\npu_model.py", line 246, in from_pretrained
    model = cls.optimize_npu_model(*args, **optimize_kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\arda\miniforge3\envs\llm-npu\Lib\site-packages\ipex_llm\transformers\npu_model.py", line 325, in optimize_npu_model
    optimize_llm_single_process(
  File "C:\Users\arda\miniforge3\envs\llm-npu\Lib\site-packages\ipex_llm\transformers\npu_models\convert.py", line 460, in optimize_llm_single_process
    convert_llm(model,
  File "C:\Users\arda\miniforge3\envs\llm-npu\Lib\site-packages\ipex_llm\transformers\npu_pipeline_model\convert_pipeline.py", line 218, in convert_llm
    convert_llm_for_deploy(model,
  File "C:\Users\arda\miniforge3\envs\llm-npu\Lib\site-packages\ipex_llm\transformers\npu_pipeline_model\convert_pipeline.py", line 498, in convert_llm_for_deploy
    convert_qwen_layer(model, 0, n_splits_linear, n_splits_down_proj,
  File "C:\Users\arda\miniforge3\envs\llm-npu\Lib\site-packages\ipex_llm\transformers\npu_pipeline_model\qwen.py", line 186, in convert_qwen_layer
    rest_blob_path = update_names_of_IR_and_export_blob(single_decoder,
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\arda\miniforge3\envs\llm-npu\Lib\site-packages\ipex_llm\transformers\npu_pipeline_model\common.py", line 60, in update_names_of_IR_and_export_blob
    compiledModel = core.compile_model(model, device_name="NPU")
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\arda\miniforge3\envs\llm-npu\Lib\site-packages\intel_npu_acceleration_library\backend\..\external\openvino\runtime\ie_api.py", line 543, in compile_model
    super().compile_model(model, device_name, {} if config is None else config),
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Exception from src/inference/src/cpp/core.cpp:107:
Exception from src/inference/src/dev/plugin.cpp:53:
Exception from src/plugins/intel_npu/src/plugin/src/plugin.cpp:697:
Exception from src/plugins/intel_npu/src/plugin/src/compiled_model.cpp:62:
Exception from src/plugins/intel_npu/src/compiler/src/zero_compiler_in_driver.cpp:853:
L0 pfnCreate2 result: ZE_RESULT_ERROR_INVALID_ARGUMENT, code 0x78000004

Then I deleted the npu-qwen2 directory and the cache directory and added set IPEX_LLM_NPU_MTL=1. This time the example ran successfully:

(llm-npu) D:\ruonan\ipex-llm\python\llm\example\NPU\HF-Transformers-AutoModels\LLM>set IPEX_LLM_NPU_MTL=1

(llm-npu) D:\ruonan\ipex-llm\python\llm\example\NPU\HF-Transformers-AutoModels\LLM>python qwen.py --repo-id-or-model-path D:\llm-models\Qwen2-7B-Instruct --save-directory .\npu-qwen2

decode start compiling
decode end compiling
Model saved to .\npu-qwen2\decoder_layer_0.xml
decode start compiling
decode end compiling
Model saved to .\npu-qwen2\decoder_layer_1.xml
prefill start compiling
prefill end compiling
Model saved to .\npu-qwen2\decoder_layer_prefill.xml
start compiling
Model saved to .\npu-qwen2\lm_head.xml
start compiling
C:\Users\arda\miniforge3\envs\llm-npu\Lib\site-packages\ipex_llm\transformers\npu_model.py:49: UserWarning: Model is already saved at .\npu-qwen2
  warnings.warn(f"Model is already saved at {self.save_directory}")
2025-07-10 20:55:13,266 - INFO - Converted model has already saved to .\npu-qwen2.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
--------------------------------------------------------------------------------
done
finish to load
-------------------- Input --------------------
input length: 22
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
AI是什么?<|im_end|>
<|im_start|>assistant

-------------------- Output --------------------
system
You are a helpful assistant.
user
AI是什么?
assistant
AI,即人工智能(Artificial Intelligence),是一种计算机科学领域,旨在创建智能机器和软件程序,这些可以执行通常需要人类智慧的任务,如

You could give it one last try after removing your local directory. If you still hit this error, sadly I am not sure what is causing the problem.
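
Clearing the previously converted output before a retry can be scripted so that no stale files are left behind. A minimal sketch with a hypothetical helper name; ./npu-qwen2 is the save directory from the run above, and the cache directory's location is not given in this thread, so it has to be supplied by whoever runs this.

```python
import shutil
from pathlib import Path


def remove_stale_outputs(*directories):
    """Delete each existing directory; return the list of paths removed."""
    removed = []
    for directory in directories:
        path = Path(directory)
        if path.is_dir():
            shutil.rmtree(path)
            removed.append(path)
    return removed


# The --save-directory used in the run above. Pass the cache directory as an
# additional argument; its location is not given in this thread.
print(remove_stale_outputs("./npu-qwen2"))
```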

rnwang04 avatar Jul 10 '25 13:07 rnwang04

Hi @rnwang04, Thanks a lot for sharing your experience 🙏

I actually followed the exact same steps, including:

  • Creating the llm-npu environment with Python 3.11
  • Installing ipex-llm[npu] with --pre
  • Using the same NPU driver, 32.0.100.3104
  • Setting IPEX_LLM_NPU_MTL=1
  • Removing both the npu-qwen2 output directory and the cache before re-running

Still, I’m hitting the same ZE_RESULT_ERROR_INVALID_ARGUMENT error at the core.compile_model(model, device_name="NPU") step. 😢

Is there anything else I might be missing?

Really appreciate your help 🙏

doublefx avatar Jul 10 '25 13:07 doublefx

Just two reminders: check that you have enough disk space on C:\, and check that IPEX_LLM_NPU_MTL=1 really takes effect, for example if you are running from Miniforge Prompt. Other than that, I can't think of any other possible reasons. 😢
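
Both checks can be done from inside the same Python process that runs the example, which removes any doubt about what the shell actually exported. The sketch below uses hypothetical helper names. One thing it looks for is stray whitespace: in cmd.exe, a line such as set IPEX_LLM_NPU_MTL=1 && python ... stores the value with a trailing space. Whether ipex-llm is sensitive to that is an assumption and is not confirmed in this thread.

```python
import os
import shutil


def check_env_flag(name="IPEX_LLM_NPU_MTL", expected="1", environ=os.environ):
    """Describe whether the variable is unset, exact, or off by whitespace."""
    value = environ.get(name)
    if value is None:
        return "unset"
    if value == expected:
        return "ok"
    if value.strip() == expected:
        return "whitespace"  # e.g. cmd's `set X=1 && ...` keeps the space
    return "different"


def free_gib(path):
    """Free disk space at path, in GiB."""
    return shutil.disk_usage(path).free / 2**30


print("IPEX_LLM_NPU_MTL:", check_env_flag())
print(f"free space here: {free_gib('.'):.1f} GiB")
```

On the affected machine, free_gib would be called with the drive that holds the save directory and the cache, for example "C:\\".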

rnwang04 avatar Jul 14 '25 02:07 rnwang04