
[Bug]: Run OpenVINO benchmark_app with Yolo-v4-tf model Failed on NPU


OpenVINO Version

2024.1.0-15008-f4afc983258-releases/2024/1

Operating System

Other (Please specify in description)

Device used for inference

NPU

Framework

TensorFlow 1

Model used

yolo-v4-tf

Issue description

When running the command:

./benchmark_app -d NPU -m /home/aibox/share/models/public/yolo-v4-tf/FP16/yolo-v4-tf.xml -t 20

I got this error: [ ERROR ] Exception from src/plugins/intel_npu/src/backend/include/zero_utils.hpp:21: L0 zeFenceHostSynchronize result: ZE_RESULT_ERROR_DEVICE_LOST, code 0x70000001 - device hung, reset, was removed, or driver update occurred

The same error occurs with the yolo-v4-tf INT8 model.
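
To check whether the device-lost error is tied to benchmark_app or also reproduces with a bare runtime call, here is a minimal sketch using the OpenVINO Python API (assuming the same model path as in the report; a single synchronous inference should be enough to exercise the fence synchronization):

```python
# Minimal repro sketch using the OpenVINO Python API (2024.x).
# Model path taken from the report; adjust as needed.
import numpy as np
import openvino as ov

core = ov.Core()
model = core.read_model("/home/aibox/share/models/public/yolo-v4-tf/FP16/yolo-v4-tf.xml")
compiled = core.compile_model(model, "NPU")

# Fill the network input [1,608,608,3] with random data, as benchmark_app does.
inp = compiled.input(0)
data = np.random.rand(*inp.shape).astype(np.float32)

# One synchronous inference; if ZE_RESULT_ERROR_DEVICE_LOST also shows up
# here, the problem is in the driver/runtime rather than benchmark_app.
result = compiled(data)
print({out.get_any_name(): val.shape for out, val in result.items()})
```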

Step-by-step reproduction

1. Set up Ubuntu 22.04.3 on an MTL platform (Ultra 7 165HL)
2. Install the GPU driver and the NPU v1.2 driver
3. Install OpenVINO 2024.1
4. Run the install_dependencies script
5. Build the C++ benchmark_app

apt list --installed | grep -E 'intel|zero|opencl'

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

intel-driver-compiler-npu/now 1.2.0.20240404-8553879914 amd64 [installed,local]
intel-fw-npu/now 1.2.0.20240404-8553879914 amd64 [installed,local]
intel-gpu-tools/jammy,now 1.26-2 amd64 [installed]
intel-igc-cm/unknown,now 1.0.224-821~22.04 amd64 [installed]
intel-igc-core/now 1.0.16510.2 amd64 [installed,local]
intel-igc-opencl/now 1.0.16510.2 amd64 [installed,local]
intel-level-zero-gpu/now 1.3.29138.7 amd64 [installed,local]
intel-level-zero-npu/now 1.2.0.20240404-8553879914 amd64 [installed,local]
intel-media-va-driver-non-free/unknown,now 23.4.3-804~22.04 amd64 [installed]
intel-opencl-icd/now 24.13.29138.7 amd64 [installed,local]
level-zero-dev/unknown,now 1.16.15-821~22.04 amd64 [installed]
level-zero/unknown,now 1.16.15-821~22.04 amd64 [installed]
libdrm-intel1/unknown,now 2.4.119-2101~22.04 amd64 [installed,automatic]
ocl-icd-libopencl1/jammy,now 2.2.14-3 amd64 [installed]
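
As a sanity check that the runtime actually enumerates the NPU behind these packages, a short sketch using standard Core properties (property availability can vary by plugin version, so treat the exact keys as an assumption):

```python
# Verify that OpenVINO sees the NPU and report the plugin build it uses.
import openvino as ov

core = ov.Core()
print("Available devices:", core.available_devices)

if "NPU" in core.available_devices:
    # FULL_DEVICE_NAME is a standard read-only property.
    print("Device name:", core.get_property("NPU", "FULL_DEVICE_NAME"))
    # get_versions() reports the plugin build number per device.
    for name, version in core.get_versions("NPU").items():
        print(f"{name}: {version.build_number}")
```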

Relevant log output

./benchmark_app -d NPU -m /home/aibox/share/models/public/yolo-v4-tf/FP16/yolo-v4-tf.xml -t 20
[Step 1/11] Parsing and validating input arguments
[ INFO ] Parsing input parameters
[Step 2/11] Loading OpenVINO Runtime
[ INFO ] OpenVINO:
[ INFO ] Build ................................. 2024.1.0-15008-f4afc983258-releases/2024/1
[ INFO ] 
[ INFO ] Device info:
[ INFO ] NPU
[ INFO ] Build ................................. 2024.1.0-15008-f4afc983258-releases/2024/1
[ INFO ] 
[ INFO ] 
[Step 3/11] Setting device configuration
[ WARNING ] Performance hint was not explicitly specified in command line. Device(NPU) performance hint will be set to THROUGHPUT.
[Step 4/11] Reading model files
[ INFO ] Loading model files
[ INFO ] Read model took 7.74 ms
[ INFO ] Original model I/O parameters:
[ INFO ] Network inputs:
[ INFO ]     image_input (node: image_input) : f32 / [N,H,W,C] / [1,608,608,3]
[ INFO ] Network outputs:
[ INFO ]     conv2d_101 (node: model/conv2d_101/BiasAdd) : f32 / [...] / [1,38,38,255]
[ INFO ]     conv2d_109 (node: model/conv2d_109/BiasAdd) : f32 / [...] / [1,19,19,255]
[ INFO ]     conv2d_93 (node: model/conv2d_93/BiasAdd) : f32 / [...] / [1,76,76,255]
[Step 5/11] Resizing model to match image sizes and given batch
[Step 6/11] Configuring input of the model
[ INFO ] Model batch size: 1
[ INFO ] Network inputs:
[ INFO ]     image_input (node: image_input) : u8 / [N,H,W,C] / [1,608,608,3]
[ INFO ] Network outputs:
[ INFO ]     conv2d_101 (node: model/conv2d_101/BiasAdd) : f32 / [...] / [1,38,38,255]
[ INFO ]     conv2d_109 (node: model/conv2d_109/BiasAdd) : f32 / [...] / [1,19,19,255]
[ INFO ]     conv2d_93 (node: model/conv2d_93/BiasAdd) : f32 / [...] / [1,76,76,255]
[Step 7/11] Loading the model to the device
[ INFO ] Compile model took 4886.74 ms
[Step 8/11] Querying optimal runtime parameters
[ INFO ] Model:
[ INFO ]   DEVICE_ID: 
[ INFO ]   ENABLE_CPU_PINNING: NO
[ INFO ]   EXECUTION_DEVICES: NPU.3720
[ INFO ]   INFERENCE_PRECISION_HINT: f16
[ INFO ]   INTERNAL_SUPPORTED_PROPERTIES: CACHING_PROPERTIES
[ INFO ]   LOADED_FROM_CACHE: NO
[ INFO ]   NETWORK_NAME: 
[ INFO ]   OPTIMAL_NUMBER_OF_INFER_REQUESTS: 4
[ INFO ]   PERFORMANCE_HINT: THROUGHPUT
[ INFO ]   PERFORMANCE_HINT_NUM_REQUESTS: 1
[ INFO ]   PERF_COUNT: NO
[Step 9/11] Creating infer requests and preparing input tensors
[ WARNING ] No input files were given: all inputs will be filled with random values!
[ INFO ] Test Config 0
[ INFO ] image_input  ([N,H,W,C], u8, [1,608,608,3], static):   random (image/numpy array is expected)
[Step 10/11] Measuring performance (Start inference asynchronously, 4 inference requests, limits: 20000 ms duration)
[ INFO ] Benchmarking in inference only mode (inputs filling are not included in measurement loop).
[ ERROR ] Exception from src/plugins/intel_npu/src/backend/include/zero_utils.hpp:21:
L0 zeFenceHostSynchronize result: ZE_RESULT_ERROR_DEVICE_LOST, code 0x70000001 - device hung, reset, was removed, or driver update occurred

Issue submission checklist

  • [X] I'm reporting an issue. It's not a question.
  • [X] I checked the problem with the documentation, FAQ, open issues, Stack Overflow, etc., and have not found a solution.
  • [X] There is reproducer code and related data files such as images, videos, models, etc.

joey5678 avatar May 17 '24 03:05 joey5678

@mlyashko, @rzubarev, please help resolve this.

rkazants avatar May 17 '24 06:05 rkazants

Any update?

BTW, I got another error when running benchmark_app with an INT8 YOLOv8 model in the same test environment: error: MultiClusterStrategyAssignment Pass failed : Cannot get per cluster memory shapes. Unsupported distribution: #VPU.DistributedTensor<mode = <SEGMENTED>, num_tiles = [1, 1, 2, 1], num_clusters = 2 : i64, alignment = [1, 1, 4, 1]>

Complete output log:

./benchmark_app -d NPU -m ~/share/models/public/yolo-v8n/INT8/yolo-v8n.xml -t 15
[Step 1/11] Parsing and validating input arguments
[ INFO ] Parsing input parameters
[Step 2/11] Loading OpenVINO Runtime
[ INFO ] OpenVINO:
[ INFO ] Build ................................. 2024.1.0-15008-f4afc983258-releases/2024/1
[ INFO ] 
[ INFO ] Device info:
[ INFO ] NPU
[ INFO ] Build ................................. 2024.1.0-15008-f4afc983258-releases/2024/1
[ INFO ] 
[ INFO ] 
[Step 3/11] Setting device configuration
[ WARNING ] Performance hint was not explicitly specified in command line. Device(NPU) performance hint will be set to THROUGHPUT.
[Step 4/11] Reading model files
[ INFO ] Loading model files
[ INFO ] Read model took 11.67 ms
[ INFO ] Original model I/O parameters:
[ INFO ] Network inputs:
[ INFO ]     images (node: images) : f32 / [...] / [1,3,640,640]
[ INFO ] Network outputs:
[ INFO ]     output0 (node: output0) : f32 / [...] / [1,84,8400]
[ INFO ]     onnx::Reshape_421 (node: onnx::Reshape_421) : f32 / [...] / [1,144,80,80]
[ INFO ]     onnx::Reshape_436 (node: onnx::Reshape_436) : f32 / [...] / [1,144,40,40]
[ INFO ]     onnx::Reshape_451 (node: onnx::Reshape_451) : f32 / [...] / [1,144,20,20]
[Step 5/11] Resizing model to match image sizes and given batch
[ WARNING ] images: layout is not set explicitly, so it is defaulted to NCHW. It is STRONGLY recommended to set layout manually to avoid further issues.
[Step 6/11] Configuring input of the model
[ INFO ] Model batch size: 1
[ INFO ] Network inputs:
[ INFO ]     images (node: images) : u8 / [N,C,H,W] / [1,3,640,640]
[ INFO ] Network outputs:
[ INFO ]     output0 (node: output0) : f32 / [...] / [1,84,8400]
[ INFO ]     onnx::Reshape_421 (node: onnx::Reshape_421) : f32 / [...] / [1,144,80,80]
[ INFO ]     onnx::Reshape_436 (node: onnx::Reshape_436) : f32 / [...] / [1,144,40,40]
[ INFO ]     onnx::Reshape_451 (node: onnx::Reshape_451) : f32 / [...] / [1,144,20,20]
[Step 7/11] Loading the model to the device
error: MultiClusterStrategyAssignment Pass failed : Cannot get per cluster memory shapes. Unsupported distribution: #VPU.DistributedTensor<mode = <SEGMENTED>, num_tiles = [1, 1, 2, 1], num_clusters = 2 : i64, alignment = [1, 1, 4, 1]>
[ ERROR ] Exception from src/inference/src/cpp/core.cpp:106:
Exception from src/inference/src/dev/plugin.cpp:54:
Exception from src/plugins/intel_npu/src/plugin/src/plugin.cpp:513:
Check 'result == ZE_RESULT_SUCCESS' failed at src/plugins/intel_npu/src/compiler/src/zero_compiler_in_driver.cpp:745:
Failed to compile network. L0 createGraph result: ZE_RESULT_ERROR_UNKNOWN, code 0x7ffffffe. Compilation failed
Failed to create executable

With the FP16 YOLOv8 model, it works fine.

Attached is the INT8 YOLOv8 model I used: my-yolo-v8n-int8.zip
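
To isolate this compiler failure outside benchmark_app, a compile-only sketch with verbose logging enabled may help. LOG_LEVEL is a standard OpenVINO property; how much extra detail the NPU compiler-in-driver actually emits through it is plugin-dependent, so that part is an assumption:

```python
# Compile-only repro with debug logging, to capture more context around
# the MultiClusterStrategyAssignment failure. Model path from the report.
import openvino as ov

core = ov.Core()
model = core.read_model("~/share/models/public/yolo-v8n/INT8/yolo-v8n.xml")

try:
    # LOG_LEVEL is a standard property; extra NPU compiler detail through
    # it is plugin-dependent (assumption).
    compiled = core.compile_model(model, "NPU", {"LOG_LEVEL": "LOG_DEBUG"})
    print("Compilation succeeded")
except RuntimeError as e:
    print("Compilation failed:\n", e)
```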

joey5678 avatar May 22 '24 05:05 joey5678

Hi @joey5678, is this issue still reproducible with the latest NPU driver?

AlinaMingiuc avatar Jul 04 '24 09:07 AlinaMingiuc

@joey5678 we've done a quick test with the provided model (my-yolo-v8n-int8.zip) on MTL with NPU (Intel Core Ultra 7 155H), and the issue is not observed. Please try the latest OpenVINO version (2024.3) and the latest NPU driver, and see if the issue is fixed on your end. Hope this helps.

$ benchmark_app -m yolo-v8n.xml -d NPU -t 5
[Step 1/11] Parsing and validating input arguments
[ INFO ] Parsing input parameters
[Step 2/11] Loading OpenVINO Runtime
[ INFO ] OpenVINO:
[ INFO ] Build ................................. 2024.3.0-16041-1e3b88e4e3f-releases/2024/3
[ INFO ]
[ INFO ] Device info:
[ INFO ] NPU
[ INFO ] Build ................................. 2024.3.0-16041-1e3b88e4e3f-releases/2024/3
[ INFO ]
[ INFO ]
[Step 3/11] Setting device configuration
[ WARNING ] Performance hint was not explicitly specified in command line. Device(NPU) performance hint will be set to PerformanceMode.THROUGHPUT.
[Step 4/11] Reading model files
[ INFO ] Loading model files
[ INFO ] Read model took 12.33 ms
[ INFO ] Original model I/O parameters:
[ INFO ] Model inputs:
[ INFO ]     images (node: images) : f32 / [...] / [1,3,640,640]
[ INFO ] Model outputs:
[ INFO ]     output0 (node: output0) : f32 / [...] / [1,84,8400]
[ INFO ]     onnx::Reshape_421 (node: onnx::Reshape_421) : f32 / [...] / [1,144,80,80]
[ INFO ]     onnx::Reshape_436 (node: onnx::Reshape_436) : f32 / [...] / [1,144,40,40]
[ INFO ]     onnx::Reshape_451 (node: onnx::Reshape_451) : f32 / [...] / [1,144,20,20]
[Step 5/11] Resizing model to match image sizes and given batch
[ INFO ] Model batch size: 1
[Step 6/11] Configuring input of the model
[ INFO ] Model inputs:
[ INFO ]     images (node: images) : u8 / [N,C,H,W] / [1,3,640,640]
[ INFO ] Model outputs:
[ INFO ]     output0 (node: output0) : f32 / [...] / [1,84,8400]
[ INFO ]     onnx::Reshape_421 (node: onnx::Reshape_421) : f32 / [...] / [1,144,80,80]
[ INFO ]     onnx::Reshape_436 (node: onnx::Reshape_436) : f32 / [...] / [1,144,40,40]
[ INFO ]     onnx::Reshape_451 (node: onnx::Reshape_451) : f32 / [...] / [1,144,20,20]
[Step 7/11] Loading the model to the device
[ INFO ] Compile model took 1406.23 ms
[Step 8/11] Querying optimal runtime parameters
[ INFO ] Model:
[ INFO ]   DEVICE_ID:
[ INFO ]   ENABLE_CPU_PINNING: False
[ INFO ]   EXECUTION_DEVICES: NPU
[ INFO ]   EXECUTION_MODE_HINT: ExecutionMode.PERFORMANCE
[ INFO ]   INFERENCE_PRECISION_HINT: <Type: 'float16'>
[ INFO ]   LOADED_FROM_CACHE: False
[ INFO ]   MODEL_PRIORITY: Priority.MEDIUM
[ INFO ]   NETWORK_NAME: main_graph
[ INFO ]   NPU_COMPILATION_MODE_PARAMS:
[ INFO ]   OPTIMAL_NUMBER_OF_INFER_REQUESTS: 4
[ INFO ]   PERFORMANCE_HINT: PerformanceMode.THROUGHPUT
[ INFO ]   PERFORMANCE_HINT_NUM_REQUESTS: 1
[ INFO ]   PERF_COUNT: False
[Step 9/11] Creating infer requests and preparing input tensors
[ WARNING ] No input files were given for input 'images'!. This input will be filled with random values!
[ INFO ] Fill input 'images' with random values
[Step 10/11] Measuring performance (Start inference asynchronously, 4 inference requests, limits: 5000 ms duration)
[ INFO ] Benchmarking in inference only mode (inputs filling are not included in measurement loop).
[ INFO ] First inference took 19.05 ms
[Step 11/11] Dumping statistics report
[ INFO ] Execution Devices:NPU
[ INFO ] Count:            564 iterations
[ INFO ] Duration:         5063.55 ms
[ INFO ] Latency:
[ INFO ]    Median:        35.56 ms
[ INFO ]    Average:       35.67 ms
[ INFO ]    Min:           16.93 ms
[ INFO ]    Max:           57.86 ms
[ INFO ] Throughput:   111.38 FPS
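
For anyone retesting, a quick way to confirm which runtime build is actually picked up (the NPU driver version still has to be checked through the package manager, as in the apt listing above):

```python
# Print the runtime build string; after the upgrade it should report a
# 2024.3 build like the one in the log above.
import openvino as ov
print(ov.get_version())
```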

avitial avatar Aug 21 '24 20:08 avitial

Closing this as it seems the issue has been addressed. Feel free to reopen and ask additional questions related to this topic.

avitial avatar Oct 09 '24 20:10 avitial