[Bug]: Inference results on the CPU device on macOS (M2) are abnormal, while results on the TEMPLATE device are normal
### OpenVINO Version
2024.0.0
### Operating System
macOS Systems for Apple Silicon
### Device used for inference
CPU
### Framework
ONNX
### Model used
_No response_
### Issue description
Platform: Mac M2
- With `AUTO:CPU` as the inference device, there are no errors and inference is fast, but all output values are 0.
- With `AUTO` as the inference device, inference is slow, but the results are normal.

Why does `AUTO:CPU` as the inference device produce abnormal results, while `AUTO` works correctly?
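To make "all output values are 0" versus "the results are normal" concrete, the same output (for example `enhance_real`) from each run can be flattened and compared element by element. The sketch below is a hypothetical helper using only the Python standard library; it is not part of OpenVINO, and the tolerances `rtol`/`atol` are assumed values, not ones taken from this issue.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class CompareResult:
    all_zero: bool       # every element of the candidate output is exactly 0.0
    max_abs_diff: float  # largest |candidate - reference|; nan if any difference is nan
    close: bool          # |candidate - reference| <= atol + rtol * |reference| everywhere


def compare_outputs(candidate: Sequence[float], reference: Sequence[float],
                    rtol: float = 1e-4, atol: float = 1e-5) -> CompareResult:
    """Compare one flattened output tensor from two runs (e.g. two devices)."""
    if len(candidate) != len(reference):
        raise ValueError("outputs must have the same number of elements")
    diffs = [abs(c - r) for c, r in zip(candidate, reference)]
    if any(math.isnan(d) for d in diffs):
        max_abs_diff = math.nan
    else:
        max_abs_diff = max(diffs, default=0.0)
    # A nan difference fails the <= test, so nan outputs are never "close".
    close = all(d <= atol + rtol * abs(r) for d, r in zip(diffs, reference))
    return CompareResult(
        all_zero=all(c == 0.0 for c in candidate),
        max_abs_diff=max_abs_diff,
        close=close,
    )


# Hypothetical values standing in for a flattened `enhance_real` output.
reference = [0.12, -0.5, 0.33]  # e.g. from the run that gives normal results
suspect = [0.0, 0.0, 0.0]       # e.g. from the run that returns all zeros
print(compare_outputs(suspect, reference))
```

Reporting `all_zero` separately from `close` distinguishes "the device produced nothing" from "the device produced slightly different numbers", which point to different kinds of bugs.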
**Log when using the AUTO device:**
[17:03:29.2394]D[plugin.cpp:247][AUTO] deviceNameWithID:CPU, defaultDeviceID:, uniqueName:CPU_
[17:03:29.2394]D[plugin.cpp:247][AUTO] deviceNameWithID:TEMPLATE, defaultDeviceID:0, uniqueName:TEMPLATE_0
[17:03:29.2408]I[plugin.cpp:421][AUTO] device:CPU, config:INFERENCE_NUM_THREADS=1
[17:03:29.2408]I[plugin.cpp:421][AUTO] device:CPU, config:LOG_LEVEL=LOG_TRACE
[17:03:29.2408]I[plugin.cpp:421][AUTO] device:CPU, config:PERFORMANCE_HINT=LATENCY
[17:03:29.2408]I[plugin.cpp:421][AUTO] device:CPU, config:PERFORMANCE_HINT_NUM_REQUESTS=0
[17:03:29.2408]I[plugin.cpp:421][AUTO] device:CPU, config:PERF_COUNT=NO
[17:03:29.2408]I[plugin.cpp:423][AUTO] device:CPU, priority:0
[17:03:29.2408]I[plugin.cpp:421][AUTO] device:TEMPLATE, config:LOG_LEVEL=LOG_TRACE
[17:03:29.2408]I[plugin.cpp:421][AUTO] device:TEMPLATE, config:PERFORMANCE_HINT=LATENCY
[17:03:29.2408]I[plugin.cpp:421][AUTO] device:TEMPLATE, config:PERFORMANCE_HINT_NUM_REQUESTS=0
[17:03:29.2408]I[plugin.cpp:421][AUTO] device:TEMPLATE, config:PERF_COUNT=NO
[17:03:29.2408]I[plugin.cpp:423][AUTO] device:TEMPLATE, priority:0
[17:03:29.2409]I[schedule.cpp:17][AUTO] scheduler starting
[17:03:29.2409]I[auto_schedule.cpp:131][AUTO] select device:TEMPLATE
[17:03:29.2409]I[auto_schedule.cpp:145][AUTO] will load CPU for accelerator
[17:03:29.2961]I[auto_schedule.cpp:109][AUTO] device:TEMPLATE compiling model finished
[17:03:29.2961]D[auto_schedule.cpp:118][AUTO] device:TEMPLATE, GetConfig:NETWORK_NAME=main_graph
[17:03:29.2961]D[auto_schedule.cpp:118][AUTO] device:TEMPLATE, GetConfig:SUPPORTED_PROPERTIES=NETWORK_NAME SUPPORTED_PROPERTIES EXECUTION_DEVICES LOADED_FROM_CACHE OPTIMAL_NUMBER_OF_INFER_REQUESTS DEVICE_ID PERF_COUNT
[17:03:29.2961]D[auto_schedule.cpp:118][AUTO] device:TEMPLATE, GetConfig:EXECUTION_DEVICES=TEMPLATE.0
[17:03:29.2961]D[auto_schedule.cpp:118][AUTO] device:TEMPLATE, GetConfig:LOADED_FROM_CACHE=NO
[17:03:29.2961]D[auto_schedule.cpp:118][AUTO] device:TEMPLATE, GetConfig:OPTIMAL_NUMBER_OF_INFER_REQUESTS=1
[17:03:29.2961]D[auto_schedule.cpp:118][AUTO] device:TEMPLATE, GetConfig:DEVICE_ID=0
[17:03:29.2962]D[auto_schedule.cpp:118][AUTO] device:TEMPLATE, GetConfig:PERF_COUNT=NO
[2024-05-13 17:03:29.297] [info] [main.cpp:13] model name: main_graph
[2024-05-13 17:03:29.297] [info] [main.cpp:13] inputs
[2024-05-13 17:03:29.297] [info] [main.cpp:13] input name: ref_real
[2024-05-13 17:03:29.297] [info] [main.cpp:13] input type: f32
[2024-05-13 17:03:29.297] [info] [main.cpp:13] input shape: [1,1,321]
[2024-05-13 17:03:29.297] [info] [main.cpp:13] inputs
[2024-05-13 17:03:29.297] [info] [main.cpp:13] input name: ref_imag
[2024-05-13 17:03:29.297] [info] [main.cpp:13] input type: f32
[2024-05-13 17:03:29.297] [info] [main.cpp:13] input shape: [1,1,321]
[2024-05-13 17:03:29.297] [info] [main.cpp:13] inputs
[2024-05-13 17:03:29.297] [info] [main.cpp:13] input name: mic_real
[2024-05-13 17:03:29.297] [info] [main.cpp:13] input type: f32
[2024-05-13 17:03:29.297] [info] [main.cpp:13] input shape: [1,1,321]
[2024-05-13 17:03:29.297] [info] [main.cpp:13] inputs
[2024-05-13 17:03:29.297] [info] [main.cpp:13] input name: mic_imag
[2024-05-13 17:03:29.297] [info] [main.cpp:13] input type: f32
[2024-05-13 17:03:29.297] [info] [main.cpp:13] input shape: [1,1,321]
[2024-05-13 17:03:29.297] [info] [main.cpp:13] inputs
[2024-05-13 17:03:29.297] [info] [main.cpp:13] input name: in_state1
[2024-05-13 17:03:29.297] [info] [main.cpp:13] input type: f32
[2024-05-13 17:03:29.297] [info] [main.cpp:13] input shape: [1,41,64]
[2024-05-13 17:03:29.297] [info] [main.cpp:13] inputs
[2024-05-13 17:03:29.297] [info] [main.cpp:13] input name: in_state2
[2024-05-13 17:03:29.297] [info] [main.cpp:13] input type: f32
[2024-05-13 17:03:29.297] [info] [main.cpp:13] input shape: [1,41,64]
[2024-05-13 17:03:29.297] [info] [main.cpp:13] outputs
[2024-05-13 17:03:29.297] [info] [main.cpp:13] output name: enhance_real
[2024-05-13 17:03:29.297] [info] [main.cpp:13] output type: f32
[2024-05-13 17:03:29.297] [info] [main.cpp:13] output shape: [1,321,1]
[2024-05-13 17:03:29.297] [info] [main.cpp:13] outputs
[2024-05-13 17:03:29.297] [info] [main.cpp:13] output name: enhance_imag
[2024-05-13 17:03:29.297] [info] [main.cpp:13] output type: f32
[2024-05-13 17:03:29.297] [info] [main.cpp:13] output shape: [1,321,1]
[2024-05-13 17:03:29.297] [info] [main.cpp:13] outputs
[2024-05-13 17:03:29.297] [info] [main.cpp:13] output name: out_state1
[2024-05-13 17:03:29.297] [info] [main.cpp:13] output type: f32
[2024-05-13 17:03:29.297] [info] [main.cpp:13] output shape: [1,41,64]
[2024-05-13 17:03:29.297] [info] [main.cpp:13] outputs
[2024-05-13 17:03:29.297] [info] [main.cpp:13] output name: out_state2
[2024-05-13 17:03:29.297] [info] [main.cpp:13] output type: f32
[2024-05-13 17:03:29.297] [info] [main.cpp:13] output shape: [1,41,64]
[2024-05-13 17:03:29.298] [info] [main.cpp:19] Step 2
[2024-05-13 17:03:29.298] [info] [main.cpp:19] Step 3
[2024-05-13 17:03:29.298] [info] [main.cpp:19] Step 4
[2024-05-13 17:03:29.300] [info] [main.cpp:19] Step 6
[2024-05-13 17:03:29.301] [info] [aec_ans_test.cpp:102] start process ...
[17:03:29.3098]I[auto_schedule.cpp:109][AUTO] device:CPU compiling model finished
[17:03:29.3098]D[auto_schedule.cpp:118][AUTO] device:CPU, GetConfig:SUPPORTED_PROPERTIES=SUPPORTED_PROPERTIES NETWORK_NAME OPTIMAL_NUMBER_OF_INFER_REQUESTS NUM_STREAMS AFFINITY INFERENCE_NUM_THREADS PERF_COUNT INFERENCE_PRECISION_HINT PERFORMANCE_HINT EXECUTION_MODE_HINT PERFORMANCE_HINT_NUM_REQUEST
[17:03:29.3109]D[auto_schedule.cpp:118][AUTO] device:CPU, GetConfig:NETWORK_NAME=main_graph
[17:03:29.3109]D[auto_schedule.cpp:118][AUTO] device:CPU, GetConfig:OPTIMAL_NUMBER_OF_INFER_REQUESTS=1
[17:03:29.3109]D[auto_schedule.cpp:118][AUTO] device:CPU, GetConfig:NUM_STREAMS=1
[17:03:29.3109]D[auto_schedule.cpp:118][AUTO] device:CPU, GetConfig:AFFINITY=NONE
[17:03:29.3109]D[auto_schedule.cpp:118][AUTO] device:CPU, GetConfig:INFERENCE_NUM_THREADS=1
[17:03:29.3109]D[auto_schedule.cpp:118][AUTO] device:CPU, GetConfig:PERF_COUNT=NO
[17:03:29.3109]D[auto_schedule.cpp:118][AUTO] device:CPU, GetConfig:INFERENCE_PRECISION_HINT=f32
[17:03:29.3109]D[auto_schedule.cpp:118][AUTO] device:CPU, GetConfig:PERFORMANCE_HINT=LATENCY
[17:03:29.3109]D[auto_schedule.cpp:118][AUTO] device:CPU, GetConfig:EXECUTION_MODE_HINT=PERFORMANCE
[17:03:29.3109]D[auto_schedule.cpp:118][AUTO] device:CPU, GetConfig:PERFORMANCE_HINT_NUM_REQUESTS=0
[17:03:29.3109]D[auto_schedule.cpp:118][AUTO] device:CPU, GetConfig:ENABLE_CPU_PINNING=NO
[17:03:29.3109]D[auto_schedule.cpp:118][AUTO] device:CPU, GetConfig:SCHEDULING_CORE_TYPE=ANY_CORE
[17:03:29.3110]D[auto_schedule.cpp:118][AUTO] device:CPU, GetConfig:MODEL_DISTRIBUTION_POLICY=
[17:03:29.3110]D[auto_schedule.cpp:118][AUTO] device:CPU, GetConfig:ENABLE_HYPER_THREADING=NO
[17:03:29.3110]D[auto_schedule.cpp:118][AUTO] device:CPU, GetConfig:EXECUTION_DEVICES=CPU
[17:03:29.3110]D[auto_schedule.cpp:118][AUTO] device:CPU, GetConfig:CPU_DENORMALS_OPTIMIZATION=NO
[17:03:29.3110]D[auto_schedule.cpp:118][AUTO] device:CPU, GetConfig:LOG_LEVEL=LOG_TRACE
[17:03:29.3110]D[auto_schedule.cpp:118][AUTO] device:CPU, GetConfig:CPU_SPARSE_WEIGHTS_DECOMPRESSION_RATE=1
[17:03:29.3110]D[auto_schedule.cpp:118][AUTO] device:CPU, GetConfig:DYNAMIC_QUANTIZATION_GROUP_SIZE=0
[17:03:29.3110]D[auto_schedule.cpp:118][AUTO] device:CPU, GetConfig:KV_CACHE_PRECISION=f16
[17:03:29.3110]I[auto_schedule.cpp:230][AUTO] release all work requests of CPU_HELP
[17:03:29.3122]I[auto_schedule.cpp:235][AUTO] helper released!!
**Log when using AUTO:CPU as the device:**
[17:08:58.7987]D[plugin.cpp:247][AUTO] deviceNameWithID:CPU, defaultDeviceID:, uniqueName:CPU_
[17:08:58.8007]I[plugin.cpp:421][AUTO] device:CPU, config:INFERENCE_NUM_THREADS=1
[17:08:58.8007]I[plugin.cpp:421][AUTO] device:CPU, config:LOG_LEVEL=LOG_TRACE
[17:08:58.8007]I[plugin.cpp:421][AUTO] device:CPU, config:PERFORMANCE_HINT=LATENCY
[17:08:58.8007]I[plugin.cpp:421][AUTO] device:CPU, config:PERFORMANCE_HINT_NUM_REQUESTS=0
[17:08:58.8007]I[plugin.cpp:421][AUTO] device:CPU, config:PERF_COUNT=NO
[17:08:58.8007]I[plugin.cpp:423][AUTO] device:CPU, priority:0
[17:08:58.8007]I[schedule.cpp:17][AUTO] scheduler starting
[17:08:58.8007]I[auto_schedule.cpp:131][AUTO] select device:CPU
[17:08:58.8673]I[auto_schedule.cpp:109][AUTO] device:CPU compiling model finished
[17:08:58.8673]D[auto_schedule.cpp:118][AUTO] device:CPU, GetConfig:SUPPORTED_PROPERTIES=SUPPORTED_PROPERTIES NETWORK_NAME OPTIMAL_NUMBER_OF_INFER_REQUESTS NUM_STREAMS AFFINITY INFERENCE_NUM_THREADS PERF_COUNT INFERENCE_PRECISION_HINT PERFORMANCE_HINT EXECUTION_MODE_HINT PERFORMANCE_HINT_NUM_REQUEST
[17:08:58.8683]D[auto_schedule.cpp:118][AUTO] device:CPU, GetConfig:NETWORK_NAME=main_graph
[17:08:58.8683]D[auto_schedule.cpp:118][AUTO] device:CPU, GetConfig:OPTIMAL_NUMBER_OF_INFER_REQUESTS=1
[17:08:58.8683]D[auto_schedule.cpp:118][AUTO] device:CPU, GetConfig:NUM_STREAMS=1
[17:08:58.8683]D[auto_schedule.cpp:118][AUTO] device:CPU, GetConfig:AFFINITY=NONE
[17:08:58.8684]D[auto_schedule.cpp:118][AUTO] device:CPU, GetConfig:INFERENCE_NUM_THREADS=1
[17:08:58.8684]D[auto_schedule.cpp:118][AUTO] device:CPU, GetConfig:PERF_COUNT=NO
[17:08:58.8684]D[auto_schedule.cpp:118][AUTO] device:CPU, GetConfig:INFERENCE_PRECISION_HINT=f32
[17:08:58.8684]D[auto_schedule.cpp:118][AUTO] device:CPU, GetConfig:PERFORMANCE_HINT=LATENCY
[17:08:58.8684]D[auto_schedule.cpp:118][AUTO] device:CPU, GetConfig:EXECUTION_MODE_HINT=PERFORMANCE
[17:08:58.8684]D[auto_schedule.cpp:118][AUTO] device:CPU, GetConfig:PERFORMANCE_HINT_NUM_REQUESTS=0
[17:08:58.8684]D[auto_schedule.cpp:118][AUTO] device:CPU, GetConfig:ENABLE_CPU_PINNING=NO
[17:08:58.8684]D[auto_schedule.cpp:118][AUTO] device:CPU, GetConfig:SCHEDULING_CORE_TYPE=ANY_CORE
[17:08:58.8684]D[auto_schedule.cpp:118][AUTO] device:CPU, GetConfig:MODEL_DISTRIBUTION_POLICY=
[17:08:58.8684]D[auto_schedule.cpp:118][AUTO] device:CPU, GetConfig:ENABLE_HYPER_THREADING=NO
[17:08:58.8684]D[auto_schedule.cpp:118][AUTO] device:CPU, GetConfig:EXECUTION_DEVICES=CPU
[17:08:58.8684]D[auto_schedule.cpp:118][AUTO] device:CPU, GetConfig:CPU_DENORMALS_OPTIMIZATION=NO
[17:08:58.8684]D[auto_schedule.cpp:118][AUTO] device:CPU, GetConfig:LOG_LEVEL=LOG_TRACE
[17:08:58.8684]D[auto_schedule.cpp:118][AUTO] device:CPU, GetConfig:CPU_SPARSE_WEIGHTS_DECOMPRESSION_RATE=1
[17:08:58.8684]D[auto_schedule.cpp:118][AUTO] device:CPU, GetConfig:DYNAMIC_QUANTIZATION_GROUP_SIZE=0
[17:08:58.8684]D[auto_schedule.cpp:118][AUTO] device:CPU, GetConfig:KV_CACHE_PRECISION=f16
[17:08:58.8684]I[plugin.cpp:451][AUTO] underlying hardware does not support hardware context
[2024-05-13 17:08:58.869] [info] [main.cpp:13] model name: main_graph
[2024-05-13 17:08:58.869] [info] [main.cpp:13] inputs
[2024-05-13 17:08:58.869] [info] [main.cpp:13] input name: ref_real
[2024-05-13 17:08:58.869] [info] [main.cpp:13] input type: f32
[2024-05-13 17:08:58.869] [info] [main.cpp:13] input shape: [1,1,321]
[2024-05-13 17:08:58.869] [info] [main.cpp:13] inputs
[2024-05-13 17:08:58.869] [info] [main.cpp:13] input name: ref_imag
[2024-05-13 17:08:58.869] [info] [main.cpp:13] input type: f32
[2024-05-13 17:08:58.869] [info] [main.cpp:13] input shape: [1,1,321]
[2024-05-13 17:08:58.869] [info] [main.cpp:13] inputs
[2024-05-13 17:08:58.869] [info] [main.cpp:13] input name: mic_real
[2024-05-13 17:08:58.869] [info] [main.cpp:13] input type: f32
[2024-05-13 17:08:58.869] [info] [main.cpp:13] input shape: [1,1,321]
[2024-05-13 17:08:58.869] [info] [main.cpp:13] inputs
[2024-05-13 17:08:58.869] [info] [main.cpp:13] input name: mic_imag
[2024-05-13 17:08:58.869] [info] [main.cpp:13] input type: f32
[2024-05-13 17:08:58.869] [info] [main.cpp:13] input shape: [1,1,321]
[2024-05-13 17:08:58.869] [info] [main.cpp:13] inputs
[2024-05-13 17:08:58.869] [info] [main.cpp:13] input name: in_state1
[2024-05-13 17:08:58.869] [info] [main.cpp:13] input type: f32
[2024-05-13 17:08:58.869] [info] [main.cpp:13] input shape: [1,41,64]
[2024-05-13 17:08:58.869] [info] [main.cpp:13] inputs
[2024-05-13 17:08:58.869] [info] [main.cpp:13] input name: in_state2
[2024-05-13 17:08:58.869] [info] [main.cpp:13] input type: f32
[2024-05-13 17:08:58.869] [info] [main.cpp:13] input shape: [1,41,64]
[2024-05-13 17:08:58.869] [info] [main.cpp:13] outputs
[2024-05-13 17:08:58.869] [info] [main.cpp:13] output name: enhance_real
[2024-05-13 17:08:58.869] [info] [main.cpp:13] output type: f32
[2024-05-13 17:08:58.869] [info] [main.cpp:13] output shape: [1,321,1]
[2024-05-13 17:08:58.869] [info] [main.cpp:13] outputs
[2024-05-13 17:08:58.869] [info] [main.cpp:13] output name: enhance_imag
[2024-05-13 17:08:58.869] [info] [main.cpp:13] output type: f32
[2024-05-13 17:08:58.869] [info] [main.cpp:13] output shape: [1,321,1]
[2024-05-13 17:08:58.869] [info] [main.cpp:13] outputs
[2024-05-13 17:08:58.869] [info] [main.cpp:13] output name: out_state1
[2024-05-13 17:08:58.869] [info] [main.cpp:13] output type: f32
[2024-05-13 17:08:58.869] [info] [main.cpp:13] output shape: [1,41,64]
[2024-05-13 17:08:58.869] [info] [main.cpp:13] outputs
[2024-05-13 17:08:58.869] [info] [main.cpp:13] output name: out_state2
[2024-05-13 17:08:58.869] [info] [main.cpp:13] output type: f32
[2024-05-13 17:08:58.869] [info] [main.cpp:13] output shape: [1,41,64]
### Step-by-step reproduction
_No response_
### Relevant log output
_No response_
### Issue submission checklist
- [X] I'm reporting an issue. It's not a question.
- [X] I checked the problem with the documentation, FAQ, open issues, Stack Overflow, etc., and have not found a solution.
- [X] There is reproducer code and related data files such as images, videos, models, etc.
After further testing: if we also output the result of the model's last BatchNorm (BN) layer, the final result is normal.
Hi @wanglxchina, do you still need help with this? If so, please let us know which model you use and share a link to it if it is publicly available.
Thank you for your reply. Unfortunately, we cannot share the current model at the moment. This seems to be a bug in OpenVINO: the BatchNorm2d operator produces incorrect results on the arm64 platform. When the BatchNorm2d result is exported as a model output, the final result is correct; when it is not exported, the result is incorrect. There is no such issue on x86_64, where everything works normally. We will also try a simple model with a similar structure for testing. If more information is needed, I can provide it. Thank you.
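Since the discrepancy is tied to BatchNorm2d, a framework-independent reference can help check an exported BN output by hand. In inference mode BatchNorm2d computes, per channel c, y = gamma[c] * (x - mean[c]) / sqrt(var[c] + eps) + beta[c], which is algebraically the same as one per-channel scale and shift; inference engines commonly use that folded form and may merge it into a neighbouring layer. The sketch below is plain Python under stated assumptions (flattened NCHW layout, PyTorch's default `eps = 1e-5`); it is not OpenVINO's implementation, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def batchnorm2d_inference(x: Sequence[float], shape: Sequence[int],
                          gamma: Sequence[float], beta: Sequence[float],
                          mean: Sequence[float], var: Sequence[float],
                          eps: float = 1e-5) -> List[float]:
    """Reference BatchNorm2d (inference mode) on a flattened NCHW tensor.

    y[n, c, h, w] = gamma[c] * (x[n, c, h, w] - mean[c]) / sqrt(var[c] + eps) + beta[c]
    """
    n, c, h, w = shape
    if len(x) != n * c * h * w:
        raise ValueError("x does not match shape")
    plane = h * w
    out = []
    for i, v in enumerate(x):
        ch = (i // plane) % c  # channel of flat index i in NCHW layout
        out.append(gamma[ch] * (v - mean[ch]) / math.sqrt(var[ch] + eps) + beta[ch])
    return out


def batchnorm2d_scale_shift(x: Sequence[float], shape: Sequence[int],
                            gamma: Sequence[float], beta: Sequence[float],
                            mean: Sequence[float], var: Sequence[float],
                            eps: float = 1e-5) -> List[float]:
    """The same operation folded into one multiply-add per channel:
    scale = gamma / sqrt(var + eps), shift = beta - mean * scale."""
    n, c, h, w = shape
    if len(x) != n * c * h * w:
        raise ValueError("x does not match shape")
    scale = [g / math.sqrt(vr + eps) for g, vr in zip(gamma, var)]
    shift = [b - m * s for b, m, s in zip(beta, mean, scale)]
    plane = h * w
    return [v * scale[(i // plane) % c] + shift[(i // plane) % c]
            for i, v in enumerate(x)]
```

Both functions should agree up to floating-point rounding; if an engine's BN output matches neither, the problem is in the BN computation itself rather than in how its result is consumed downstream.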
@wanglxchina Could you quickly check the result using `CPU` directly as the device (not `AUTO:CPU`)? What is the result on arm64 with `CPU` when the BatchNorm2d result is not exported?
@songbell Results with both `CPU` and `AUTO:CPU` are incorrect; I have already tested both. In fact, I first used `CPU` directly. If the BatchNorm2d result is exported, both `CPU` and `AUTO:CPU` give correct results.
@wanglxchina The difference comes from the TEMPLATE vs. CPU device. I will ask a CPU engineer to take a look.
@allnes Please take a look on the issue.
@wanglxchina We would appreciate it if you could provide part of the model (or an equivalent subgraph) that produces incorrect results. It would significantly simplify reproduction on our side.
@dmitry-gorokhov Here is a simple ONNX model with a similar structure for testing: `toy_model_multi_out.onnx` also exports the BatchNorm2d result as an output, while `toy_model.onnx` does not.
The results of `toy_model.onnx` and `toy_model_multi_out.onnx` with CPU inference on the arm64 platform differ, but they should be the same.
The results of `toy_model.onnx` with CPU inference on arm64 are the same as the results of `toy_model.onnx` with CPU inference on x86_64, which is as expected.
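The expectation here is that exposing an intermediate tensor as an extra output must not change any other output. As a sketch, assuming each model's outputs have been collected into a dict mapping output name to a flattened list of floats, the check could look like the helper below. The helper and the output names `out` and `bn_out` are hypothetical; the real output names of the toy models are not given in this thread.

```python
from typing import Dict, List, Sequence


def mismatched_shared_outputs(outputs_a: Dict[str, Sequence[float]],
                              outputs_b: Dict[str, Sequence[float]],
                              rtol: float = 1e-4,
                              atol: float = 1e-5) -> List[str]:
    """Names of outputs present in both runs whose flattened values differ.

    Outputs that exist in only one run (such as the extra BatchNorm2d output
    of the multi-output model) are skipped.
    """
    mismatched = []
    for name in sorted(outputs_a.keys() & outputs_b.keys()):
        a, b = outputs_a[name], outputs_b[name]
        same = len(a) == len(b) and all(
            abs(x - y) <= atol + rtol * abs(y) for x, y in zip(a, b))
        if not same:  # a nan anywhere also makes `same` False
            mismatched.append(name)
    return mismatched


# Hypothetical output names and values, for illustration only.
single_out = {"out": [0.0, 0.0]}
multi_out = {"out": [0.7, -0.2], "bn_out": [1.0, 2.0]}
print(mismatched_shared_outputs(single_out, multi_out))
```

An empty list means the two models agree on every output they share; any name returned is an output whose value changed just because another tensor was exported.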
Hi! Thanks for the models. I will get back to you once I have results on this issue.
@allnes Hello, has there been any progress on this issue?
@wanglxchina Hello, I am still debugging your issue. Could you please tell me whether you built OpenVINO yourself or used a specific package?
I have tried both OpenVINO 2024.0.0 built myself and OpenVINO 2024.1.0 downloaded from the official website. Both have the same problem.
If possible, could you provide dumped blobs? You can obtain them by following this instruction with a debug build. I will compare your blobs with the ones I get.
@wanglxchina Hello, I apologize for the misinformation. Dumping blobs is not required, so you don't have to do it; I am still debugging this bug. Thank you for your patience.
@wanglxchina Hello! According to our latest results on the current master branch, the outputs of your networks (toy_model.zip) match. Could you please check whether it works for you with the latest OpenVINO master branch?
Closing the issue; feel free to reopen it or start a new issue if additional assistance is needed.