
[Bug]: ConvTranspose2d followed by BatchNorm2d with in_channels=out_channels=groups errors on Intel Xeon [email protected], CentOS Linux 7, OpenVINO 2024.0

Open xulingling516 opened this issue 11 months ago • 6 comments

OpenVINO Version

2024.0

Operating System

Other (Please specify in description)

Device used for inference

CPU

Framework

PyTorch

Model used

mobilenet

Issue description

When a ConvTranspose2d with in_channels = out_channels = groups is followed by BatchNorm2d, inference crashes with "Segmentation fault (core dumped)". CPU: Intel Xeon [email protected]; OS: CentOS Linux 7; OpenVINO version: 2024.0. The problem was previously mentioned in issue #22343; I downloaded and installed OpenVINO 2024.0, but it was not resolved. The reproducer code is attached: crash2.zip
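For reference, the failing pattern looks roughly like the sketch below. This is not the attached reproducer from crash2.zip; the layer sizes are hypothetical, chosen only to match the [1,3,96,96] → [1,128,96,96] shapes that appear in the logs later in this thread.

```python
# Minimal sketch of the reported pattern (NOT the attached reproducer):
# a depthwise ConvTranspose2d (in_channels == out_channels == groups)
# followed by BatchNorm2d. Channel counts are hypothetical.
import torch
import torch.nn as nn

class DepthwiseDeconvBN(nn.Module):
    def __init__(self, channels=128):
        super().__init__()
        self.proj = nn.Conv2d(3, channels, kernel_size=1)  # 3 -> 128 channels
        self.deconv = nn.ConvTranspose2d(
            channels, channels, kernel_size=3, padding=1,
            groups=channels)                               # depthwise: in == out == groups
        self.bn = nn.BatchNorm2d(channels)

    def forward(self, x):
        return self.bn(self.deconv(self.proj(x)))

model = DepthwiseDeconvBN().eval()
dummy = torch.randn(1, 3, 96, 96)                          # spatial size stays 96x96
torch.onnx.export(model, dummy, "dconv.onnx",
                  input_names=["data"], output_names=["output"])
```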

Step-by-step reproduction

No response

Relevant log output

No response

Issue submission checklist

  • [X] I'm reporting an issue. It's not a question.
  • [X] I checked the problem with the documentation, FAQ, open issues, Stack Overflow, etc., and have not found a solution.
  • [X] There is reproducer code and related data files such as images, videos, models, etc.

xulingling516 avatar Mar 12 '24 02:03 xulingling516

@xulingling516 I could not download the crash log. Could you re-upload it? Meanwhile, please provide the URL for the model being used. Thanks!

wenjiew avatar Mar 13 '24 05:03 wenjiew

@wenjiew, I have re-uploaded it: crash2.zip. The model is at https://github.com/openvinotoolkit/openvino/files/14632924/crash2.zip Thanks!

xulingling516 avatar Mar 18 '24 08:03 xulingling516

@xulingling516 I've done a quick test on an Intel(R) Xeon(R) Platinum 8480+ with your model files and app, and no segmentation fault was observed. The issue cannot be reproduced; please make sure you are using the latest 2024.0 version.

$ python openvino_test2.py
output=(1, 128, 96, 96)
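The attached openvino_test2.py is not reproduced in this thread; a minimal sketch of an equivalent inference run, assuming the dconv.onnx from crash2.zip, would be:

```python
# Hedged sketch of an inference run equivalent to openvino_test2.py
# (the actual script is inside crash2.zip and is not shown here).
import numpy as np
import openvino as ov

core = ov.Core()
model = core.read_model("dconv.onnx")          # reproducer model from crash2.zip
compiled = core.compile_model(model, "CPU")

data = np.random.rand(1, 3, 96, 96).astype(np.float32)
result = compiled([data])[compiled.output(0)]  # where the reporter sees the segfault
print(f"output={result.shape}")                # expected: output=(1, 128, 96, 96)
```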

$ benchmark_app -m dconv.onnx -d CPU -t 10
[Step 1/11] Parsing and validating input arguments
[ INFO ] Parsing input parameters
[Step 2/11] Loading OpenVINO Runtime
[ INFO ] OpenVINO:
[ INFO ] Build ................................. 2024.0.0-14509-34caeefd078-releases/2024/0
[ INFO ]
[ INFO ] Device info:
[ INFO ] CPU
[ INFO ] Build ................................. 2024.0.0-14509-34caeefd078-releases/2024/0
[ INFO ]
[ INFO ]
[Step 3/11] Setting device configuration
[ WARNING ] Performance hint was not explicitly specified in command line. Device(CPU) performance hint will be set to PerformanceMode.THROUGHPUT.
[Step 4/11] Reading model files
[ INFO ] Loading model files
[ INFO ] Read model took 3.19 ms
[ INFO ] Original model I/O parameters:
[ INFO ] Model inputs:
[ INFO ]     data (node: data) : f32 / [...] / [1,3,96,96]
[ INFO ] Model outputs:
[ INFO ]     output (node: output) : f32 / [...] / [1,128,96,96]
[Step 5/11] Resizing model to match image sizes and given batch
[ INFO ] Model batch size: 1
[Step 6/11] Configuring input of the model
[ INFO ] Model inputs:
[ INFO ]     data (node: data) : u8 / [N,C,H,W] / [1,3,96,96]
[ INFO ] Model outputs:
[ INFO ]     output (node: output) : f32 / [...] / [1,128,96,96]
[Step 7/11] Loading the model to the device
[ INFO ] Compile model took 334.47 ms
[Step 8/11] Querying optimal runtime parameters
[ INFO ] Model:
[ INFO ]   NETWORK_NAME: torch_jit
[ INFO ]   OPTIMAL_NUMBER_OF_INFER_REQUESTS: 112
[ INFO ]   NUM_STREAMS: 112
[ INFO ]   AFFINITY: Affinity.CORE
[ INFO ]   INFERENCE_NUM_THREADS: 112
[ INFO ]   PERF_COUNT: NO
[ INFO ]   INFERENCE_PRECISION_HINT: <Type: 'bfloat16'>
[ INFO ]   PERFORMANCE_HINT: THROUGHPUT
[ INFO ]   EXECUTION_MODE_HINT: ExecutionMode.PERFORMANCE
[ INFO ]   PERFORMANCE_HINT_NUM_REQUESTS: 0
[ INFO ]   ENABLE_CPU_PINNING: True
[ INFO ]   SCHEDULING_CORE_TYPE: SchedulingCoreType.ANY_CORE
[ INFO ]   ENABLE_HYPER_THREADING: False
[ INFO ]   EXECUTION_DEVICES: ['CPU']
[ INFO ]   CPU_DENORMALS_OPTIMIZATION: False
[ INFO ]   LOG_LEVEL: Level.NO
[ INFO ]   CPU_SPARSE_WEIGHTS_DECOMPRESSION_RATE: 1.0
[ INFO ]   DYNAMIC_QUANTIZATION_GROUP_SIZE: 0
[ INFO ]   KV_CACHE_PRECISION: <Type: 'float16'>
[Step 9/11] Creating infer requests and preparing input tensors
[ WARNING ] No input files were given for input 'data'!. This input will be filled with random values!
[ INFO ] Fill input 'data' with random values
[Step 10/11] Measuring performance (Start inference asynchronously, 112 inference requests, limits: 10000 ms duration)
[ INFO ] Benchmarking in inference only mode (inputs filling are not included in measurement loop).
[ INFO ] First inference took 9.61 ms
[Step 11/11] Dumping statistics report
[ INFO ] Execution Devices:['CPU']
[ INFO ] Count:            155344 iterations
[ INFO ] Duration:         10009.15 ms
[ INFO ] Latency:
[ INFO ]    Median:        6.92 ms
[ INFO ]    Average:       6.90 ms
[ INFO ]    Min:           2.96 ms
[ INFO ]    Max:           142.38 ms
[ INFO ] Throughput:   15520.20 FPS

avitial avatar Apr 08 '24 19:04 avitial

We tried several CPUs. The successful CPUs are Intel Xeon E5-2640 v4 and Intel Xeon E5-2623 v4; the crashing CPUs are Intel Xeon Gold 5318Y (Ice Lake) and Intel Xeon Gold 5118 (Skylake). Can you try an Ice Lake or Skylake CPU?


xulingling516 avatar Apr 09 '24 01:04 xulingling516

@xulingling516 We don't have this CPU in our list of hosts. Could you send the output of these two commands so we can check whether we have something similar?

$ lscpu
$ lscpu -e

wenjiew avatar Apr 12 '24 04:04 wenjiew

$ lscpu -e
CPU NODE SOCKET CORE L1d:L1i:L2:L3 ONLINE
0   0    0      0    0:0:0:0       yes
1   0    0      1    1:1:1:0       yes
2   0    0      2    2:2:2:0       yes
3   0    0      3    3:3:3:0       yes
4   0    1      4    4:4:4:1       yes
5   0    1      5    5:5:5:1       yes
6   0    1      6    6:6:6:1       yes
7   0    1      7    7:7:7:1       yes
8   0    2      8    8:8:8:2       yes
9   0    2      9    9:9:9:2       yes
10  0    2      10   10:10:10:2    yes
11  0    2      11   11:11:11:2    yes
12  0    3      12   12:12:12:3    yes
13  0    3      13   13:13:13:3    yes
14  0    3      14   14:14:14:3    yes
15  0    3      15   15:15:15:3    yes

$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                16
On-line CPU(s) list:   0-15
Thread(s) per core:    1
Core(s) per socket:    4
Socket(s):             4
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 134
Model name:            Intel Xeon Processor (Icelake)
Stepping:              0
CPU MHz:               2095.076
BogoMIPS:              4190.15
Virtualization:        VT-x
Hypervisor vendor:     KVM
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              4096K
L3 cache:              16384K
NUMA node0 CPU(s):     0-15

Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology eagerfpu pni pclmulqdq vmx ssse3 fma cx16 pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rtm avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 arat avx512vbmi umip avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq md_clear spec_ctrl intel_stibp arch_capabilities
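For cross-checking the hosts, the device information that the OpenVINO CPU plugin itself reports can also be dumped; a minimal sketch using the documented CPU device properties:

```python
# Sketch: dump what the OpenVINO CPU plugin detects on this host, to compare
# against the lscpu output above. Both property names are standard read-only
# properties of the CPU device.
import openvino as ov

core = ov.Core()
print(core.get_property("CPU", "FULL_DEVICE_NAME"))
print(core.get_property("CPU", "OPTIMIZATION_CAPABILITIES"))  # e.g. ['FP32', 'INT8', ...]
```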

Thanks!


xulingling516 avatar Apr 30 '24 15:04 xulingling516

We tried several CPUs. The successful CPUs are Intel Xeon E5-2640 v4 and Intel Xeon E5-2623 v4; the crashing CPUs are Intel Xeon Gold 5318Y (Ice Lake) and Intel Xeon Gold 5118 (Skylake). Can you try an Ice Lake or Skylake CPU?

Just tried with an Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake platform) and the latest OpenVINO 2024.2, and still cannot reproduce the issue. Please try the latest OpenVINO version, 2024.2. Closing this as it can't be reproduced; feel free to reopen and ask additional questions related to this topic if the issue persists.
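Before retesting with 2024.2, it is worth confirming which build is actually loaded; a minimal sketch:

```python
# Sketch: confirm the loaded OpenVINO build before retesting.
import openvino as ov

print(ov.get_version())  # e.g. 2024.2.0-15519-5c0f38f83f6-releases/2024/2
```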

$ python openvino_test2.py
output=(1, 128, 96, 96)

$ benchmark_app -m dconv.onnx -d CPU -t 5
[Step 1/11] Parsing and validating input arguments
[ INFO ] Parsing input parameters
[Step 2/11] Loading OpenVINO Runtime
[ INFO ] OpenVINO:
[ INFO ] Build ................................. 2024.2.0-15519-5c0f38f83f6-releases/2024/2
[ INFO ]
[ INFO ] Device info:
[ INFO ] CPU
[ INFO ] Build ................................. 2024.2.0-15519-5c0f38f83f6-releases/2024/2
[ INFO ]
[ INFO ]
[Step 3/11] Setting device configuration
[ WARNING ] Performance hint was not explicitly specified in command line. Device(CPU) performance hint will be set to PerformanceMode.THROUGHPUT.
[Step 4/11] Reading model files
[ INFO ] Loading model files
[ INFO ] Read model took 2.07 ms
[ INFO ] Original model I/O parameters:
[ INFO ] Model inputs:
[ INFO ]     data (node: data) : f32 / [...] / [1,3,96,96]
[ INFO ] Model outputs:
[ INFO ]     output (node: output) : f32 / [...] / [1,128,96,96]
[Step 5/11] Resizing model to match image sizes and given batch
[ INFO ] Model batch size: 1
[Step 6/11] Configuring input of the model
[ INFO ] Model inputs:
[ INFO ]     data (node: data) : u8 / [N,C,H,W] / [1,3,96,96]
[ INFO ] Model outputs:
[ INFO ]     output (node: output) : f32 / [...] / [1,128,96,96]
[Step 7/11] Loading the model to the device
[ INFO ] Compile model took 65.62 ms
[Step 8/11] Querying optimal runtime parameters
[ INFO ] Model:
[ INFO ]   NETWORK_NAME: torch_jit
[ INFO ]   OPTIMAL_NUMBER_OF_INFER_REQUESTS: 64
[ INFO ]   NUM_STREAMS: 64
[ INFO ]   INFERENCE_NUM_THREADS: 64
[ INFO ]   PERF_COUNT: NO
[ INFO ]   INFERENCE_PRECISION_HINT: <Type: 'float32'>
[ INFO ]   PERFORMANCE_HINT: THROUGHPUT
[ INFO ]   EXECUTION_MODE_HINT: ExecutionMode.PERFORMANCE
[ INFO ]   PERFORMANCE_HINT_NUM_REQUESTS: 0
[ INFO ]   ENABLE_CPU_PINNING: True
[ INFO ]   SCHEDULING_CORE_TYPE: SchedulingCoreType.ANY_CORE
[ INFO ]   MODEL_DISTRIBUTION_POLICY: set()
[ INFO ]   ENABLE_HYPER_THREADING: False
[ INFO ]   EXECUTION_DEVICES: ['CPU']
[ INFO ]   CPU_DENORMALS_OPTIMIZATION: False
[ INFO ]   LOG_LEVEL: Level.NO
[ INFO ]   CPU_SPARSE_WEIGHTS_DECOMPRESSION_RATE: 1.0
[ INFO ]   DYNAMIC_QUANTIZATION_GROUP_SIZE: 0
[ INFO ]   KV_CACHE_PRECISION: <Type: 'float16'>
[ INFO ]   AFFINITY: Affinity.CORE
[Step 9/11] Creating infer requests and preparing input tensors
[ WARNING ] No input files were given for input 'data'!. This input will be filled with random values!
[ INFO ] Fill input 'data' with random values
[Step 10/11] Measuring performance (Start inference asynchronously, 64 inference requests, limits: 5000 ms duration)
[ INFO ] Benchmarking in inference only mode (inputs filling are not included in measurement loop).
[ INFO ] First inference took 9.90 ms
[Step 11/11] Dumping statistics report
[ INFO ] Execution Devices:['CPU']
[ INFO ] Count:            23296 iterations
[ INFO ] Duration:         5016.35 ms
[ INFO ] Latency:
[ INFO ]    Median:        12.90 ms
[ INFO ]    Average:       13.72 ms
[ INFO ]    Min:           8.92 ms
[ INFO ]    Max:           22.24 ms
[ INFO ] Throughput:   4644.01 FPS

avitial avatar Jun 21 '24 21:06 avitial