openvino
openvino copied to clipboard
[CPU] brgconv on AVX2 enabling and switch OV deconv implement to ONEDNN deconv OP from conv_backward_data
Details:
- Re-enable conv with brgemm implement on avx2 platform with fp32 , also heuristic cause by wavernn-upsampler regression
- Switch OV cpu deconv implement from ONEDNN convbackward_data primitive to deconv primitive. Support const and non-const weight; Always expose planar weight layout to cpu graph; weight cache to save blocked const weight ; runtime reorder for none-const weight
- enable bias fusing for fp32 to replace legacy depthwise fusing
- enable deconv brgconv for fp32 and INT8 deconv
- update test case
About test case/WA update:
- OV has feature to set the output shape of conv_backward_data. Which would cause pad_r < 0 && pad_r + dil < 0, so applied the WA in onednn to create deconv OP
-
Deconv only has one ref entry(CPU_INSTANCE(ref_deconvolution_fwd_t) for JIT,gemm and c++ reference implement. So one layout can't use RTinfo to filter implement. Update test cases with 1x1 related or SSE related.
For example: backward data has 2 entries when input is nChw16C :
jit_avx512_common_1x1_convolution_bwd_data_f32_t
jit_avx512_common_convolution_bwd_data_t
to support So OV filter can work to get desired implement type. However, deconv only has only, it means that nChw16C would only get one implement type. So filter would not work. - Change stride to 1 unit because of brgemm limitation
- subgraph concat deconv test can't hit ref implement because the gemm and ref share one entry implement.
-
Conformance test conv related failed, but can pass when merging 3.3 with brg. Failed caused by
. Seems threshold changes. Also AMX and avv512 core would fail on these conformance test but not enabled on Azure platforms. Guess the expected result is based on avx2 jit. So current just remove these test
Tickets:
- CVS-133120, CVS-122688
ONEDNN PR, https://github.com/openvinotoolkit/oneDNN/pull/244
@maxnick , can you pls help to review the code. Thx!
Deconv only has one ref entry(CPU_INSTANCE(ref_deconvolution_fwd_t) for JIT,gemm and c++ reference implement. So one layout can't use RTinfo to filter implement. Update test cases with 1x1 related or SSE related. For example: backward data has 2 entries when input is nChw16C : jit_avx512_common_1x1_convolution_bwd_data_f32_t jit_avx512_common_convolution_bwd_data_t to support So OV filter can work to get desired implement type. However, deconv only has only, it means that nChw16C would only get one implement type. So filter would not work.
That means that we have to focus mostly on the layout and on the implementation type only when we can das distinguish. For example: For nChw16c set CPUSpecificParams{{nChw16c}, {nChw16c}, {}, ""}, so that the implementation type isn't enforced but selected by oneDNN. The same is true for nChw8c and the plain layout. We shouldn't remove ref impl with the plain layout, as this may be required on other platforms (ARM). Also, obviously we have to add brgemm dedicated deconvolution tests and put nChw16c and nChw8c cases to the nightly scope for avx512 and avx2 convolutions. For deconvolutions we also have to put nChw16c and nChw8c cases to the nightly scope as well, but in that case sse4 will be put to nightly as well as now it's impossible to enforce sse4 and avx deconvolution implementations. As for 1x1 deconvolution, suggest reverting 1x1 filter size, but again, remove the implementation specification (i.e. "avx512_1x1" "avx2_1x1" etc.) for non brgemm scenarious.
Change stride to 1 unit because of brgemm limitation
Probably should be set only for brgemm cases.
subgraph concat deconv test can't hit ref implement because the gemm and ref share one entry implement.
The same as above, simply remove the implementation type from config instead of removing the planar layout at all.
Deconv only has one ref entry(CPU_INSTANCE(ref_deconvolution_fwd_t) for JIT,gemm and c++ reference implement. So one layout can't use RTinfo to filter implement. Update test cases with 1x1 related or SSE related. For example: backward data has 2 entries when input is nChw16C : jit_avx512_common_1x1_convolution_bwd_data_f32_t jit_avx512_common_convolution_bwd_data_t to support So OV filter can work to get desired implement type. However, deconv only has only, it means that nChw16C would only get one implement type. So filter would not work.
That means that we have to focus mostly on the layout and on the implementation type only when we can das distinguish. For example: For nChw16c set CPUSpecificParams{{nChw16c}, {nChw16c}, {}, ""}, so that the implementation type isn't enforced but selected by oneDNN. The same is true for nChw8c and the plain layout. We shouldn't remove ref impl with the plain layout, as this may be required on other platforms (ARM). Also, obviously we have to add brgemm dedicated deconvolution tests and put nChw16c and nChw8c cases to the nightly scope for avx512 and avx2 convolutions. For deconvolutions we also have to put nChw16c and nChw8c cases to the nightly scope as well, but in that case sse4 will be put to nightly as well as now it's impossible to enforce sse4 and avx deconvolution implementations. As for 1x1 deconvolution, suggest reverting 1x1 filter size, but again, remove the implementation specification (i.e. "avx512_1x1" "avx2_1x1" etc.) for non brgemm scenarious.
Will apply in the update. Pls help to review other changes. Thx!
@maxnick , Applied review comments on test cases. pls help to review. Thx!
Change stride to 1 unit because of brgemm limitation
What happens if the stride is different from 1?
Change stride to 1 unit because of brgemm limitation.
What exactly happens if the stride is other than 1?
Change stride to 1 unit because of brgemm limitation.
What exactly happens if the stride is other than 1?
stride 1 would cause some cases fall back on jit with nhwc layout.
@maxnick , Applied review comments.
https://jira.devtools.intel.com/browse/CVS-140606 to track the acc issue on 1x1 kernel with details.
@maxnick , Applied review comment. Pls help to review. Thx!
PR is waiting for the internal validation results.
PR is ready to be merged.