oneDNN
batchnorm requires consistent in- and output mem format_tags
Summary
Creating a dnnl::batch_normalization_forward::primitive_desc fails with dnnl_unimplemented when the source memory descriptor (taken from a preceding convolution's destination) and the destination memory descriptor do not use consistent format tags.
Version
3.3.0
Environment
VS2019
Steps to reproduce
Set up descriptors for a convolution followed by batch normalization, passing the convolution's destination memory descriptor (created with format_tag::any) to the batch normalization primitive descriptor.
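A minimal, self-contained sketch of the failing setup (shapes are taken from the verbose logs below; all names and the remaining parameters are illustrative, not the reporter's actual code):

#include "dnnl.hpp"

int main() {
    using namespace dnnl;
    engine eng(engine::kind::cpu, 0);

    // Convolution with format_tag::any on dst, so oneDNN may pick a
    // blocked/optimized layout (acdb, i.e. nhwc, per the logs below).
    memory::desc conv_src_md({3, 4, 16, 32}, memory::data_type::f32,
            memory::format_tag::nchw);
    memory::desc conv_wei_md({32, 4, 3, 3}, memory::data_type::f32,
            memory::format_tag::any);
    memory::desc conv_dst_md({3, 32, 16, 32}, memory::data_type::f32,
            memory::format_tag::any);
    auto conv_pd = convolution_forward::primitive_desc(eng,
            prop_kind::forward_inference, algorithm::convolution_direct,
            conv_src_md, conv_wei_md, conv_dst_md,
            {1, 1} /* strides */, {1, 1} /* pad_l */, {1, 1} /* pad_r */);

    // Batch normalization: src takes the layout the convolution chose,
    // dst uses a different plain tag. Per this report, primitive_desc
    // creation then throws dnnl_unimplemented in 3.3.0.
    memory::desc bn_dst_md({3, 32, 16, 32}, memory::data_type::f32,
            memory::format_tag::nchw);
    auto bn_pd = batch_normalization_forward::primitive_desc(eng,
            prop_kind::forward_inference, conv_pd.dst_desc(), bn_dst_md,
            1.e-5f, normalization_flags::use_global_stats);
    return 0;
}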
Observed behavior
Instantiating dnnl::batch_normalization_forward::primitive_desc throws dnnl_unimplemented from https://github.com/oneapi-src/oneDNN/blob/25596d25116d3fd523f1ac5e32e44cb5e8295a9e/src/common/primitive_desc_iface.cpp#L77
This is likely due to the format_tag::any on the convolution's output memory descriptor; per https://oneapi-src.github.io/oneDNN/group_dnnl_api_convolution.html#doxid-group-dnnl-api-convolution:
"Memory descriptors can be initialized with dnnl_format_tag_any or with format_kind set to dnnl_format_kind_any."
Expected behavior
- Ideally, batch normalization would work with arbitrary combinations of input and output formats.
- Failing that, a more descriptive exception message would be helpful.
- At the very least, the documentation should point out this limitation, since the issue can be worked around by converting the input memory descriptor to enforce consistent format tags (see the sketch below).
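A sketch of that workaround, reusing the hypothetical eng, conv_pd, and bn_dst_md from the snippet above (names assumed, not from the report):

// Workaround sketch: force a plain layout for the batchnorm input so its
// format tag matches the plain-tagged destination. This may require a
// reorder after the convolution and can cost performance.
dnnl::memory::desc bn_src_md(conv_pd.dst_desc().get_dims(),
        dnnl::memory::data_type::f32, dnnl::memory::format_tag::nchw);
auto bn_pd = dnnl::batch_normalization_forward::primitive_desc(eng,
        dnnl::prop_kind::forward_inference, bn_src_md, bn_dst_md,
        1.e-5f, dnnl::normalization_flags::use_global_stats);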
Hi @IngmarVoigt2, have you tried running with ONEDNN_VERBOSE=all? If so, could you please share the output?
Additional information, such as a code snippet of your implementation, would also be helpful.
Please also refer to the implementation limitations if you haven't already: https://oneapi-src.github.io/oneDNN/dev_guide_batch_normalization.html#implementation-limitations
Thanks for the quick follow-up, @yehudaorel! Sorry, I didn't see your message back then.
The verbose logs are:
onednn_verbose,info,oneDNN v3.3.0 (commit N/A)
onednn_verbose,info,cpu,runtime:OpenMP,nthr:12
onednn_verbose,info,cpu,isa:Intel AVX2
onednn_verbose,info,gpu,runtime:none
onednn_verbose,info,graph,backend,0:dnnl_backend
onednn_verbose,primitive,info,template:operation,engine,primitive,implementation,prop_kind,memory_descriptors,attributes,auxiliary,problem_desc,exec_time
onednn_verbose,graph,info,template:operation,engine,partition_id,partition_kind,op_names,data_formats,logical_tensors,fpmath_mode,backend,exec_time
onednn_verbose,primitive,create:cache_miss,cpu,convolution,brgconv:avx2,forward_inference,src_f32:a:blocked:acdb::f0 wei_f32:a:blocked:Acdb16a::f0 bia_f32:a:blocked:a::f0 dst_f32:a:blocked:acdb::f0,,alg:convolution_direct,mb3_ic4oc32_ih16oh16kh3sh1dh0ph1_iw32ow32kw3sw1dw0pw1,5.819
onednn_verbose,primitive,create:cache_miss,cpu,eltwise,jit:avx2,forward_inference,data_f32::blocked:acdb::f0 diff_undef::undef:::,,alg:eltwise_relu alpha:0 beta:0,3x32x16x32,2.9757
Sharing the code is a bit tricky, since the different parts are integrated into a larger framework, but it ultimately fails at:

m_mkldnn_prim_desc = std::make_shared<dnnl::batch_normalization_forward::primitive_desc>(
        *m_mkldnn_engine, dnnl::prop_kind::forward_inference,
        src_d, out_d, m_epsilon, flags);

Here src_d is the output descriptor from a convolution layer (followed by an activation, as you can tell from the verbose logs above).
Originally I was able to work around this by enforcing a different input descriptor memory format, but that does not work well for me in all situations.
Any ideas based on the logs?
Also, thanks for pointing me to the documentation, but as far as I can see, none of those limitations apply in my case. Interestingly, this is not an issue when using in-place batch normalization, but unfortunately I cannot enforce that consistently across all components (unless I ultimately copy data around as a workaround?).
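For reference, the in-place variant mentioned above amounts to something like the following sketch (variable names assumed from the earlier snippet): the same descriptor is used for source and destination, so the consistency requirement is trivially satisfied.

// In-place batchnorm sketch: identical src/dst descriptors.
auto bn_pd = dnnl::batch_normalization_forward::primitive_desc(
        *m_mkldnn_engine, dnnl::prop_kind::forward_inference,
        src_d, src_d, m_epsilon, flags);
// At execution time, bind the same memory object to both
// DNNL_ARG_SRC and DNNL_ARG_DST.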
Never mind, I actually just solved it using:
auto dst_d = dnnl::memory::desc(outShape, dnnl::memory::data_type::f32, dnnl::memory::format_tag::any);
Maybe you could add this to the docs?
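For context, here is roughly how that fix slots into the earlier snippet (a sketch; variable names as above): with format_tag::any, the library picks a destination layout it supports, which can then be queried from the primitive descriptor.

// Let the library choose the batchnorm destination layout.
auto dst_d = dnnl::memory::desc(outShape, dnnl::memory::data_type::f32,
        dnnl::memory::format_tag::any);
m_mkldnn_prim_desc = std::make_shared<dnnl::batch_normalization_forward::primitive_desc>(
        *m_mkldnn_engine, dnnl::prop_kind::forward_inference,
        src_d, dst_d, m_epsilon, flags);
// Query the layout actually chosen, e.g. to decide whether a reorder
// is needed downstream.
auto chosen_dst = m_mkldnn_prim_desc->dst_desc();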