How to query a primitive descriptor for the weights_layer_md and weights_iter_md memory descriptors of an RNN LSTM
Hello, there are examples in C++ that query the RNN LSTM primitive descriptor for the weights_layer and weights_iter memory descriptors, namely:
- lstm_pd.weights_desc()
- lstm_pd.weights_iter_desc() (https://uxlfoundation.github.io/oneDNN/page_lstm_example_cpp.html#doxid-lstm-example-cpp)
But how can this be done in a plain C implementation? There is the C API function dnnl_primitive_desc_query_md(primitive_desc, what, index), where the "what" argument is of type dnnl_query_t. However, dnnl_query_t offers only a limited set of options; one of them is dnnl_query_weights_md, but I do not see an option that distinguishes weights_layer from weights_iter.
Can I use:
- dnnl_primitive_desc_query_md(primitive_desc, dnnl_query_exec_arg_md, DNNL_ARG_WEIGHTS_LAYER)
- dnnl_primitive_desc_query_md(primitive_desc, dnnl_query_exec_arg_md, DNNL_ARG_WEIGHTS_ITER) or:
- dnnl_primitive_desc_query_md(primitive_desc, dnnl_query_weights_md, DNNL_ARG_WEIGHTS_LAYER)
- dnnl_primitive_desc_query_md(primitive_desc, dnnl_query_weights_md, DNNL_ARG_WEIGHTS_ITER)
And will the memory descriptors for:
- DNNL_ARG_WEIGHTS_LAYER and DNNL_ARG_DIFF_WEIGHTS_LAYER be the same?
- DNNL_ARG_WEIGHTS_ITER and DNNL_ARG_DIFF_WEIGHTS_ITER be the same?
Thanks
Hello @lacak-sk, great questions!
But how can this be done in a plain C implementation?
You're right that dnnl_query_weights_md is a generic query and doesn't distinguish between weights_layer and weights_iter. It's typically used for primitives with a single weights argument (e.g., convolution).
For RNN primitives like LSTM, which have multiple weights tensors, you should use dnnl_query_exec_arg_md with the appropriate argument index:
/* Query the layer and recurrent (iteration) weights via their execution argument ids. */
const_dnnl_memory_desc_t weights_layer_md =
        dnnl_primitive_desc_query_md(lstm_pd, dnnl_query_exec_arg_md, DNNL_ARG_WEIGHTS_LAYER);
const_dnnl_memory_desc_t weights_iter_md =
        dnnl_primitive_desc_query_md(lstm_pd, dnnl_query_exec_arg_md, DNNL_ARG_WEIGHTS_ITER);
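For completeness, here is a minimal sketch of how the queried descriptors might be used afterwards (the engine eng and the memory variable names are assumptions, not part of the original snippet). The descriptors are owned by the primitive descriptor, so they can be passed to dnnl_memory_create directly and must not be destroyed by the caller:
dnnl_memory_t weights_layer_mem, weights_iter_mem;
/* DNNL_MEMORY_ALLOCATE lets the library allocate buffers of the queried size and layout. */
dnnl_memory_create(&weights_layer_mem, weights_layer_md, eng, DNNL_MEMORY_ALLOCATE);
dnnl_memory_create(&weights_iter_mem, weights_iter_md, eng, DNNL_MEMORY_ALLOCATE);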
And will the memory descriptors for DNNL_ARG_WEIGHTS_LAYER and DNNL_ARG_DIFF_WEIGHTS_LAYER be the same? DNNL_ARG_WEIGHTS_ITER and DNNL_ARG_DIFF_WEIGHTS_ITER?
In general, yes, they often have the same shape and layout. However, this is not guaranteed. Forward weights may use data types like f32, bf16, or quantized formats, while gradient (diff) weights typically use accumulation-friendly types like f32 for numerical stability. So, it's best to query them separately to ensure correctness:
const_dnnl_memory_desc_t diff_weights_layer_md =
        dnnl_primitive_desc_query_md(lstm_bwd_pd, dnnl_query_exec_arg_md, DNNL_ARG_DIFF_WEIGHTS_LAYER);
const_dnnl_memory_desc_t diff_weights_iter_md =
        dnnl_primitive_desc_query_md(lstm_bwd_pd, dnnl_query_exec_arg_md, DNNL_ARG_DIFF_WEIGHTS_ITER);
Thanks.
I am asking because I want to do an element-wise update of both weights using weights += learning_rate * diff_weights. So I need to ensure that the memory layout (dimensions, format, element type) of "weights" and "diff_weights" is the same (for "weights_layer" and "diff_weights_layer", and also for "weights_iter" and "diff_weights_iter"). I create the primitives (forward and backward) using memory descriptors created like: dnnl_memory_desc_create_with_tag(weights_layer_md, 5, weights_layer_sizes, dnnl_f32, dnnl_format_tag_any);
Can I use dnnl_memory_desc_equal() to compare memory descriptors returned by:
- dnnl_primitive_desc_query_md(lstm_pd, dnnl_query_exec_arg_md, DNNL_ARG_WEIGHTS_LAYER); and
- dnnl_primitive_desc_query_md(lstm_bwd_pd, dnnl_query_exec_arg_md, DNNL_ARG_DIFF_WEIGHTS_LAYER); to ensure that both have the same dimensions, layout (format), and element type? (The documentation is IMHO a bit unclear about what exactly dnnl_memory_desc_equal compares...)
Yes, dnnl_memory_desc_equal() is the right function for your use case.
When using dnnl_format_tag_any, oneDNN independently optimizes layouts for forward and backward primitives. This means:
- The forward and backward primitives may choose different layouts for the same logical tensor. While they often match, there's no guarantee, especially for performance-optimized implementations that may use different blocking strategies.
- To ensure compatibility, you should always query the actual memory descriptors chosen by the primitives and compare them:
const_dnnl_memory_desc_t fwd_weights_layer_md =
        dnnl_primitive_desc_query_md(lstm_fwd_pd, dnnl_query_exec_arg_md,
                DNNL_ARG_WEIGHTS_LAYER);
const_dnnl_memory_desc_t bwd_diff_weights_layer_md =
        dnnl_primitive_desc_query_md(lstm_bwd_pd, dnnl_query_exec_arg_md,
                DNNL_ARG_DIFF_WEIGHTS_LAYER);
int layouts_match = dnnl_memory_desc_equal(fwd_weights_layer_md,
        bwd_diff_weights_layer_md);
if (layouts_match) {
    // weights += learning_rate * diff_weights is safe
} else {
    // Reorder diff_weights to match weights layout before update
}
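For the matching case, here is a minimal sketch of the element-wise update itself (assuming f32 weights on a CPU engine; weights_mem, diff_weights_mem, and learning_rate are illustrative names, not taken from the thread):
void *w_handle = NULL, *dw_handle = NULL;
dnnl_memory_get_data_handle(weights_mem, &w_handle);
dnnl_memory_get_data_handle(diff_weights_mem, &dw_handle);
float *w = (float *)w_handle;
float *dw = (float *)dw_handle;
/* The two descriptors compared equal, so both buffers share size and layout and a
   flat loop over the raw data is valid (assuming padded_dims equals dims). */
size_t nelems = dnnl_memory_desc_get_size(fwd_weights_layer_md) / sizeof(float);
for (size_t i = 0; i < nelems; ++i)
    w[i] += learning_rate * dw[i]; /* update rule from the question above */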
Thanks. I did experiments using
- fw_weights_layer_md = dnnl_primitive_desc_query_md(lstm_fw_pd, dnnl_query_exec_arg_md, DNNL_ARG_WEIGHTS_LAYER);
- bw_weights_layer_md = dnnl_primitive_desc_query_md(lstm_bw_pd, dnnl_query_exec_arg_md, DNNL_ARG_WEIGHTS_LAYER);
- bw_diff_weights_layer_md = dnnl_primitive_desc_query_md(lstm_bw_pd, dnnl_query_exec_arg_md, DNNL_ARG_DIFF_WEIGHTS_LAYER);
And found a strange thing:
- dnnl_memory_desc_equal(fw_weights_layer_md, bw_diff_weights_layer_md) == 1 BUT
- dnnl_memory_desc_equal(fw_weights_layer_md, bw_weights_layer_md) != 1
So when I query the forward primitive for WEIGHTS_LAYER, I get a memory descriptor which does NOT match the memory descriptor returned by the backward primitive (the same applies to WEIGHTS_ITER).
Next I query the memory descriptors for information using (Pascal syntax):
- ndims: dnnl_memory_desc_query(fw_weights_layer_md, dnnl_query_ndims_s32, @i);
- dims: dnnl_memory_desc_query(fw_weights_layer_md, dnnl_query_dims, @p);
- padded_dims: dnnl_memory_desc_query(fw_weights_layer_md, dnnl_query_padded_dims, @p);
- size: dnnl_memory_desc_get_size(fw_weights_layer_md);
Outputs are:
- fw_weights_layer: ndims=5, dims=1x2x53x4x256, size=440960
- bw_weights_layer: ndims=5, dims=1x2x53x4x256, size=524288
- bw_diff_weights_layer: ndims=5, dims=1x2x53x4x256, size=440960
Something strange, is it not? Why do the sizes differ (while dims and padded_dims are the same)?
bw_weights_layer_md is an input to the backward LSTM. Its internal layout may differ from the forward descriptor (fw_weights_layer_md), which is expected and doesn't impact the weight update logic. The key detail is that fw_weights_layer_md equals bw_diff_weights_layer_md, meaning the forward weights descriptor matches the backward gradient descriptor. This ensures you can safely perform weights += lr * diff_weights without needing a reorder.
Yes, that makes sense in relation to weight updates.
But now I face a problem: when I execute the backward primitive I get an access violation exception in dnnl.dll. So I am investigating what happened, and I found that when I query the weights memory descriptor for the forward and the backward primitive I get memory descriptors of different memory sizes (not dimensions, see the results of dnnl_memory_desc_get_size() above)... I guess (maybe I am wrong) that the backward primitive reads memory beyond the allocated buffer, which leads to the AV.
Of course, I pass the same memory as the execution argument (DNNL_ARG_WEIGHTS_LAYER) to the forward primitive and to the backward primitive...
EDIT: I found that the forward and backward RNN LSTM primitives expect different memory layouts for the weights_layer and weights_iter tensors. So the only way is, during training, to reorder both weights (using dnnl_reorder_primitive_desc_create()) between the forward and backward pass... which is a bit unintuitive and IMO should be better documented, maybe with an example that combines forward and backward primitive creation with memory allocation and reordering...
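For reference, here is a minimal sketch of such a reorder using the descriptor names queried above (eng, stream, and fwd_weights_layer_mem, the memory holding the forward-layout weights, are assumed to exist and are not part of the original post):
/* A memory object laid out the way the backward primitive expects its weights. */
dnnl_memory_t bwd_weights_layer_mem;
dnnl_memory_create(&bwd_weights_layer_mem, bw_weights_layer_md, eng, DNNL_MEMORY_ALLOCATE);

/* Reorder the forward-layout weights into the backward layout before the backward pass. */
dnnl_primitive_desc_t reorder_pd;
dnnl_reorder_primitive_desc_create(&reorder_pd, fw_weights_layer_md, eng,
        bw_weights_layer_md, eng, NULL);
dnnl_primitive_t reorder;
dnnl_primitive_create(&reorder, reorder_pd);

dnnl_exec_arg_t reorder_args[2] = {
        {DNNL_ARG_FROM, fwd_weights_layer_mem}, {DNNL_ARG_TO, bwd_weights_layer_mem}};
dnnl_primitive_execute(reorder, stream, 2, reorder_args);
dnnl_stream_wait(stream);
/* The backward LSTM then takes bwd_weights_layer_mem as DNNL_ARG_WEIGHTS_LAYER;
   the same pattern applies to weights_iter. */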