oneDNN issues

Performance regression from v1.4 to v2.6

10

# Summary I have benchmarked various standard deep learning networks such as AlexNet, GoogleNet, ResNet50, and MobileNet-V2. I have observed that oneDNN v2.6 is slower than v1.4. # Version...

Hari-MathWorks

performance

rfc: quantization: extending scaling support

12

# Description A link to the rendered document: [Link](https://github.com/igorsafo/oneDNN/tree/rfcs/rfcs/20220201-quantization-scaling)

igorsafo

RFC

cpu: aarch64: add arbitrary ACL post ops

6

# Description Adds class `acl_post_ops_t` which enables Compute Library for the Arm® Architecture (ACL) based primitives to have an arbitrary number and type of post ops by composing `acl_binary_t` and...

jondea

cpu: aarch64: add eltwise post ops to ACL bnorm

# Description This PR adds eltwise post ops to the Compute Library for the Arm® architecture (ACL) batch normalization primitive. ReLU (including leaky and bounded) are fused into the bnorm...

jondea

LSTM kernel performs 1.5x slower for v2.6 compared with v1.4

4

# Summary I have benchmarked the LSTM layer using oneDNN v1.4 and v2.6. I have observed that oneDNN v2.6 is 1.5x slower than v1.4. # Version v2.6 # Environment...

Hari-MathWorks

help wanted

performance

platform:cpu-x64

rfcs: introduce graph api

This proposal aims to introduce a set of graph APIs into oneDNN. Rendered version: [link](https://github.com/TaoLv/mkl-dnn/blob/lvtao/rfcs/graph-api/rfcs/20220711-graph-api/README.md) cc @jianhui-li @igorsafo @mgouicem

TaoLv

RFC

rfc: propose optional bit exact conv

6

[Link to rendered document](https://github.com/maayaneh/oneDNN/blob/bit_exact_conv_rfc/rfcs/20220630-bit-exact-conv/README.md)

maayaneh

RFC

Extending support for binary primitive

2

# Description This PR extends binary SYCL kernel support for non-uniform group sizes. This includes new logic for the work-item configuration in kernel launch and handling the trailing portions of...

TejaX-Alaghari

`dnnl::memory::desc::get_size()` returns 0 for submemory with non-zero offset

2

The root cause seems to be [this commit](https://github.com/oneapi-src/oneDNN/commit/e0e46ccdaec02e8ed4b9606564a721c225f33960). Is there any reason why this is the behavior? It came as a surprise to me because it wasn't specified in the...

StrongerXi

question

How to transpose a tensor

12

## Context I'm trying to use oneDNN to implement arbitrary axis permutation for a tensor, e.g.,
```
t = [[0, 1], [2, 3]]
t.transpose(); // == t.transpose({1, 0})
```
And...

StrongerXi

question
