Dmitry Zarukin
Hi @alexandrelimassantana, thank you for the question. oneDNN provides the `submemory_desc` API to create this kind of operation descriptor. The user has to create a bigger memory descriptor and then create a...
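A minimal sketch of that flow, assuming the oneDNN C++ API (`dnnl.hpp`); the shapes and offsets here are purely illustrative:

```cpp
#include "dnnl.hpp"
using namespace dnnl;

// Parent tensor: 8x8 f32 in plain `ab` layout (illustrative sizes).
memory::desc big_md({8, 8}, memory::data_type::f32, memory::format_tag::ab);

// A 4x4 sub-block starting at logical offsets {2, 2}. The resulting
// descriptor aliases the parent buffer; no data is copied.
memory::desc sub_md = big_md.submemory_desc({4, 4}, {2, 2});

// sub_md can then be used to build a primitive descriptor so the primitive
// reads/writes only that region of the bigger memory.
```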
Hi @alexandrelimassantana, the API has been available since v0, so it definitely should be in v1.7. I believe you are referring to the effect of cache associativity for a [set-associative cache](https://en.wikipedia.org/wiki/Cache_placement_policies#Set-associative_cache). In general...
Hi @ZJUFangzh, that's a very good question! I would say that the primary reason is that the AArch64 maintainers [committed](https://github.com/oneapi-src/oneDNN/issues/1017) to supporting the OMP runtime only. It means that any other runtime,...
Hi @ZJUFangzh, I can comment on problem 1. This is an effect of a not-quite-fair comparison rather than of data influence. The first call would initialize and generate all the...
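For illustration, a hedged sketch of how a measurement could exclude that one-time cost; `prim`, `args`, and the stream `s` are hypothetical, already-created objects:

```cpp
#include <chrono>

// Warm-up call: pays the one-time initialization / kernel generation cost.
prim.execute(s, args);
s.wait();

// Timed loop now measures steady-state execution only.
auto t0 = std::chrono::steady_clock::now();
for (int i = 0; i < 100; ++i) prim.execute(s, args);
s.wait();
auto t1 = std::chrono::steady_clock::now();
double avg_ms = std::chrono::duration<double, std::milli>(t1 - t0).count() / 100;
```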
Hi @ZJUFangzh,

> if I use my THREADPOOL to replace omp_thread_pool, can I concurrently execute different streams which init with the same THREADPOOL ptr?

As long as you would like...
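A sketch of the setup being asked about, assuming the threadpool interop API from `dnnl_threadpool.hpp`; `MyThreadpool` is a hypothetical user class implementing `dnnl::threadpool_interop::threadpool_iface`:

```cpp
#include "oneapi/dnnl/dnnl.hpp"
#include "oneapi/dnnl/dnnl_threadpool.hpp"

// MyThreadpool: hypothetical user implementation of
// dnnl::threadpool_interop::threadpool_iface.
MyThreadpool pool(/*num_threads=*/8);

dnnl::engine eng(dnnl::engine::kind::cpu, 0);

// Two streams constructed over the same threadpool pointer.
dnnl::stream s1 = dnnl::threadpool_interop::make_stream(eng, &pool);
dnnl::stream s2 = dnnl::threadpool_interop::make_stream(eng, &pool);
```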
Hi @jelmervdl, thank you for your question.

> Is there a way to do C:float32 = Relu(Scaling:f32 * (A:int8 * B:int8) + Bias:float32) instead?

Besides an option with binary_add, there...
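One way this kind of fused computation is commonly expressed is output scales plus an eltwise ReLU post-op; a sketch against the v2.x C++ API (dims and the `scaling` value are placeholders), not necessarily the option the rest of the reply goes on to describe:

```cpp
#include "dnnl.hpp"
using namespace dnnl;

engine eng(engine::kind::cpu, 0);

const memory::dim M = 128, K = 256, N = 64; // placeholder sizes
float scaling = 0.05f;                      // placeholder dequantization scale

memory::desc a_md({M, K}, memory::data_type::s8, memory::format_tag::ab);
memory::desc b_md({K, N}, memory::data_type::s8, memory::format_tag::ab);
memory::desc bias_md({1, N}, memory::data_type::f32, memory::format_tag::ab);
memory::desc c_md({M, N}, memory::data_type::f32, memory::format_tag::ab);

// The scale is applied to the int8 accumulation, the f32 bias is added,
// then the ReLU post-op runs on the result.
primitive_attr attr;
attr.set_output_scales(/*mask=*/0, {scaling});
post_ops ops;
ops.append_eltwise(/*scale=*/1.f, algorithm::eltwise_relu, /*alpha=*/0.f, /*beta=*/0.f);
attr.set_post_ops(ops);

auto mm_pd = matmul::primitive_desc(matmul::desc(a_md, b_md, bias_md, c_md), attr, eng);
auto mm = matmul(mm_pd);
```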
Hi @StrongerXi, thank you for the question. I'm not sure what exactly you are trying to achieve. Consider a 2x3 tensor with elements [0][1][2] [3][4][5] in `ab` format (strides are...
Alright, I guess I see it now. So, with the current API available I can imagine only the following flow: 1. Take a source `abc`-like tensor and transpose the data to a destination...
Yes, that's the best I can imagine to make it work the desired way. Please let me know if it works for you. Thanks.
Hi @StrongerXi, sorry for the delay. It seems to me you pointed to the right function, `permute_axes`. I think this is the general flow you might want to consider...
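A sketch of that flow, assuming the goal is a physical transpose of a 2D `ab` tensor (the same idea extends to more dimensions):

```cpp
#include "dnnl.hpp"
using namespace dnnl;

engine eng(engine::kind::cpu, 0);
stream s(eng);

const memory::dim A = 2, B = 3;

// Source: 2x3 f32 tensor in plain `ab` layout.
memory::desc src_md({A, B}, memory::data_type::f32, memory::format_tag::ab);
memory src_mem(src_md, eng); // assume the buffer is already filled

// Relabel the axes: same underlying data, logical shape becomes {B, A}.
memory::desc src_view_md = src_md.permute_axes({1, 0});
memory src_view(src_view_md, eng, src_mem.get_data_handle());

// Destination: {B, A} in plain `ab` layout; the reorder moves the data.
memory::desc dst_md({B, A}, memory::data_type::f32, memory::format_tag::ab);
memory dst_mem(dst_md, eng);

reorder(src_view, dst_mem).execute(s, src_view, dst_mem);
s.wait();
```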