oneDNN
rfc: propose optional bit exact conv
Thanks for the proposal, @maayaneh! Were you able to evaluate performance loss with the proposed sequence?
Implementation PoC from the comments: https://github.com/oneapi-src/oneDNN/compare/master...maayaneh:oneDNN:master
Hi @vpigorov, we measured performance on our own workload, which includes roughly 700 networks. Convolution is the most prominent kernel, but the workload also includes other layers as well as quantization operations. For this workload, execution time with this commit is 1.38x slower than with the original code.
We did not measure the modified oneDNN convolution kernels alone, but based on anecdotal measurements I would expect these kernels to be close to 2x slower.
Thank you @maayaneh for the proposal. Could you clarify which API you would like discussed? From the branch you link to, it simply uses a preprocessor variable (so compile-time enabling).
Is that the usage you envision for this feature (essentially a CMake option)?
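To make the question concrete, here is a minimal, self-contained sketch (not oneDNN code) of what compile-time enabling via a preprocessor variable looks like; the macro name `DNNL_BIT_EXACT_CONV` is hypothetical and stands in for whatever define the PoC branch uses, and it could be forwarded from a CMake option at configure time:

```cpp
#include <cstdio>

// Hypothetical compile-time switch. The macro name is illustrative only;
// a CMake option could forward it to the compiler as -DDNNL_BIT_EXACT_CONV=1.
int main() {
#ifdef DNNL_BIT_EXACT_CONV
    // Bit-exact build: convolution kernels would keep a fixed accumulation order.
    std::puts("bit-exact convolution path compiled in");
#else
    // Default build: existing performance-optimized kernels.
    std::puts("default (performance-optimized) convolution path compiled in");
#endif
    return 0;
}
```

Built with `g++ demo.cpp` the default path is selected; built with `g++ -DDNNL_BIT_EXACT_CONV=1 demo.cpp` the bit-exact path is selected. The trade-off of a compile-time switch is that a single binary cannot offer both behaviors at run time.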
Tagging some folks from the CPU team: @kwiersch @msotoflo @akharito. Please take a look.
@maayaneh, what are the expectations for conv with post-ops? We currently allow conversions to f32 between the main operation (e.g. convolution) and post-ops (e.g. relu). This can cause different results (e.g. for conv + relu with int8 inputs and outputs, an intermediate rounding to f32 can lead to a different result). Is such a conversion OK in the context of the proposed build option?
@mgouicem, I think this issue should be documented, but I suggest it does not block the proposal, since in many use cases it does not hurt accuracy. In our usage we do allow post-ops in conv with s8 output and we get accurate results. For a number to be converted to f32 inaccurately, its magnitude must exceed 2^24, and in that case the inaccuracy introduced by the f32 conversion is masked later by the s8 saturation.
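A minimal standalone sketch of that argument (a simplified model, not actual oneDNN kernel code): for s8 output, an accumulator large enough to be rounded by the f32 conversion (magnitude above 2^24) saturates to the same s8 value whether or not the intermediate conversion happens. The accumulator values and helper names below are illustrative.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdio>

// Saturate a value to the s8 range [-128, 127].
static int8_t saturate_s8_f32(float v) {
    return static_cast<int8_t>(std::max(-128.f, std::min(127.f, v)));
}

int main() {
    // Illustrative int32 accumulator values from a hypothetical int8 convolution.
    const int32_t accs[] = {100, (1 << 24) + 1, -((1 << 24) + 3)};

    for (int32_t acc : accs) {
        // Path A: integer path (as in the bit-exact proposal):
        // relu on the s32 accumulator, then saturate to s8.
        int32_t relu_s32 = std::max<int32_t>(acc, 0);
        int8_t exact = static_cast<int8_t>(
                std::max<int32_t>(-128, std::min<int32_t>(127, relu_s32)));

        // Path B: model of the current behavior described above:
        // convert to f32 (rounds once |acc| > 2^24), relu in f32, saturate to s8.
        float acc_f32 = static_cast<float>(acc);  // possible rounding here
        float relu_f32 = std::max(acc_f32, 0.f);
        int8_t via_f32 = saturate_s8_f32(relu_f32);

        std::printf("acc=%d f32=%.1f exact=%d via_f32=%d\n",
                    static_cast<int>(acc), static_cast<double>(acc_f32),
                    static_cast<int>(exact), static_cast<int>(via_f32));
    }
    return 0;
}
```

For `acc = 2^24 + 1` the f32 conversion rounds to 16777216.0, yet both paths saturate to 127; for the negative value both paths produce 0 after relu. So with s8 output the intermediate rounding does not change the final result, which is why documenting the limitation seems sufficient here.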