oneDNN
rfc: propose optional bit exact conv
Thanks for the proposal, @maayaneh! Were you able to evaluate performance loss with the proposed sequence?
Implementation PoC from the comments: https://github.com/oneapi-src/oneDNN/compare/master...maayaneh:oneDNN:master
Hi @vpigorov, we measured performance on our own workload, which includes roughly 700 networks. Convolution is the most prominent kernel, but the workload also includes other layers as well as quantization operations. For this workload, execution time with this commit is 1.38x slower than with the original code.
We did not measure the modified oneDNN convolution kernels alone, but based on anecdotal measurements I would expect these kernels to be close to 2x slower.
Thank you @maayaneh for the proposal. Could you clarify which API you would like discussed? From the branch you link to, it simply uses a preprocessor variable (so compile-time enabling).
Is that the usage you envision for this feature (essentially a CMake option)?
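To make the question concrete, here is a minimal, self-contained sketch (not oneDNN code) of what compile-time enabling via a preprocessor variable looks like; the macro name `DNNL_BIT_EXACT_CONV` is hypothetical and stands in for whatever define the PoC branch uses, and it could be forwarded from a CMake option at configure time:

```cpp
#include <cstdio>

// Hypothetical compile-time switch. The macro name is illustrative only;
// a CMake option could forward it to the compiler as -DDNNL_BIT_EXACT_CONV=1.
int main() {
#ifdef DNNL_BIT_EXACT_CONV
    // Bit-exact build: convolution kernels would keep a fixed accumulation order.
    std::puts("bit-exact convolution path compiled in");
#else
    // Default build: existing performance-optimized kernels.
    std::puts("default (performance-optimized) convolution path compiled in");
#endif
    return 0;
}
```

Built with `g++ demo.cpp` the default path is selected; built with `g++ -DDNNL_BIT_EXACT_CONV=1 demo.cpp` the bit-exact path is selected. The trade-off of a compile-time switch is that a single binary cannot offer both behaviors at run time.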
Tagging some folks from the CPU team: @kwiersch @msotoflo @akharito. Please take a look.
@maayaneh, what are the expectations for conv with post-ops? We currently allow conversions to f32 between the main operation (e.g. convolution) and post-ops (e.g. relu). This can cause different results (e.g. for conv + relu with int8 inputs and outputs, an intermediate rounding to f32 can lead to a different result). Is such a conversion OK in the context of the proposed build option?
@mgouicem, I think this issue should be documented, but I suggest it does not block the proposal, since in many use cases it does not hurt accuracy. In our usage we do allow post-ops in conv with s8 output and we get accurate results. For a number to be converted to f32 inaccurately, its magnitude must exceed 2^24, and in that case the inaccuracy introduced by the f32 conversion is masked later by the s8 saturation.
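A minimal standalone sketch of that argument (a simplified model, not actual oneDNN kernel code): for s8 output, an accumulator large enough to be rounded by the f32 conversion (magnitude above 2^24) saturates to the same s8 value whether or not the intermediate conversion happens. The accumulator values and helper names below are illustrative.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdio>

// Saturate a value to the s8 range [-128, 127].
static int8_t saturate_s8_f32(float v) {
    return static_cast<int8_t>(std::max(-128.f, std::min(127.f, v)));
}

int main() {
    // Illustrative int32 accumulator values from a hypothetical int8 convolution.
    const int32_t accs[] = {100, (1 << 24) + 1, -((1 << 24) + 3)};

    for (int32_t acc : accs) {
        // Path A: integer path (as in the bit-exact proposal):
        // relu on the s32 accumulator, then saturate to s8.
        int32_t relu_s32 = std::max<int32_t>(acc, 0);
        int8_t exact = static_cast<int8_t>(
                std::max<int32_t>(-128, std::min<int32_t>(127, relu_s32)));

        // Path B: model of the current behavior described above:
        // convert to f32 (rounds once |acc| > 2^24), relu in f32, saturate to s8.
        float acc_f32 = static_cast<float>(acc);  // possible rounding here
        float relu_f32 = std::max(acc_f32, 0.f);
        int8_t via_f32 = saturate_s8_f32(relu_f32);

        std::printf("acc=%d f32=%.1f exact=%d via_f32=%d\n",
                    static_cast<int>(acc), static_cast<double>(acc_f32),
                    static_cast<int>(exact), static_cast<int>(via_f32));
    }
    return 0;
}
```

For `acc = 2^24 + 1` the f32 conversion rounds to 16777216.0, yet both paths saturate to 127; for the negative value both paths produce 0 after relu. So with s8 output the intermediate rounding does not change the final result, which is why documenting the limitation seems sufficient here.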