
gpu: nvidia: conv: Fix int8 convolution primitive failures


Description

When destination scaling is applied to int8 data types, some benchmarks fail. This happens because the scaling was applied over the int8 result, causing saturation issues.

Fixes #1749

Solution:

We create a temporary buffer to hold the intermediate values in f32 and keep the computation as expected by oneDNN; the scratchpad size is modified accordingly. We launch cudnnConvolutionForward and obtain its result in f32, and apply the scaling parameters for src/wei at this stage. After the post-operations, we apply the scaling for dst over the f32 result to avoid saturation issues. Finally, the result is converted to s8.
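The pipeline above can be sketched with plain NumPy. This is an illustration of the requantization order, not the actual oneDNN/cuDNN code; the function name and the single-scale arguments are hypothetical, and the key point is that the dst scale is applied to the f32 intermediate before the down-conversion to s8.

```python
import numpy as np

def s8_conv_requantize(acc_s32, src_scale, wei_scale, dst_scale, sum_old=None):
    """Hypothetical sketch of the fixed pipeline: keep the intermediate in
    f32 and apply the dst scale last, before saturating to s8."""
    # 1. Bring the raw convolution accumulator into a temporary f32 buffer
    #    and apply the src/wei scales at this stage.
    tmp_f32 = acc_s32.astype(np.float32) * np.float32(src_scale * wei_scale)
    # 2. Apply post-ops in f32 (here: an optional 'sum' post-op).
    if sum_old is not None:
        tmp_f32 += sum_old.astype(np.float32)
    # 3. Apply the destination scale over the f32 result...
    tmp_f32 /= np.float32(dst_scale)
    # 4. ...and only then round and saturate down to s8.
    return np.clip(np.rint(tmp_f32), -128, 127).astype(np.int8)
```

Applying the dst scale before the s8 conversion is what avoids the saturation bug: an f32 intermediate of 200 with dst_scale=2 correctly yields 100, whereas converting to s8 first would clamp it to 127 before the scale is applied.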

Checklist

General

  • [x] Do all unit and benchdnn tests (make test and make test_benchdnn_*) pass locally for each commit?
  • [x] Have you formatted the code using clang-format?

Bug fixes

  • [x] Have you included information on how to reproduce the issue (either in a github issue or in this PR)?

kala855 · Nov 28 '23

Just a reminder that oneDNN v3.4 code freeze is coming on January 26.

vpirogov · Jan 12 '24

We have updated the PR. We follow the quantization equation and apply the destination scaling after the post-operations. Since cudnnConvolutionForward does not support producing its output in s32 format, we obtain the result as f32 when the inputs are s8. Then the scaling parameters for src/wei are applied. Finally, after the post-operations, the scaling for dst is applied over the f32 value, which is then converted to s8.

kala855 · Jan 16 '24

There is another class of test cases that fail: no dst scale, but dst is s8:

./build/tests/benchdnn/benchdnn --conv --engine=gpu --skip-impl=ref --dir=FWD_I --dt=s8:s8:s8 --attr-scales=src:common:2 --attr-post-ops=sum mb1ic512iw121oc512ow122kw6pw3nconv1d:21

Even when the dst scale is not set, the post-op computation should happen in f32 and then the dst should be quantized, with dst_scale equal to 1 (the default scale value).
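The failing case above can be made concrete with a small numeric sketch. The accumulator and dst values are made-up numbers chosen for illustration: the sum post-op must be accumulated in f32 so that saturation happens only once, at the final s8 conversion, with dst_scale defaulting to 1.

```python
import numpy as np

# Hypothetical values: scaled convolution result and existing s8 dst contents.
acc_f32 = np.float32(90) * np.float32(2.0)  # src scale 2 applied in f32 -> 180.0
old_dst = np.int8(100)                      # previous dst, consumed by the sum post-op

# The sum post-op is computed in f32, not s8, so no intermediate clamping occurs.
f32_result = acc_f32 + np.float32(old_dst)  # 280.0

# No dst scale was set, so the default of 1.0 is used; saturate to s8 only now.
dst_scale = np.float32(1.0)
dst = np.int8(np.clip(np.rint(f32_result / dst_scale), -128, 127))
```

Had the sum been computed directly on s8 data, the 180.0 intermediate would already have been clamped to 127 before the sum, producing a wrong result rather than a single, correct saturation at the end.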

I addressed this comment in the last commit. Just a kind reminder to take a look at it. Thanks @igorsafo

kala855 · Apr 03 '24

@kala855 FYI, I pushed a minor change to reduce code duplication.

igorsafo · Apr 24 '24

Please squash all the changes and remove all merge commits; the history must be linear.

dzarukin · May 01 '24

@dzarukin I did the squash as you mentioned. Thank you very much. Let me know if something else needs to be done.

kala855 · May 02 '24

Thanks for the contribution.

dzarukin · May 05 '24