[nvidia] int8 convolution primitive fails correctness check

Open dzarukin opened this issue 2 years ago • 0 comments

Summary

oneDNN validation for the Nvidia backend fails the benchdnn correctness check on int8 convolution problems when a dst scale is set.

Steps to reproduce

Build

mkdir -p build
cd build
# CMAKE_BUILD_TYPE may also be debug; DNNL_CPU_RUNTIME may also be NONE
cmake .. -DCMAKE_BUILD_TYPE=release -DDNNL_CPU_RUNTIME=DPCPP -DDNNL_GPU_RUNTIME=DPCPP -DDNNL_GPU_VENDOR=NVIDIA -DONEDNN_BUILD_GRAPH=OFF
cmake --build . --target benchdnn

Run

benchdnn --conv --engine=gpu --dir=FWD_I --dt=s8:s8:s8 --attr-scales=dst:common:2 g4ic20ih5oc20oh5kh3ph1n"2d_tail_conv:grouped"

Observed behavior

Failures are reproducible within a single run.

run: --mode-modifier=P --conv --engine=gpu --dir=FWD_I --dt=s8:s8:s8 --attr-scales=dst:common:2 g4ic20ih5oc20oh5kh3ph1n"2d_tail_conv:grouped"
[  91][DST][0:3:3:1] exp_f32:         -74 exp:         -74 got:         -64 diff:      10 rdiff:0.135135
[ 289][DST][0:11:2:4] exp_f32:         -68 exp:         -68 got:         -64 diff:       4 rdiff:0.0588235
[ 736][DST][1:9:2:1] exp_f32:          79 exp:          79 got:          64 diff:      15 rdiff:0.189873
[ 813][DST][1:12:2:3] exp_f32:         -71 exp:         -71 got:         -64 diff:       7 rdiff:0.0985916
[ 858][DST][1:14:1:3] exp_f32:         -68 exp:         -68 got:         -64 diff:       4 rdiff:0.0588235
[COMPARE_STATS][DST]: trh=0 max_diff:      15 max_rdiff:0.189873
2207:FAILED (errors:5 total:1000) __REPRO: --mode-modifier=P --conv --engine=gpu --dir=FWD_I --dt=s8:s8:s8 --attr-scales=dst:common:2 g4ic20ih5oc20oh5kh3ph1n"2d_tail_conv:grouped"

The failures are caused by incorrect scale handling. Instead of applying the dst scale to the f32 output of the accumulation, the implementation first down-converts that output to s8. The conversion saturates the value, and the scale is applied only afterwards, which produces a mismatched result.
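The ordering problem can be illustrated with a small sketch. It is not the backend code: the function names are hypothetical, the dst scale is assumed to be applied as a division by 2, and the accumulator values are inferred from the expected values in the log (expected × 2), since the log does not print them. Rounding to nearest-even is also an assumption.

```python
def saturate_s8(x):
    """Clamp an integer to the signed 8-bit range [-128, 127]."""
    return max(-128, min(127, x))


def scale_then_convert(acc_f32, dst_scale):
    """Expected order: apply the dst scale in f32, then round and saturate."""
    return saturate_s8(round(acc_f32 / dst_scale))


def convert_then_scale(acc_f32, dst_scale):
    """Suspected faulty order: saturate to s8 first, then apply the scale."""
    saturated = saturate_s8(round(acc_f32))
    return saturate_s8(round(saturated / dst_scale))


if __name__ == "__main__":
    # Hypothetical accumulator values, inferred as (expected value * 2).
    for acc in (-148.0, -136.0, 158.0, -142.0):
        exp = scale_then_convert(acc, 2)
        got = convert_then_scale(acc, 2)
        print(f"acc={acc:7.1f} exp={exp:4d} got={got:4d}")
```

Under these assumptions, any accumulator outside the s8 range collapses to ±64 in the second ordering, which matches the `got` column in the log above.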

Nov 06 '23 22:11 dzarukin