[regression-onnx] Numeric regression for Conv ops
What happened?
For the given IR:
%994 = torch.operator "onnx.DequantizeLinear"(%676, %674, %675) : (!torch.vtensor<[1024,32,3,3],si8>, !torch.vtensor<[],f32>, !torch.vtensor<[],si8>) -> !torch.vtensor<[1024,32,3,3],f32>
%995 = torch.operator "onnx.DequantizeLinear"(%679, %677, %678) : (!torch.vtensor<[1024],si8>, !torch.vtensor<[],f32>, !torch.vtensor<[],si8>) -> !torch.vtensor<[1024],f32>
%996 = torch.operator "onnx.DequantizeLinear"(%686, %684, %685) : (!torch.vtensor<[2048,1024,1,1],si8>, !torch.vtensor<[],f32>, !torch.vtensor<[],si8>) -> !torch.vtensor<[2048,1024,1,1],f32>
%997 = torch.operator "onnx.DequantizeLinear"(%689, %687, %688) : (!torch.vtensor<[2048],si8>, !torch.vtensor<[],f32>, !torch.vtensor<[],si8>) -> !torch.vtensor<[2048],f32>
%1456 = torch.operator "onnx.QuantizeLinear"(%arg1, %657, %656) : (!torch.vtensor<[1,1024,14,14],f32>, !torch.vtensor<[],f32>, !torch.vtensor<[],si8>) -> !torch.vtensor<[1,1024,14,14],si8>
%1457 = torch.operator "onnx.DequantizeLinear"(%1456, %657, %656) : (!torch.vtensor<[1,1024,14,14],si8>, !torch.vtensor<[],f32>, !torch.vtensor<[],si8>) -> !torch.vtensor<[1,1024,14,14],f32>
%1465 = torch.operator "onnx.Conv"(%1457, %994, %995) {torch.onnx.dilations = [1 : si64, 1 : si64], torch.onnx.group = 32 : si64, torch.onnx.kernel_shape = [3 : si64, 3 : si64], torch.onnx.pads = [1 : si64, 1 : si64, 1 : si64, 1 : si64], torch.onnx.strides = [2 : si64, 2 : si64]} : (!torch.vtensor<[1,1024,14,14],f32>, !torch.vtensor<[1024,32,3,3],f32>, !torch.vtensor<[1024],f32>) -> !torch.vtensor<[1,1024,7,7],f32>
%1467 = torch.operator "onnx.QuantizeLinear"(%1465, %681, %680) : (!torch.vtensor<[1,1024,7,7],f32>, !torch.vtensor<[],f32>, !torch.vtensor<[],si8>) -> !torch.vtensor<[1,1024,7,7],si8>
%1468 = torch.operator "onnx.DequantizeLinear"(%1467, %681, %680) : (!torch.vtensor<[1,1024,7,7],si8>, !torch.vtensor<[],f32>, !torch.vtensor<[],si8>) -> !torch.vtensor<[1,1024,7,7],f32>
%1469 = torch.operator "onnx.Conv"(%1468, %996, %997) {torch.onnx.dilations = [1 : si64, 1 : si64], torch.onnx.group = 1 : si64, torch.onnx.kernel_shape = [1 : si64, 1 : si64], torch.onnx.pads = [0 : si64, 0 : si64, 0 : si64, 0 : si64], torch.onnx.strides = [1 : si64, 1 : si64]} : (!torch.vtensor<[1,1024,7,7],f32>, !torch.vtensor<[2048,1024,1,1],f32>, !torch.vtensor<[2048],f32>) -> !torch.vtensor<[1,2048,7,7],f32>
return %1469 : !torch.vtensor<[1,2048,7,7],f32>
We are seeing a numeric regression with this change.
If we pass %arg1 in place of %1457 to the Conv operator, the outputs match.
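For reference, the QuantizeLinear/DequantizeLinear round trip on the input (%arg1 -> %1456 -> %1457) should only perturb the activations by about half a quantization step away from saturation, so a large downstream divergence is unexpected. Below is a minimal numpy sketch of the reference QDQ semantics; the scale and zero-point values here are made-up placeholders (the real ones come from %657 / %656 in the model), so treat this only as a sanity check on the expected perturbation magnitude, not as an isolation of where the regression comes from.

```python
import numpy as np

def quantize_linear(x, scale, zero_point):
    # ONNX QuantizeLinear (int8): round to nearest-even, then saturate to [-128, 127].
    q = np.rint(x / scale) + zero_point
    return np.clip(q, -128, 127).astype(np.int8)

def dequantize_linear(q, scale, zero_point):
    # ONNX DequantizeLinear: (q - zero_point) * scale, in float32.
    return ((q.astype(np.int32) - zero_point) * scale).astype(np.float32)

# Placeholder quantization parameters; not the model's actual constants.
scale, zero_point = np.float32(0.05), np.int8(0)
x = np.random.randn(1, 1024, 14, 14).astype(np.float32)

x_qdq = dequantize_linear(quantize_linear(x, scale, zero_point), scale, zero_point)
print("max |x - QDQ(x)|:", np.abs(x - x_qdq).max())  # bounded by ~scale/2 away from saturation
```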
Steps to reproduce your issue
Commands:
iree-compile test.mlir --iree-hal-target-backends=rocm --iree-hip-target=gfx942 -o test.vmfb
iree-run-module --module='test.vmfb' --device=hip --function='main_graph' --input='[email protected]
Attachments: expected_output.bin.txt, input.1.bin.txt, test.mlir.txt
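To quantify the regression after running the module, one option is to diff the produced output against the attached expected output in Python. A small sketch, assuming both files are raw little-endian float32 buffers of shape [1, 2048, 7, 7] (the filenames and the "actual_output.bin" dump are assumptions; adjust to however the outputs were actually captured):

```python
import numpy as np

SHAPE = (1, 2048, 7, 7)  # output shape of %1469

# Assumed filenames/format; adjust to match the actual attachments.
expected = np.fromfile("expected_output.bin", dtype=np.float32).reshape(SHAPE)
actual = np.fromfile("actual_output.bin", dtype=np.float32).reshape(SHAPE)

abs_err = np.abs(actual - expected)
rel_err = abs_err / np.maximum(np.abs(expected), 1e-6)
print("max abs err:", abs_err.max())
print("max rel err:", rel_err.max())
print("mismatches (atol=1e-3, rtol=1e-3):",
      np.count_nonzero(~np.isclose(actual, expected, atol=1e-3, rtol=1e-3)))
```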
What component(s) does this issue relate to?
No response
Version information
No response
Additional context
No response
Surprisingly, the error came down to this review comment in the original PR.
I am very surprised by this, as it is about the overflow<nsw> flag on arith.addi.
When I added the flag to all arith.addi ops in the upstream pipeline, the issue went away. What I am really confused about is that we are talking about this addi:
%45 = scf.for %arg3 = %c0 to %c8 step %c1 iter_args(%arg4 = %25) -> (vector<1x1x1x1x4x1xi32>) {
%67 = arith.addi %arg3, %c1 overflow<nsw> : index
It is never going to overflow, so why does it matter? Are we perhaps mishandling the op when the flag is absent?
cc @krzysz00 to see if he can provide a reasonable explanation for this.
Also cc @jerryyin (to take a look when back from break).
Edit: Based on looking at the optimized.ll and .rocasm files, this produces drastically different IR/code, so I will go ahead and send an upstream patch to fix this. My best guess is that even though the flag is on a value that doesn't overflow, as we lower it, it gets propagated to a situation where we actually need these semantics.
@bjacob Maybe this is related to the intermittent ResNet ONNX failures in CI?
@nirvedhmeshram Being on LLVM integration this week, I'm available for (and interested in) testing any work-in-progress patch that you might have, and/or stepping in to help as needed.
@zjgarvey It might be, but this issue is consistent rather than intermittent; it happens in a very specific case of strided convs.
@bjacob Thanks. The issue has been triaged to an optimization after scf-to-cf that folds duplicate blocks; that folding was previously blocked because CSE did not happen due to an overflow flag on an addition op. However, the current theory is that the backend is not able to handle the optimized IR. I will update here if there is a fix to try.
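For anyone following along: CSE only merges operations that are fully equivalent, including their attributes, so two otherwise-identical arith.addi ops that differ only in the overflow flag will not be merged. A toy Python sketch of that keying, purely illustrative and not MLIR's actual implementation:

```python
# Toy illustration of attribute-sensitive CSE: ops are merged only when their
# name, operands, AND attributes are identical, so an addi with overflow<nsw>
# and one without are never folded into each other.
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Op:
    name: str
    operands: tuple
    attributes: frozenset = field(default_factory=frozenset)

def cse(ops):
    seen, kept = set(), []
    for op in ops:
        key = (op.name, op.operands, op.attributes)
        if key not in seen:
            seen.add(key)
            kept.append(op)
    return kept

a = Op("arith.addi", ("%arg3", "%c1"))
b = Op("arith.addi", ("%arg3", "%c1"), frozenset({"overflow<nsw>"}))
print(len(cse([a, b])))  # 2: the differing overflow flag blocks the merge
```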