onnx-mlir
Introduction of ONNX layout for SIMD optimizations
New PR derived from #1709, which adds data layouts to ONNX that are well suited for SIMD exploitation. The layouts remain in the original ONNX dialect as I believe they will be beneficial for other applications. However, they are only activated by selected flags; currently the feature is off as it is only a partial result.
Example of transformation:
%1 = "onnx.Conv"(%arg0, %arg1, %0) {auto_pad = "NOTSET", kernel_shape = [2, 2], pads = [0, 0, 0, 0]} : (tensor<5x3x32x32xf32>, tensor<?x3x2x2xf32>, none) -> tensor<5x?x31x31xf32>
becomes
%1 = "onnx.LayoutTransform"(%arg0) {layout = "UNDEFINED"} : (tensor<5x3x32x32xf32>) -> tensor<5x3x32x32xf32, #onnx.encoding<{dataLayout = "NCHW4C"}>>
%2 = "onnx.LayoutTransform"(%arg1) {layout = "UNDEFINED"} : (tensor<?x3x2x2xf32>) -> tensor<?x3x2x2xf32, #onnx.encoding<{dataLayout = "KCMN4C4K"}>>
%3 = "onnx.Conv"(%1, %2, %0) {auto_pad = "NOTSET", group = 1 : si64, kernel_shape = [2, 2], pads = [0, 0, 0, 0]} : (tensor<5x3x32x32xf32, #onnx.encoding<{dataLayout = "NCHW4C"}>>, tensor<?x3x2x2xf32, #onnx.encoding<{dataLayout = "KCMN4C4K"}>>, none) -> tensor<5x?x31x31xf32>
%4 = "onnx.LayoutTransform"(%3) {layout = "UNDEFINED"} : (tensor<5x?x31x31xf32>) -> tensor<5x?x31x31xf32>
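For intuition only: the NCHW4C layout packs the channel dimension into blocks of 4, with the block as the innermost axis. A minimal NumPy sketch of such a packing (a hypothetical illustration of the idea, not the compiler's actual mapping; it assumes C is divisible by 4 and ignores padding):

```python
import numpy as np

def to_nchw4c(x, tile=4):
    # Sketch of an NCHW -> NCHW4C packing: split C into (C/tile, tile)
    # blocks and move the tile of channels to the innermost axis.
    n, c, h, w = x.shape
    assert c % tile == 0, "this sketch does not handle channel padding"
    return x.reshape(n, c // tile, tile, h, w).transpose(0, 1, 3, 4, 2)

x = np.arange(5 * 8 * 4 * 4, dtype=np.float32).reshape(5, 8, 4, 4)
y = to_nchw4c(x)
print(y.shape)  # (5, 2, 4, 4, 4): same elements, SIMD-friendly order
```

The transformed tensor holds exactly the same values; only the traversal order changes, which lets the innermost loop operate on 4 contiguous channels at a time.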
Next steps will be to lower the new ops to the Krnl dialect. Note that layout="UNDEFINED"
is the original ONNX layout. Maybe I could use a better name for it; suggestions welcome.
PR includes a lit test.
It seems the third_party
folder in this patch was not synced with the main branch.
Comment just based on PR description:
Is this op used to transform the layout or to mark the layout? It seems to me that the operation of an "onnx.LayoutTransform" is not described by the input and attribute, but by the shape info of the output, tensor<5x3x32x32xf32, #onnx.encoding<{dataLayout = "NCHW4C"}>>. This is not the usual way an op is defined. Consider putting NCHW4C into an output_layout attribute? The unknown input layout could be inferred from the input tensor, which does not have the onnx.encoding. Think of it another way: how would shape inference work with this op?
Just like stickify and unstickify, this op just labels the transformation; all the "work" is indicated by the tensors labeled with the new layout. No dimensions are changed, as the layout is only a different mapping, not a different size for the tensors.
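To make the "different mapping, not a different size" point concrete, the layout can be written as a pure index re-mapping (a hypothetical sketch; the function name and tiling factor are illustrative):

```python
def nchw4c_index(n, c, h, w, tile=4):
    # Map a logical NCHW index to a physical NCHW4C index. This is a
    # bijection: every element exists exactly once before and after, so the
    # tensor's size is unchanged; only the storage order differs.
    return (n, c // tile, h, w, c % tile)

print(nchw4c_index(0, 7, 3, 2))  # (0, 1, 3, 2, 3)
```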
The PR comment should be updated I guess, e.g. layout in onnx.LayoutTransform is now the target layout
I am changing the name of the option to target_layout. It also bothered me; now it is much clearer.
Thanks! I like it.
@sstamenova This PR currently fails only for Windows, and I have limited visibility into why. I did change the transforms for Conv, but the new transforms are not enabled yet as not all the steps are in yet.
The failure is in TestConv in numerical, with the first test. It is a test that converts a 1x1 Conv into a matmul. I added a printf("hi alex") after the pattern is successfully applied, but the error I get is this one:
1/9 Test #1: TestConv .........................Exit code 0xc0000409
***Exception: 0.17 sec
hi alex: success in converting conv to matmul
error: failed to materialize conversion for result #0 of operation 'onnx.Conv' that remained live after conversion
Assertion failed: isOMConvTheSameAsNaiveImplFor( 1, 64, 64, 55, 55, 1, 1, 0, 0, 0, 0, ConvAutoPad::NOTSET) && "failed test from test_inception_v1_cpu", file D:\a\1\onnx-mlir\test\numerical\TestConv.cpp, line 101
It appears to be generated from MLIR's DialectConversion.cpp:
LogicalResult OperationConverter::legalizeChangedResultType(
    Operation *op, OpResult result, Value newValue,
    TypeConverter *replConverter, ConversionPatternRewriter &rewriter,
    ConversionPatternRewriterImpl &rewriterImpl,
    const DenseMap<Value, SmallVector<Value>> &inverseMapping) {
  Operation *liveUser =
      findLiveUserOfReplaced(result, rewriterImpl, inverseMapping);
  if (!liveUser)
    return success();
  // Functor used to emit a conversion error for a failed materialization.
  auto emitConversionError = [&] {
    InFlightDiagnostic diag = op->emitError()
                              << "failed to materialize conversion for result #"
                              << result.getResultNumber() << " of operation '"
                              << op->getName()
                              << "' that remained live after conversion";
    diag.attachNote(liveUser->getLoc())
        << "see existing live user here: " << *liveUser;
    return failure();
  };
Have you or anyone else seen something like this before? Right now, I have no machine where I can directly run things on Windows.
@AlexandreEichenberger : I've not run into this but I'll follow up with our team to see if someone else might have an idea. It does look reminiscent of the error @sorenlassen was running into with the If lowering: https://github.com/onnx/onnx-mlir/pull/1717
I have been seeing similar recently, in particular with einsum decomposition. I think we need to do shape inference earlier. Basically, it means that after replacing an op with some other op, there was some type mismatch using the result, which requires a conversion (that we don't have). At least in my case result type was not the same as the use because the decomposition used the more precise shape information for the result of einsum, e.g. the old type was tensor<?x?xf32> and new type was tensor<64x64xf32>.
Doing shape inference before onnx decomposition worked for me, but I don't know if that could lead to other issues. I vaguely remember looking at this in the past and I think there might have been some issues with it. If so, maybe there is some way to implement a type converter to deal with this, but I haven't looked into that.
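A toy model of the rule the dialect-conversion framework enforces may help (this is an illustration of the failure mode, not MLIR's actual API): when a replaced result that still has live users changes type, the framework must materialize a cast, and without a registered materialization the conversion fails with the error seen above.

```python
def conversion_ok(old_type: str, new_type: str, has_materialization: bool) -> bool:
    # Toy model: replacing a live result with a value of a different type
    # requires materializing a cast; otherwise conversion fails.
    if old_type == new_type:
        return True
    return has_materialization

# The einsum case above: shape inference refined the result type, so the
# replacement no longer type-matches the original, and no cast is registered.
print(conversion_ok("tensor<?x?xf32>", "tensor<64x64xf32>", False))  # False
```

Running shape inference before the decomposition makes old and new types agree, which is why that workaround avoids the materialization requirement entirely.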
I have been seeing similar recently,
@MikeHolman was it also failing only on Windows?
I don't know. I was only testing on Windows and don't have a Linux environment set up to test with, but it seems pretty weird that this is architecture specific.
it seems pretty weird that this is architecture specific.
Agreed. Windows folks, is there a way to test using a cloud machine? I don't have access to a Windows machine.
@sstamenova I am not making much progress isolating the error. Would someone be able to build this and give me a log of where it crashes?
Debug/bin/TestConv is the test (or the same with Release).
Ideally, would there be a way to access a cloud Windows machine? We are all on Macs here.
@AlexandreEichenberger I'm building now. I can do a little debugging and let you know what I see.
@AlexandreEichenberger Maybe you might have more insight into why this is the case, but the tensor type has an attribute about dataLayout (that I haven't seen before), and this attribute is on the original Conv, but isn't on the result of the final Reshape that replaces it. I believe this mismatch is what is causing the failure.
func.func @main_graph(%arg0: tensor<1x64x55x55xf32>, %arg1: tensor<64x64x1x1xf32>) -> tensor<1x64x55x55xf32> {
%0 = "onnx.NoValue"() {value} : () -> none
%1 = "onnx.LayoutTransform"(%arg0) {target_layout = "NCHW4C"} : (tensor<1x64x55x55xf32>) -> tensor<1x64x55x55xf32, #onnx.encoding<{dataLayout = "NCHW4C"}>>
%2 = "onnx.LayoutTransform"(%arg1) {target_layout = "KCMN4C4K"} : (tensor<64x64x1x1xf32>) -> tensor<64x64x1x1xf32, #onnx.encoding<{dataLayout = "KCMN4C4K"}>>
%3 = "onnx.Shape"(%1) : (tensor<1x64x55x55xf32, #onnx.encoding<{dataLayout = "NCHW4C"}>>) -> tensor<4xi64>
%4 = "onnx.Constant"() {value = dense<-1> : tensor<1xi64>} : () -> tensor<1xi64>
%5 = "onnx.Constant"() {value = dense<0> : tensor<1xi64>} : () -> tensor<1xi64>
%6 = "onnx.Constant"() {value = dense<0> : tensor<1xi64>} : () -> tensor<1xi64>
%7 = "onnx.Constant"() {value = dense<2> : tensor<1xi64>} : () -> tensor<1xi64>
%8 = "onnx.Constant"() {value = dense<1> : tensor<1xi64>} : () -> tensor<1xi64>
%9 = "onnx.Slice"(%3, %6, %7, %5, %8) : (tensor<4xi64>, tensor<1xi64>, tensor<1xi64>, tensor<1xi64>, tensor<1xi64>) -> tensor<2xi64>
%10 = "onnx.Concat"(%9, %4) {axis = 0 : si64} : (tensor<2xi64>, tensor<1xi64>) -> tensor<3xi64>
%11 = "onnx.Reshape"(%1, %10) {allowzero = 0 : si64} : (tensor<1x64x55x55xf32, #onnx.encoding<{dataLayout = "NCHW4C"}>>, tensor<3xi64>) -> tensor<1x64x?xf32>
%12 = "onnx.Shape"(%2) : (tensor<64x64x1x1xf32, #onnx.encoding<{dataLayout = "KCMN4C4K"}>>) -> tensor<4xi64>
%13 = "onnx.Constant"() {value = dense<-1> : tensor<1xi64>} : () -> tensor<1xi64>
%14 = "onnx.Constant"() {value = dense<0> : tensor<1xi64>} : () -> tensor<1xi64>
%15 = "onnx.Constant"() {value = dense<0> : tensor<1xi64>} : () -> tensor<1xi64>
%16 = "onnx.Constant"() {value = dense<1> : tensor<1xi64>} : () -> tensor<1xi64>
%17 = "onnx.Constant"() {value = dense<1> : tensor<1xi64>} : () -> tensor<1xi64>
%18 = "onnx.Slice"(%12, %15, %16, %14, %17) : (tensor<4xi64>, tensor<1xi64>, tensor<1xi64>, tensor<1xi64>, tensor<1xi64>) -> tensor<1xi64>
%19 = "onnx.Concat"(%18, %13) {axis = 0 : si64} : (tensor<1xi64>, tensor<1xi64>) -> tensor<2xi64>
%20 = "onnx.Reshape"(%2, %19) {allowzero = 0 : si64} : (tensor<64x64x1x1xf32, #onnx.encoding<{dataLayout = "KCMN4C4K"}>>, tensor<2xi64>) -> tensor<64x?xf32>
%21 = "onnx.MatMul"(%20, %11) : (tensor<64x?xf32>, tensor<1x64x?xf32>) -> tensor<1x64x?xf32>
%22 = "onnx.Shape"(%1) : (tensor<1x64x55x55xf32, #onnx.encoding<{dataLayout = "NCHW4C"}>>) -> tensor<4xi64>
%23 = "onnx.Shape"(%2) : (tensor<64x64x1x1xf32, #onnx.encoding<{dataLayout = "KCMN4C4K"}>>) -> tensor<4xi64>
%24 = "onnx.Constant"() {value = dense<0> : tensor<1xi64>} : () -> tensor<1xi64>
%25 = "onnx.Constant"() {value = dense<0> : tensor<1xi64>} : () -> tensor<1xi64>
%26 = "onnx.Constant"() {value = dense<1> : tensor<1xi64>} : () -> tensor<1xi64>
%27 = "onnx.Constant"() {value = dense<1> : tensor<1xi64>} : () -> tensor<1xi64>
%28 = "onnx.Slice"(%22, %25, %26, %24, %27) : (tensor<4xi64>, tensor<1xi64>, tensor<1xi64>, tensor<1xi64>, tensor<1xi64>) -> tensor<1xi64>
%29 = "onnx.Constant"() {value = dense<0> : tensor<1xi64>} : () -> tensor<1xi64>
%30 = "onnx.Constant"() {value = dense<0> : tensor<1xi64>} : () -> tensor<1xi64>
%31 = "onnx.Constant"() {value = dense<1> : tensor<1xi64>} : () -> tensor<1xi64>
%32 = "onnx.Constant"() {value = dense<1> : tensor<1xi64>} : () -> tensor<1xi64>
%33 = "onnx.Slice"(%23, %30, %31, %29, %32) : (tensor<4xi64>, tensor<1xi64>, tensor<1xi64>, tensor<1xi64>, tensor<1xi64>) -> tensor<1xi64>
%34 = "onnx.Constant"() {value = dense<0> : tensor<1xi64>} : () -> tensor<1xi64>
%35 = "onnx.Constant"() {value = dense<2> : tensor<1xi64>} : () -> tensor<1xi64>
%36 = "onnx.Constant"() {value = dense<4> : tensor<1xi64>} : () -> tensor<1xi64>
%37 = "onnx.Constant"() {value = dense<1> : tensor<1xi64>} : () -> tensor<1xi64>
%38 = "onnx.Slice"(%22, %35, %36, %34, %37) : (tensor<4xi64>, tensor<1xi64>, tensor<1xi64>, tensor<1xi64>, tensor<1xi64>) -> tensor<2xi64>
%39 = "onnx.Concat"(%28, %33, %38) {axis = 0 : si64} : (tensor<1xi64>, tensor<1xi64>, tensor<2xi64>) -> tensor<4xi64>
%40 = "onnx.Reshape"(%21, %39) {allowzero = 0 : si64} : (tensor<1x64x?xf32>, tensor<4xi64>) -> tensor<1x64x55x55xf32>
// New result type is tensor<1x64x55x55xf32>
%41 = "onnx.Conv"(%1, %2, %0) {auto_pad = "NOTSET", dilations = [1, 1], group = 1 : si64, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]} : (tensor<1x64x55x55xf32, #onnx.encoding<{dataLayout = "NCHW4C"}>>, tensor<64x64x1x1xf32, #onnx.encoding<{dataLayout = "KCMN4C4K"}>>, none) -> tensor<1x64x55x55xf32, #onnx.encoding<{dataLayout = "NCHW4C"}>>
// Original result type is tensor<1x64x55x55xf32, #onnx.encoding<{dataLayout = "NCHW4C"}>>
%42 = "onnx.LayoutTransform"(%41) {target_layout = "STANDARD"} : (tensor<1x64x55x55xf32, #onnx.encoding<{dataLayout = "NCHW4C"}>>) -> tensor<1x64x55x55xf32>
%43 = "onnx.Conv"(%arg0, %arg1, %0) {auto_pad = "NOTSET", dilations = [1, 1], group = 1 : si64, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]} : (tensor<1x64x55x55xf32>, tensor<64x64x1x1xf32>, none) -> tensor<1x64x55x55xf32>
return %43 : tensor<1x64x55x55xf32>
}
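The root cause can be modeled abstractly: in MLIR, the encoding attribute participates in ranked-tensor type equality, so a tensor with a dataLayout encoding is a different type from the same-shaped tensor without one. A toy Python stand-in (not MLIR's actual classes):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class TensorType:
    # Simplified stand-in for a ranked tensor type: the encoding takes part
    # in equality, just as MLIR's encoding attribute does.
    shape: tuple
    dtype: str
    encoding: Optional[str] = None

orig = TensorType((1, 64, 55, 55), "f32", 'dataLayout = "NCHW4C"')
new = TensorType((1, 64, 55, 55), "f32")
print(orig == new)  # False: same shape and dtype, but the encodings differ
```

This is exactly the mismatch above: the original Conv result carries the NCHW4C encoding while the replacing Reshape result does not, so the conversion framework sees two different types.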
Just tested your fix, and it worked!
Jenkins Linux ppc64le Build #7337 [push] Introduction of ONNX lay... started at 20:08
Jenkins Linux s390x Build #8286 [push] Introduction of ONNX lay... started at 20:07
Jenkins Linux amd64 Build #8270 [push] Introduction of ONNX lay... started at 19:07
Jenkins Linux amd64 Build #8270 [push] Introduction of ONNX lay... passed after 1 hr 14 min
Jenkins Linux ppc64le Build #7337 [push] Introduction of ONNX lay... passed after 1 hr 40 min
Jenkins Linux s390x Build #8286 [push] Introduction of ONNX lay... passed after 1 hr 41 min
For ref, the bug was fixed using @tungld's suggestion, pasted below.
About the issue on Windows with your data layout PR, I am thinking there may be an issue in the return type, e.g. Line 204 in ConvOpt.cpp:
Value res = create.onnx.reshape(outputType, MM, outputShapeVals);
I am not sure whether changing to the following can fix the issue:
Value res = create.onnx.reshape(convOp.Y().getType(), MM, outputShapeVals);
It solved the problem.