onnx-mlir
Introduction of ONNX layout for SIMD optimizations
New PR derived from #1709, which adds data layouts to ONNX that are well suited for SIMD exploitation. The layouts remain in the original ONNX dialect as I believe they will be beneficial for other applications. However, they are only activated by selected flags; currently the feature is off as it is only a partial result.
Example of transformation:
%1 = "onnx.Conv"(%arg0, %arg1, %0) {auto_pad = "NOTSET", kernel_shape = [2, 2], pads = [0, 0, 0, 0]} : (tensor<5x3x32x32xf32>, tensor<?x3x2x2xf32>, none) -> tensor<5x?x31x31xf32>
becomes
%1 = "onnx.LayoutTransform"(%arg0) {layout = "UNDEFINED"} : (tensor<5x3x32x32xf32>) -> tensor<5x3x32x32xf32, #onnx.encoding<{dataLayout = "NCHW4C"}>>
%2 = "onnx.LayoutTransform"(%arg1) {layout = "UNDEFINED"} : (tensor<?x3x2x2xf32>) -> tensor<?x3x2x2xf32, #onnx.encoding<{dataLayout = "KCMN4C4K"}>>
%3 = "onnx.Conv"(%1, %2, %0) {auto_pad = "NOTSET", group = 1 : si64, kernel_shape = [2, 2], pads = [0, 0, 0, 0]} : (tensor<5x3x32x32xf32, #onnx.encoding<{dataLayout = "NCHW4C"}>>, tensor<?x3x2x2xf32, #onnx.encoding<{dataLayout = "KCMN4C4K"}>>, none) -> tensor<5x?x31x31xf32>
%4 = "onnx.LayoutTransform"(%3) {layout = "UNDEFINED"} : (tensor<5x?x31x31xf32>) -> tensor<5x?x31x31xf32>
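For intuition only: the NCHW4C layout packs the channel dimension into blocks of 4, with the block as the innermost axis. A minimal NumPy sketch of such a packing (a hypothetical illustration of the idea, not the compiler's actual mapping; it assumes C is divisible by 4 and ignores padding):

```python
import numpy as np

def to_nchw4c(x, tile=4):
    # Sketch of an NCHW -> NCHW4C packing: split C into (C/tile, tile)
    # blocks and move the tile of channels to the innermost axis.
    n, c, h, w = x.shape
    assert c % tile == 0, "this sketch does not handle channel padding"
    return x.reshape(n, c // tile, tile, h, w).transpose(0, 1, 3, 4, 2)

x = np.arange(5 * 8 * 4 * 4, dtype=np.float32).reshape(5, 8, 4, 4)
y = to_nchw4c(x)
print(y.shape)  # (5, 2, 4, 4, 4): same elements, SIMD-friendly order
```

The transformed tensor holds exactly the same values; only the traversal order changes, which lets the innermost loop operate on 4 contiguous channels at a time.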
Next steps will be to lower the new ops to the Krnl dialect. Note that layout="UNDEFINED"
is the original ONNX layout. Maybe I could use a better name for it; suggestions welcome.
PR includes a lit test.
It seems the third_party
folder in this patch was not synced with the main branch.
Comment just based on PR description:
Is this op used to transform the layout or to mark the layout? It seems to me that the operation of an "onnx.LayoutTransform" is not described by the input and attribute, but by the shape info of the output, tensor<5x3x32x32xf32, #onnx.encoding<{dataLayout = "NCHW4C"}>>. This is not the usual way an op is defined. Consider putting NCHW4C into an output_layout attribute? The unknown input layout could be inferred from the input tensor, which does not have the onnx.encoding. Think of it another way: how would shape inference work with this op?
Just like stickify and unstickify, this op just labels the transformation; all the "work" is indicated by the tensors labeled with the new layout. No dimensions are changed, as the layout is only a different mapping, not a different size for the tensors.
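To make the "different mapping, not a different size" point concrete, the layout can be written as a pure index re-mapping (a hypothetical sketch; the function name and tiling factor are illustrative):

```python
def nchw4c_index(n, c, h, w, tile=4):
    # Map a logical NCHW index to a physical NCHW4C index. This is a
    # bijection: every element exists exactly once before and after, so the
    # tensor's size is unchanged; only the storage order differs.
    return (n, c // tile, h, w, c % tile)

print(nchw4c_index(0, 7, 3, 2))  # (0, 1, 3, 2, 3)
```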
The PR comment should be updated I guess, e.g. layout in onnx.LayoutTransform is now the target layout
I am changing the name of the option to target_layout. It also bothered me; now it is much clearer.
Thanks! I like it.
@sstamenova This PR currently fails only for Windows, and I have limited visibility into why. I did change the transforms for Conv, but the new transforms are not enabled yet as not all the steps are in yet.
The failure is in TestConv in numerical, with the first test. It is a test that converts a 1x1 Conv into a matmul. I added a printf("hi alex") after the pattern is successfully applied, but the error I get is this one:
1/9 Test #1: TestConv .........................Exit code 0xc0000409
***Exception: 0.17 sec
hi alex: success in converting conv to matmul
error: failed to materialize conversion for result #0 of operation 'onnx.Conv' that remained live after conversion
Assertion failed: isOMConvTheSameAsNaiveImplFor( 1, 64, 64, 55, 55, 1, 1, 0, 0, 0, 0, ConvAutoPad::NOTSET) && "failed test from test_inception_v1_cpu", file D:\a\1\onnx-mlir\test\numerical\TestConv.cpp, line 101
It appears to be generated from MLIR's DialectConversion.cpp:
LogicalResult OperationConverter::legalizeChangedResultType(
    Operation *op, OpResult result, Value newValue,
    TypeConverter *replConverter, ConversionPatternRewriter &rewriter,
    ConversionPatternRewriterImpl &rewriterImpl,
    const DenseMap<Value, SmallVector<Value>> &inverseMapping) {
  Operation *liveUser =
      findLiveUserOfReplaced(result, rewriterImpl, inverseMapping);
  if (!liveUser)
    return success();
  // Functor used to emit a conversion error for a failed materialization.
  auto emitConversionError = [&] {
    InFlightDiagnostic diag = op->emitError()
                              << "failed to materialize conversion for result #"
                              << result.getResultNumber() << " of operation '"
                              << op->getName()
                              << "' that remained live after conversion";
    diag.attachNote(liveUser->getLoc())
        << "see existing live user here: " << *liveUser;
    return failure();
  };
Have you or anyone else seen something like this before? Right now, I have no machine where I can directly run things on Windows.
@AlexandreEichenberger : I've not run into this but I'll follow up with our team to see if someone else might have an idea. It does look reminiscent of the error @sorenlassen was running into with the If lowering: https://github.com/onnx/onnx-mlir/pull/1717
I have been seeing similar recently, in particular with einsum decomposition. I think we need to do shape inference earlier. Basically, it means that after replacing an op with some other op, there was some type mismatch using the result, which requires a conversion (that we don't have). At least in my case result type was not the same as the use because the decomposition used the more precise shape information for the result of einsum, e.g. the old type was tensor<?x?xf32> and new type was tensor<64x64xf32>.
Doing shape inference before onnx decomposition worked for me, but I don't know if that could lead to other issues. I vaguely remember looking at this in the past and I think there might have been some issues with it. If so, maybe there is some way to implement a type converter to deal with this, but I haven't looked into that.
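A toy model of the rule the dialect-conversion framework enforces may help (this is an illustration of the failure mode, not MLIR's actual API): when a replaced result that still has live users changes type, the framework must materialize a cast, and without a registered materialization the conversion fails with the error seen above.

```python
def conversion_ok(old_type: str, new_type: str, has_materialization: bool) -> bool:
    # Toy model: replacing a live result with a value of a different type
    # requires materializing a cast; otherwise conversion fails.
    if old_type == new_type:
        return True
    return has_materialization

# The einsum case above: shape inference refined the result type, so the
# replacement no longer type-matches the original, and no cast is registered.
print(conversion_ok("tensor<?x?xf32>", "tensor<64x64xf32>", False))  # False
```

Running shape inference before the decomposition makes old and new types agree, which is why that workaround avoids the materialization requirement entirely.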
I have been seeing similar recently,
@MikeHolman was it also failing only on Windows?
I don't know. I was only testing on Windows and don't have a Linux environment set up to test with, but it seems pretty weird that this is architecture specific.
it seems pretty weird that this is architecture specific.
Agreed. Windows folks, is there a way to test using a cloud machine? I don't have access to a Windows machine.
@sstamenova I am not making much progress isolating the error. Would someone be able to build this and give me a log of where it crashes?
Debug/bin/TestConv is the test (or the same with Release).
Ideally, would there be a way to access a cloud Windows machine? We are all on Macs here.
@AlexandreEichenberger I'm building now. I can do a little debugging and let you know what I see.
@AlexandreEichenberger Maybe you might have more insight into why this is the case, but the tensor type has an attribute about dataLayout (that I haven't seen before), and this attribute is on the original Conv, but isn't on the result of the final Reshape that replaces it. I believe this mismatch is what is causing the failure.
func.func @main_graph(%arg0: tensor<1x64x55x55xf32>, %arg1: tensor<64x64x1x1xf32>) -> tensor<1x64x55x55xf32> {
%0 = "onnx.NoValue"() {value} : () -> none
%1 = "onnx.LayoutTransform"(%arg0) {target_layout = "NCHW4C"} : (tensor<1x64x55x55xf32>) -> tensor<1x64x55x55xf32, #onnx.encoding<{dataLayout = "NCHW4C"}>>
%2 = "onnx.LayoutTransform"(%arg1) {target_layout = "KCMN4C4K"} : (tensor<64x64x1x1xf32>) -> tensor<64x64x1x1xf32, #onnx.encoding<{dataLayout = "KCMN4C4K"}>>
%3 = "onnx.Shape"(%1) : (tensor<1x64x55x55xf32, #onnx.encoding<{dataLayout = "NCHW4C"}>>) -> tensor<4xi64>
%4 = "onnx.Constant"() {value = dense<-1> : tensor<1xi64>} : () -> tensor<1xi64>
%5 = "onnx.Constant"() {value = dense<0> : tensor<1xi64>} : () -> tensor<1xi64>
%6 = "onnx.Constant"() {value = dense<0> : tensor<1xi64>} : () -> tensor<1xi64>
%7 = "onnx.Constant"() {value = dense<2> : tensor<1xi64>} : () -> tensor<1xi64>
%8 = "onnx.Constant"() {value = dense<1> : tensor<1xi64>} : () -> tensor<1xi64>
%9 = "onnx.Slice"(%3, %6, %7, %5, %8) : (tensor<4xi64>, tensor<1xi64>, tensor<1xi64>, tensor<1xi64>, tensor<1xi64>) -> tensor<2xi64>
%10 = "onnx.Concat"(%9, %4) {axis = 0 : si64} : (tensor<2xi64>, tensor<1xi64>) -> tensor<3xi64>
%11 = "onnx.Reshape"(%1, %10) {allowzero = 0 : si64} : (tensor<1x64x55x55xf32, #onnx.encoding<{dataLayout = "NCHW4C"}>>, tensor<3xi64>) -> tensor<1x64x?xf32>
%12 = "onnx.Shape"(%2) : (tensor<64x64x1x1xf32, #onnx.encoding<{dataLayout = "KCMN4C4K"}>>) -> tensor<4xi64>
%13 = "onnx.Constant"() {value = dense<-1> : tensor<1xi64>} : () -> tensor<1xi64>
%14 = "onnx.Constant"() {value = dense<0> : tensor<1xi64>} : () -> tensor<1xi64>
%15 = "onnx.Constant"() {value = dense<0> : tensor<1xi64>} : () -> tensor<1xi64>
%16 = "onnx.Constant"() {value = dense<1> : tensor<1xi64>} : () -> tensor<1xi64>
%17 = "onnx.Constant"() {value = dense<1> : tensor<1xi64>} : () -> tensor<1xi64>
%18 = "onnx.Slice"(%12, %15, %16, %14, %17) : (tensor<4xi64>, tensor<1xi64>, tensor<1xi64>, tensor<1xi64>, tensor<1xi64>) -> tensor<1xi64>
%19 = "onnx.Concat"(%18, %13) {axis = 0 : si64} : (tensor<1xi64>, tensor<1xi64>) -> tensor<2xi64>
%20 = "onnx.Reshape"(%2, %19) {allowzero = 0 : si64} : (tensor<64x64x1x1xf32, #onnx.encoding<{dataLayout = "KCMN4C4K"}>>, tensor<2xi64>) -> tensor<64x?xf32>
%21 = "onnx.MatMul"(%20, %11) : (tensor<64x?xf32>, tensor<1x64x?xf32>) -> tensor<1x64x?xf32>
%22 = "onnx.Shape"(%1) : (tensor<1x64x55x55xf32, #onnx.encoding<{dataLayout = "NCHW4C"}>>) -> tensor<4xi64>
%23 = "onnx.Shape"(%2) : (tensor<64x64x1x1xf32, #onnx.encoding<{dataLayout = "KCMN4C4K"}>>) -> tensor<4xi64>
%24 = "onnx.Constant"() {value = dense<0> : tensor<1xi64>} : () -> tensor<1xi64>
%25 = "onnx.Constant"() {value = dense<0> : tensor<1xi64>} : () -> tensor<1xi64>
%26 = "onnx.Constant"() {value = dense<1> : tensor<1xi64>} : () -> tensor<1xi64>
%27 = "onnx.Constant"() {value = dense<1> : tensor<1xi64>} : () -> tensor<1xi64>
%28 = "onnx.Slice"(%22, %25, %26, %24, %27) : (tensor<4xi64>, tensor<1xi64>, tensor<1xi64>, tensor<1xi64>, tensor<1xi64>) -> tensor<1xi64>
%29 = "onnx.Constant"() {value = dense<0> : tensor<1xi64>} : () -> tensor<1xi64>
%30 = "onnx.Constant"() {value = dense<0> : tensor<1xi64>} : () -> tensor<1xi64>
%31 = "onnx.Constant"() {value = dense<1> : tensor<1xi64>} : () -> tensor<1xi64>
%32 = "onnx.Constant"() {value = dense<1> : tensor<1xi64>} : () -> tensor<1xi64>
%33 = "onnx.Slice"(%23, %30, %31, %29, %32) : (tensor<4xi64>, tensor<1xi64>, tensor<1xi64>, tensor<1xi64>, tensor<1xi64>) -> tensor<1xi64>
%34 = "onnx.Constant"() {value = dense<0> : tensor<1xi64>} : () -> tensor<1xi64>
%35 = "onnx.Constant"() {value = dense<2> : tensor<1xi64>} : () -> tensor<1xi64>
%36 = "onnx.Constant"() {value = dense<4> : tensor<1xi64>} : () -> tensor<1xi64>
%37 = "onnx.Constant"() {value = dense<1> : tensor<1xi64>} : () -> tensor<1xi64>
%38 = "onnx.Slice"(%22, %35, %36, %34, %37) : (tensor<4xi64>, tensor<1xi64>, tensor<1xi64>, tensor<1xi64>, tensor<1xi64>) -> tensor<2xi64>
%39 = "onnx.Concat"(%28, %33, %38) {axis = 0 : si64} : (tensor<1xi64>, tensor<1xi64>, tensor<2xi64>) -> tensor<4xi64>
%40 = "onnx.Reshape"(%21, %39) {allowzero = 0 : si64} : (tensor<1x64x?xf32>, tensor<4xi64>) -> tensor<1x64x55x55xf32>
// New result type is tensor<1x64x55x55xf32>
%41 = "onnx.Conv"(%1, %2, %0) {auto_pad = "NOTSET", dilations = [1, 1], group = 1 : si64, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]} : (tensor<1x64x55x55xf32, #onnx.encoding<{dataLayout = "NCHW4C"}>>, tensor<64x64x1x1xf32, #onnx.encoding<{dataLayout = "KCMN4C4K"}>>, none) -> tensor<1x64x55x55xf32, #onnx.encoding<{dataLayout = "NCHW4C"}>>
// Original result type is tensor<1x64x55x55xf32, #onnx.encoding<{dataLayout = "NCHW4C"}>>
%42 = "onnx.LayoutTransform"(%41) {target_layout = "STANDARD"} : (tensor<1x64x55x55xf32, #onnx.encoding<{dataLayout = "NCHW4C"}>>) -> tensor<1x64x55x55xf32>
%43 = "onnx.Conv"(%arg0, %arg1, %0) {auto_pad = "NOTSET", dilations = [1, 1], group = 1 : si64, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]} : (tensor<1x64x55x55xf32>, tensor<64x64x1x1xf32>, none) -> tensor<1x64x55x55xf32>
return %43 : tensor<1x64x55x55xf32>
}
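The root cause can be modeled abstractly: in MLIR, the encoding attribute participates in ranked-tensor type equality, so a tensor with a dataLayout encoding is a different type from the same-shaped tensor without one. A toy Python stand-in (not MLIR's actual classes):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class TensorType:
    # Simplified stand-in for a ranked tensor type: the encoding takes part
    # in equality, just as MLIR's encoding attribute does.
    shape: tuple
    dtype: str
    encoding: Optional[str] = None

orig = TensorType((1, 64, 55, 55), "f32", 'dataLayout = "NCHW4C"')
new = TensorType((1, 64, 55, 55), "f32")
print(orig == new)  # False: same shape and dtype, but the encodings differ
```

This is exactly the mismatch above: the original Conv result carries the NCHW4C encoding while the replacing Reshape result does not, so the conversion framework sees two different types.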
Just tested your fix, and it worked!
Jenkins Linux ppc64le Build #7337 [push] Introduction of ONNX lay... started at 20:08
Jenkins Linux s390x Build #8286 [push] Introduction of ONNX lay... started at 20:07
Jenkins Linux amd64 Build #8270 [push] Introduction of ONNX lay... started at 19:07
Jenkins Linux amd64 Build #8270 [push] Introduction of ONNX lay... passed after 1 hr 14 min
Jenkins Linux ppc64le Build #7337 [push] Introduction of ONNX lay... passed after 1 hr 40 min
Jenkins Linux s390x Build #8286 [push] Introduction of ONNX lay... passed after 1 hr 41 min
For ref, the bug was fixed using @tungld's suggestion, pasted below.
About the issue on Windows with your data layout PR, I am thinking there may be an issue in the return type, e.g. Line 204 in ConvOpt.cpp:
Value res = create.onnx.reshape(outputType, MM, outputShapeVals);
I am not sure whether changing to the following can fix the issue:
Value res = create.onnx.reshape(convOp.Y().getType(), MM, outputShapeVals);
It solved the problem.