[Mhlo] More ops and make Mhlo optional

Connor-XY opened this pull request 3 years ago · 42 comments

This pull request:

  1. Adds an option to CMakeLists.txt so that MHLO can be built as an optional library.
  2. Adds the target as a parameter for the decompose pass.
  3. Adds MHLO lowerings for Pow, Sqrt, Mul, MatMul, Conv, ArgMax, Flatten, Shape, Transpose, Identity, Slice, Split, Squeeze, Unsqueeze, and Gather.

Connor-XY avatar Aug 15 '22 20:08 Connor-XY

Can one of the admins verify this patch?

jenkins-droid avatar Aug 15 '22 20:08 jenkins-droid

@jenkins-droid test this please

AlexandreEichenberger avatar Aug 15 '22 20:08 AlexandreEichenberger

Making MHLO optional is very exciting. Thank you!

sstamenova avatar Aug 15 '22 20:08 sstamenova

@jenkins-droid test this please

tungld avatar Aug 16 '22 07:08 tungld

@Connor-XY ^

sorenlassen avatar Aug 18 '22 16:08 sorenlassen

@jenkins-droid test this please

AlexandreEichenberger avatar Aug 22 '22 18:08 AlexandreEichenberger

@jenkins-droid test this please

AlexandreEichenberger avatar Aug 26 '22 15:08 AlexandreEichenberger

@AlexandreEichenberger do you think we need to add some milestone model tests here? Ex. resnet 18, bert-tiny.

yaochengji avatar Aug 30 '22 01:08 yaochengji

> @AlexandreEichenberger do you think we need to add some milestone model tests here? Ex. resnet 18, bert-tiny.

We currently test a few benchmarks as part of our backend tests (run on all of our mandatory CIs), from inference_backend.py:

        "test_densenet121_cpu": {STATIC_SHAPE:{}},
        "test_inception_v1_cpu": {STATIC_SHAPE:{}},
        "test_resnet50_cpu": {STATIC_SHAPE:{}, DYNAMIC_SHAPE:{0:{-1}}},
        "test_shufflenet_cpu": {STATIC_SHAPE:{}},
        "test_squeezenet_cpu": {STATIC_SHAPE:{}},
        "test_vgg19_cpu": {STATIC_SHAPE:{}},

I am not sure how to add tests there; we need to be a bit careful about not taking too long, as some of our CIs may time out. If you believe that an additional benchmark belongs here, I would suggest a distinct PR to add it.
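For illustration only, a new benchmark entry would follow the same pattern as the existing ones. The test name `test_resnet18_cpu` is hypothetical (ResNet-18 was only suggested above), and `STATIC_SHAPE`/`DYNAMIC_SHAPE` are stand-ins for the constants that inference_backend.py defines:

```python
# Stand-in constants; in inference_backend.py these are defined elsewhere.
STATIC_SHAPE = "static"
DYNAMIC_SHAPE = "dynamic"

# Hypothetical new entry following the existing pattern: run the model with
# static shapes, and again with a dynamic first (batch) dimension.
variables = {
    "test_resnet18_cpu": {STATIC_SHAPE: {}, DYNAMIC_SHAPE: {0: {-1}}},
}
```

The `{0: {-1}}` value mirrors the `test_resnet50_cpu` entry above, which marks dimension 0 as dynamic.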

AlexandreEichenberger avatar Aug 30 '22 13:08 AlexandreEichenberger

@Connor-XY How difficult would it be to split out the change that makes mhlo optional? It seems like there's no concern about that particular aspect and it would be great to have sooner rather than later.

sstamenova avatar Sep 06 '22 19:09 sstamenova

> @Connor-XY How difficult would it be to split out the change that makes mhlo optional? It seems like there's no concern about that particular aspect and it would be great to have sooner rather than later.

Hi @sstamenova,

@Connor-XY has caught a cold these days. I think making MHLO optional is not difficult.

yaochengji avatar Sep 07 '22 18:09 yaochengji

@AlexandreEichenberger I believe it is beneficial if we can reuse IndexExpr, but I currently have some concerns. First, it seems that in the shape inference pass, when a dimension is dynamic, all of the IndexExprs are QuestionmarkIndexExprs. Ideally, we would like the final result to be a Value, and to generate the code after calling the shape helper so we can reuse the result in the lowering. This differs from the current design, and I believe it would require substantial changes. Moreover, I could not figure out a good way to differentiate an MHLO tensor from a memref; I think it should be done with respect to the scope or the rewriter, but I could not think of a proper way to do that. We would like to ask for suggestions on how to modify IndexExpr so that we could reuse your calculation without affecting the proper usage of the structure.

Connor-XY avatar Sep 12 '22 03:09 Connor-XY

> all of the IndexExprs are QuestionmarkIndexExpr.

And that is fine. When half the dims are constants and the others are dynamic, the result will be a mixture of literals and question marks.

> final result could be a Value

When doing shape inference, the only interesting outcome (short of symbolic analysis) is either a constant literal or unknown. There is no need for Values at this stage. When lowering code, we want the literals and/or the computations, all of which can be Values. There are methods that turn index expressions into literals (individually or as a list).

> to differentiate Mhlo tensor and memref.

Are MHLO tensors different from MLIR tensors? I would look at how we use index expressions now. During shape inference, we certainly use the tensors. During lowering, we have access to both unconverted inputs (tensors) and converted inputs (memrefs). Not 100% sure which one we use.

I believe we want to avoid code redundancy between shape inference and lowering. How we get there is less important, though we have invested in IndexExpr and it would be nice to adapt it to everyone's needs. We can look together for solutions that work for you.
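As a rough illustration of the literal-vs-question-mark behavior described above, here is a plain-Python model (not the actual IndexExpr API): a dimension is either a literal integer or an unknown, and arithmetic propagates unknowns.

```python
# Minimal model of IndexExpr semantics during shape inference:
# a dim is either a literal int or a "question mark" (unknown).
QUESTION_MARK = None

def mul_dims(a, b):
    """Product of two dims; any question mark makes the result unknown."""
    if a is QUESTION_MARK or b is QUESTION_MARK:
        return QUESTION_MARK
    return a * b

# Half literals, half dynamic: intermediate results are a mixture,
# and any unknown factor makes the running product unknown.
dims = [2, QUESTION_MARK, 3, QUESTION_MARK]
product = dims[0]
for d in dims[1:]:
    product = mul_dims(product, d)
```

This is only a sketch of the semantics; the real IndexExpr additionally carries Values and affine expressions for code generation.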

AlexandreEichenberger avatar Sep 12 '22 16:09 AlexandreEichenberger

> when doing shape inference, the only interesting outcome (short of symbolic analysis) is either constant lit or unknown.

@Connor-XY, agreed with @AlexandreEichenberger: this is shape inference, not shape reification. We only need shape calculations for the needs of operations like mhlo.dynamic_reshape.

> are MHLO tensors different from the MLIR tensors?

@Connor-XY I don't think we need to take memref into consideration. All the inputs/outputs of the MHLO and ONNX dialects are of tensor type.

yaochengji avatar Sep 13 '22 00:09 yaochengji

Hi @AlexandreEichenberger, @Connor-XY's internship at ByteDance is now over, so I will take over this PR.

I have a question about extracting a Value from MemRefBoundsIndexCapture or IndexExpr. For example, when converting the ONNX Flatten op to MHLO, the code snippet handling the shape computation is currently as below:

// Compute the collapsed leading dimension of Flatten:
// the product of the input extents in [0, axis).
Value flattenDimFirst = rewriter.create<arith::ConstantIndexOp>(loc, 1);
Value inputShape = rewriter.create<shape::ShapeOfOp>(loc, input);
for (int64_t i = 0; i < axis; i++) {
  Value dim = rewriter.create<shape::GetExtentOp>(loc, inputShape, i);
  flattenDimFirst =
      rewriter.create<shape::MulOp>(loc, flattenDimFirst, dim);
}

After changing to IndexExpr utility, I guess it should be as below:

MemRefBoundsIndexCapture XBounds(op.X());
IndexExpr mulDim = XBounds.get(0);
for (int64_t i = 1; i < axis; i++) {
  mulDim = mulDim * XBounds.get(i);
}

Besides, it seems I would have to add a correct IndexExprScope and modify the MemRefBoundsIndexCapture::get implementation to make it work correctly. Therefore, I think IndexExpr does not simplify the code very much in this Value-extraction scenario. Could we keep the code as it is?
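Both snippets compute the same quantity: the product of the input extents before `axis`. As a sanity check of the intended arithmetic (independent of either API, with the helper name `flatten_first_dim` made up for illustration):

```python
from math import prod

def flatten_first_dim(shape, axis):
    """Collapsed leading dimension of Flatten: product of dims [0, axis).

    An empty prefix (axis == 0) yields the neutral element 1, matching the
    arith::ConstantIndexOp(1) seed in the C++ snippet above.
    """
    return prod(shape[:axis], start=1)
```

For a static shape like (2, 3, 4) with axis 2, this gives 2 * 3 = 6; with dynamic dims, the MHLO path has to emit the equivalent shape-dialect computation instead.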

yaochengji avatar Sep 27 '22 19:09 yaochengji

Hi @AlexandreEichenberger, I rebased this branch and fixed some errors afterwards. Could you take a look again?

yaochengji avatar Sep 29 '22 01:09 yaochengji

@jenkins-droid test this please

philass avatar Sep 29 '22 21:09 philass