onnx-mlir
Core dumped when compiling GPT2
./onnx-mlir --EmitONNXBasic /home/justinchu/dev/onnx/gpt2-dataprop.onnx
/usr/include/c++/11/bits/stl_vector.h:1045: std::vector<_Tp, _Alloc>::reference std::vector<_Tp, _Alloc>::operator[](std::vector<_Tp, _Alloc>::size_type) [with _Tp = mlir::Type; _Alloc = std::allocator<mlir::Type>; std::vector<_Tp, _Alloc>::reference = mlir::Type&; std::vector<_Tp, _Alloc>::size_type = long unsigned int]: Assertion '__n < this->size()' failed.
I tested with other --Emit options and got the same error.
ONNX Model gpt2-dataprop.zip
Is the model downloaded from the ONNX Model Zoo?
It is a PyTorch-exported model I experimented with.
I also got a core dump on a Constant op: the output type is not correct. I will take a look.
I found the source of the error: it is in the TypeInferenceOpInterface implementation for ConstantOp.
I fixed the import issue with PR #2232, but I ran into another error related to a custom op:
loc("_aten_native_layer_norm_onnx_54"): error: 'onnx.Custom' op result #1 must be tensor of any type values or memref of any type values, but got 'none'
%84:3 = "onnx.Custom"(%83, %2, %3) {axes = [-1], domain_name = "onnxscript.atenlib", eps = 9.99999974E-6 : f32, function_name = "_aten_native_layer_norm_onnx", onnx_node_name = "_aten_native_layer_norm_onnx_54"} : (tensor<*xf32>, tensor<2xf32>, tensor<2xf32>) -> (tensor<*xf32>, none, none)
I will fix this bug.
I fixed the NoneType problem for CustomOp, but there is another shape-inference error caused by
%42 = onnx.Constant {onnx_node_name = "Constant_12", value_ints = [-1]} : tensor<1xi64>
I remember writing a normalization for the Constant op, but I cannot find the code anywhere.
ConstantOp may carry an attribute other than value or sparse_value, for instance value_int, value_ints, etc. I remember writing a bunch of transformation rules to normalize those attributes into the value attribute (DenseElementsAttr). This ONNX model has a Constant with value_ints, and the shape inference pass still ran into that attribute and hit an assertion error. @sorenlassen Since you worked a lot on constants, you may know where the constant normalization code is.
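For illustration, the normalization in question should rewrite the list-attribute form into the value attribute. A rough sketch of the before/after IR (the exact printed form may differ from what onnx-mlir emits):

```mlir
// Before normalization: the literal is stored in the value_ints attribute,
// which the shape inference pass does not expect.
%0 = onnx.Constant {value_ints = [1, 2, 3]} : tensor<3xi64>

// After normalization: the same literal as a value attribute
// (DenseElementsAttr), which shape inference handles.
%1 = onnx.Constant dense<[1, 2, 3]> : tensor<3xi64>
```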
In the lit test onnx_canonicalization.mlir we have a value_ints example: https://github.com/onnx/onnx-mlir/blob/main/test/mlir/onnx/onnx_canonicalization.mlir#LL627-L635. If I put that in a standalone test_constant_3.mlir file:
func.func @test_constant_3() -> tensor<3xi64> {
%0 = onnx.Constant {value_ints = [1, 2, 3] } : tensor<3xi64>
return %0 : tensor<3xi64>
}
then it succeeds if I run onnx-mlir-opt --canonicalize --shape-inference test_constant_3.mlir
but fails if I omit --canonicalize
maybe we should canonicalize before the first run of shape inference
I checked that the following change doesn't break any lit tests:
diff --git a/src/Compiler/CompilerPasses.cpp b/src/Compiler/CompilerPasses.cpp
index 5bcf5c70..7e0cd950 100644
--- a/src/Compiler/CompilerPasses.cpp
+++ b/src/Compiler/CompilerPasses.cpp
@@ -57,6 +57,7 @@ void addONNXToMLIRPasses(mlir::PassManager &pm, bool targetCPU) {
std::make_unique<DisposableGarbageCollector>(pm.getContext()));
pm.addNestedPass<func::FuncOp>(onnx_mlir::createDecomposeONNXToONNXPass());
+ pm.addPass(mlir::createCanonicalizerPass());
if (enableONNXHybridPass) {
// For starters only illustrating the new hybrid pass by replacing 3 passes
// here. The plan is to replace most of the passes in addONNXToMLIRPasses.
but I haven't tested it on the gpt2 model
I tried adding canonicalization before shape inference with the patch in the previous message, on top of the fixes in PR #2232, and now onnx-mlir gpt2-dataprop.onnx
fails with these messages:
loc("Slice_108"): error: Axes must be known at compile time
loc("Slice_108"): error: Failed to scan parameters successfully
loc("Slice_108"): error: shape inference failed
Thanks. It's a pass-ordering problem. Will we have the same problem in the new hybrid transformation? I will try to add a canonicalization pass before shape inference.
Will we have the problem in the new hybrid transformation?
good question
onnx-mlir --onnx-hybrid-pass gpt2-dataprop.onnx
doesn't crash, even without the extra canonicalizer pass, but prints
loc("Constant_12"): error: Require exactly one of the two attributes, either value or sparse_value
loc("Constant_12"): error: 'onnx.Constant' op shape inference failed
The hybrid pass infers shapes before canonicalization: https://github.com/onnx/onnx-mlir/blob/main/src/Transform/ONNX/ShapeInference.cpp#L74-L79, which might be the wrong thing to do for this example.
I tried adding canonicalizing before shape inference with the patch in the previous message, on top of the fixes in PR #2232 and now
onnx-mlir gpt2-dataprop.onnx
fails with these messages:
loc("Slice_108"): error: Axes must be known at compile time
loc("Slice_108"): error: Failed to scan parameters successfully
loc("Slice_108"): error: shape inference failed
I got this error when I also added the canonicalization pass (in PR #2232) as you did. We can add support for dynamic axes in Slice; the output shape will be Tensor<?x?x...?xT>.
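A rough sketch of what dynamic-axes support for Slice could look like after shape inference, assuming the rank is preserved but every dimension becomes dynamic because the sliced axes are only known at runtime (op and type spellings follow the examples above; details may differ):

```mlir
// Hypothetical example: %axes is a runtime value, not a compile-time
// constant, so shape inference cannot tell which dimensions shrink.
// A conservative result keeps the rank and marks every dimension dynamic.
func.func @slice_dynamic_axes(%data: tensor<2x4xf32>, %starts: tensor<1xi64>,
    %ends: tensor<1xi64>, %axes: tensor<1xi64>, %steps: tensor<1xi64>)
    -> tensor<?x?xf32> {
  %0 = "onnx.Slice"(%data, %starts, %ends, %axes, %steps)
      : (tensor<2x4xf32>, tensor<1xi64>, tensor<1xi64>, tensor<1xi64>,
         tensor<1xi64>) -> tensor<?x?xf32>
  return %0 : tensor<?x?xf32>
}
```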
@sorenlassen By the way, this model has lots of custom Ops. We can use it as a test case.
Adding a canonicalization pass caused the numerical test to fail. The error message is
error: type of return operand 0 ('tensor<?x?x1x5xf32>') doesn't match function result type ('tensor<*xf32>') in function @main_graph
Function type issue again? Since further investigation is needed, I just rolled back my change in that PR.
(base) ➜ bin git:(main) ✗ ./onnx-mlir --EmitONNXIR /home/justinchu/dev/onnx-mlir/test_fx_to_onnx_with_onnxruntime.TestFxToOnnxWithOnnxRuntime_op_level_debug_True_dynamic_shapes_True.test_gpt2_tiny_from_config.onnx
[1] 312747 segmentation fault (core dumped) ./onnx-mlir --EmitONNXIR
(base) ➜ bin git:(main) ✗ ./onnx-mlir /home/justinchu/dev/onnx-mlir/test_fx_to_onnx_with_onnxruntime.TestFxToOnnxWithOnnxRuntime_op_level_debug_True_dynamic_shapes_True.test_gpt2_tiny_from_config.onnx
[1] 312898 segmentation fault (core dumped) ./onnx-mlir
This is going to be the type of model created by the new PyTorch 2.1, so just a heads-up.