[Query] The refback-munge-calling-conventions pass used in the backend from linalg to LLVM causes a runtime error
I was running the Conv1d_transpose op test case, lowering it from the linalg-on-tensors dialect to LLVM IR. I noticed that this test is marked as a crashing test case in the xfail sets: https://github.com/llvm/torch-mlir/blob/3180704b1470c047faca5fb64d285cda2a287818/projects/pt1/e2e_testing/xfail_sets.py#L50-L54
I tried to debug this issue. While investigating, I noticed the following runtime error from LLVM:
"ERROR: Runtime op verification failed\0A%9 = \22memref.cast\22(%arg0) : (memref<*xf32>) -> memref<?x?x?xf32>\0A^ rank mismatch\0ALocation: loc(unknown)"
You can see that memref.cast is trying to cast an unranked memref to a ranked memref, which is not allowed by the generate-runtime-verification pass, called here:
https://github.com/llvm/torch-mlir/blob/3180704b1470c047faca5fb64d285cda2a287818/projects/pt1/python/torch_mlir_e2e_test/linalg_on_tensors_backends/refbackend.py#L184-L187
The above cast operation comes after refback-munge-calling-conventions, which gets rid of the ranked memrefs. My question is: why is this pass necessary? It will clearly fail runtime verification, since there is no way one could know at runtime what input we are getting and what its rank is. Am I missing something here? Please let me know.
Hi @Abhishek-TyRnT, the support for this was added here https://github.com/llvm/torch-mlir/pull/3615 by @mgehre-amd. Please refer to this PR and ask the author if you have any issues.
Is the question about refback-munge-calling-conventions or generate-runtime-verification?
@mgehre-amd, the question was originally about refback-munge-calling-conventions, but since generate-runtime-verification was added later, I am curious why that pass was added, given that refback-munge-calling-conventions eliminates the rank information for the above operation, which the generate-runtime-verification pass doesn't allow.
generate-runtime-verification does allow memref.cast from unranked to ranked tensors, but it checks that the rank of the destination matches the dynamic rank of the input tensor (at runtime).
Otherwise the memref.cast op is invalid (see https://mlir.llvm.org/docs/Dialects/MemRef/#memrefcast-memrefcastop), which seems to be happening for this test case.
generate-runtime-verification doesn't complain about other test cases, and those also run refback-munge-calling-conventions.
I think the Conv_Transpose1d lowering needs to be fixed, but I have not looked into the details of it.
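To make the check described above concrete, here is a minimal Python sketch of what a verified unranked-to-ranked cast does conceptually. It is only a model of the behaviour, not the code the pass generates: the function name `verified_cast`, the use of a shape tuple to stand in for an unranked memref, and the rank-4 input are all assumptions made for illustration.

```python
from typing import Tuple


def verified_cast(runtime_shape: Tuple[int, ...], dest_rank: int) -> Tuple[int, ...]:
    """Model of a runtime-verified cast from an unranked to a ranked memref.

    runtime_shape stands in for the unranked memref: its rank is only known
    at runtime. dest_rank is the static rank of the destination type, e.g. 3
    for memref<?x?x?xf32>. Both names are hypothetical.
    """
    dynamic_rank = len(runtime_shape)  # rank carried by the value at runtime
    if dynamic_rank != dest_rank:
        # Mirrors the "rank mismatch" message seen in the reported error.
        raise RuntimeError(
            f"rank mismatch: source has rank {dynamic_rank}, "
            f"destination expects rank {dest_rank}"
        )
    # Ranks agree, so the cast is valid and the data is passed through.
    return runtime_shape


# A rank-3 input cast to a rank-3 destination passes the check.
verified_cast((1, 4, 8), dest_rank=3)

# A hypothetical rank-4 input cast to the same destination is rejected.
try:
    verified_cast((1, 4, 1, 8), dest_rank=3)
except RuntimeError as err:
    print(err)
```

The point of the sketch is that the rank is still available at runtime even after the signature has been rewritten to use unranked memrefs, so the cast only fails when the value that actually arrives has a different rank than the one the lowered code expects.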
If the issue is with Conv_Transpose1d lowering then it would be better to file an issue for the same. @Abhishek-TyRnT, if you need that support then maybe file an issue.
I was hoping to fix it myself and send it as a contribution.