[Relax][Bug] CodeGenVM cannot handle this intrinsic tensor_to_shape
Actual behavior
Traceback (most recent call last):
File "test.py", line 20, in <module>
ex = relax.build(mod, target='llvm')
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/software/tvm/python/tvm/relax/vm_build.py", line 340, in build
mod = _vmcodegen(builder, mod, exec_mode)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/software/tvm/python/tvm/relax/vm_build.py", line 176, in _vmcodegen
return _ffi_api.VMCodeGen(builder, mod) # type:ignore
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/software/tvm/python/tvm/_ffi/_ctypes/packed_func.py", line 239, in __call__
raise_last_ffi_error()
File "/software/tvm/python/tvm/_ffi/base.py", line 481, in raise_last_ffi_error
raise py_err
tvm._ffi.base.TVMError: Traceback (most recent call last):
10: _ZN3tvm7runtime13PackedFuncObj9ExtractorINS0_1
9: tvm::runtime::TypedPackedFunc<tvm::IRModule (tvm::relax::ExecBuilder, tvm::IRModule)>::AssignTypedLambda<tvm::IRModule (*)(tvm::relax::ExecBuilder, tvm::IRModule)>(tvm::IRModule (*)(tvm::relax::ExecBuilder, tvm::IRModule), std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}::operator()(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*) const
8: tvm::relax::relax_vm::VMCodeGen(tvm::relax::ExecBuilder, tvm::IRModule)
7: tvm::relax::relax_vm::CodeGenVM::Run(tvm::relax::ExecBuilder, tvm::IRModule)
6: tvm::relax::relax_vm::CodeGenVM::Codegen(tvm::relax::Function const&)
5: tvm::relax::ExprFunctor<tvm::runtime::relax_vm::Instruction::Arg (tvm::RelayExpr const&)>::VisitExpr(tvm::RelayExpr const&)
4: _ZZN3tvm5relax11ExprFunctorIFNS_7runtime8rel
3: tvm::relax::relax_vm::CodeGenVM::VisitExpr_(tvm::relax::SeqExprNode const*)
2: tvm::relax::ExprFunctor<tvm::runtime::relax_vm::Instruction::Arg (tvm::RelayExpr const&)>::VisitExpr(tvm::RelayExpr const&)
1: _ZZN3tvm5relax11ExprFunctorIFNS_7runtime8rel
0: tvm::relax::relax_vm::CodeGenVM::VisitExpr_(tvm::relax::CallNode const*)
File "/software/tvm/src/relax/backend/vm/codegen_vm.cc", line 177
TVMError: CodeGenVM cannot handle this intrinsic now:
Op(relax.tensor_to_shape)
Environment
TVM: 0.17.dev0, OS: Ubuntu 20.04
Steps to reproduce
import tvm
from tvm import relax
from tvm.script import ir as I
from tvm.script import relax as R
@I.ir_module
class Module:
    @R.function
    def main(x: R.Tensor((3,), dtype="int64")) -> R.Tensor((3,), dtype="int64"):
        lv: R.Shape([3]) = R.tensor_to_shape(x)
        lv: R.Shape([3]) = R.call_pure_packed("vm.builtin.tensor_to_shape", x, sinfo_args=(R.Shape([3]),))
        return lv
mod = Module
mod = tvm.relax.transform.LegalizeOps()(mod)
mod.show()
ex = relax.build(mod, target='llvm')
vm = relax.VirtualMachine(ex, tvm.cpu())
Triage
- needs-triage
cc @junrushao
After investigating, I found that if we apply relax.transform.FuseTIR() before relax.build, the compilation succeeds. However, I still haven't figured out why the compilation crashes without it, or why the script runs well once the FuseTIR transform is added.
A workaround to avoid the bug
import tvm
from tvm import relax
from tvm.script import ir as I
from tvm.script import relax as R
@I.ir_module
class Module:
    @R.function
    def main(x: R.Tensor((3,), dtype="int64")) -> R.Tensor((3,), dtype="int64"):
        lv: R.Shape([3]) = R.tensor_to_shape(x)
        lv: R.Shape([3]) = R.call_pure_packed("vm.builtin.tensor_to_shape", x, sinfo_args=(R.Shape([3]),))
        return lv
mod = Module
mod = tvm.relax.transform.LegalizeOps()(mod)
mod.show()
mod = relax.transform.FuseTIR()(mod) # newly added
ex = relax.build(mod, target='llvm')
vm = relax.VirtualMachine(ex, tvm.cpu())
I'm baffled. Do we need to explicitly call the `FuseTIR` transform before compiling any model?
@ysh329 @tqchen @Hzfengsy @junrushao
Another two similar bugs:
Traceback (most recent call last):
File "demo.py", line 140, in <module>
ex = relax.build(mod, target='llvm')
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/software/tvm/python/tvm/relax/vm_build.py", line 340, in build
mod = _vmcodegen(builder, mod, exec_mode)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/software/tvm/python/tvm/relax/vm_build.py", line 176, in _vmcodegen
return _ffi_api.VMCodeGen(builder, mod) # type:ignore
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/software/tvm/python/tvm/_ffi/_ctypes/packed_func.py", line 239, in __call__
raise_last_ffi_error()
File "/software/tvm/python/tvm/_ffi/base.py", line 481, in raise_last_ffi_error
raise py_err
tvm._ffi.base.TVMError: Traceback (most recent call last):
10: _ZN3tvm7runtime13PackedFuncObj9ExtractorINS0_1
9: tvm::runtime::TypedPackedFunc<tvm::IRModule (tvm::relax::ExecBuilder, tvm::IRModule)>::AssignTypedLambda<tvm::IRModule (*)(tvm::relax::ExecBuilder, tvm::IRModule)>(tvm::IRModule (*)(tvm::relax::ExecBuilder, tvm::IRModule), std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}::operator()(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*) const
8: tvm::relax::relax_vm::VMCodeGen(tvm::relax::ExecBuilder, tvm::IRModule)
7: tvm::relax::relax_vm::CodeGenVM::Run(tvm::relax::ExecBuilder, tvm::IRModule)
6: tvm::relax::relax_vm::CodeGenVM::Codegen(tvm::relax::Function const&)
5: tvm::relax::ExprFunctor<tvm::runtime::relax_vm::Instruction::Arg (tvm::RelayExpr const&)>::VisitExpr(tvm::RelayExpr const&)
4: _ZZN3tvm5relax11ExprFunctorIFNS_7runtime8rel
3: tvm::relax::relax_vm::CodeGenVM::VisitExpr_(tvm::relax::SeqExprNode const*)
2: tvm::relax::ExprFunctor<tvm::runtime::relax_vm::Instruction::Arg (tvm::RelayExpr const&)>::VisitExpr(tvm::RelayExpr const&)
1: _ZZN3tvm5relax11ExprFunctorIFNS_7runtime8rel
0: tvm::relax::relax_vm::CodeGenVM::VisitExpr_(tvm::relax::CallNode const*)
File "/software/tvm/src/relax/backend/vm/codegen_vm.cc", line 177
TVMError: CodeGenVM cannot handle this intrinsic now:
Op(relax.call_tir_with_grad)
...
TVMError: CodeGenVM cannot handle this intrinsic now:
Op(relax.ewise_fma)
There are a few operators that don't have FLegalize implementations and are expected to be lowered/pattern-matched out prior to building. Unfortunately, this results in very hard-to-interpret error messages when the lowering reaches the CodeGenVM step.
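As a rough diagnostic, you can check whether an op has a legalization registered by querying its op attribute directly. This is just a sketch, assuming legalization functions are registered under the "FLegalize" attribute; the op names are the ones from the error messages above:

import tvm

# Hedged diagnostic sketch: LegalizeOps looks up legalization functions
# registered as the "FLegalize" attribute of each Relax op, so an op
# without that attribute cannot be legalized and will reach CodeGenVM.
for name in ["relax.add", "relax.ewise_fma", "relax.call_tir_with_grad"]:
    op = tvm.ir.Op.get(name)
    print(name, "has FLegalize:", op.get_attr("FLegalize") is not None)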
For the initial error case, it should be fixed as a side effect of https://github.com/apache/tvm/pull/17218, as it adds a check for R.tensor_to_shape in VMBuiltinLower.
I'm baffled. Do we need to explicitly call the `FuseTIR` transform before compiling any model?
Looking at the specific example, it looks like there are two distinct lv variables in the input. One is produced by R.tensor_to_shape, while the other is produced by R.call_pure_packed("vm.builtin.tensor_to_shape", ...). When FuseTIR is called, it internally performs dead-code elimination to remove any values that are no longer required after fusion, along with any no-longer-used PrimFunc implementations. This has the side effect of removing the call to R.tensor_to_shape(x), as its output is entirely unused.
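This can be checked against the reproducer: after applying FuseTIR, the unused R.tensor_to_shape binding should no longer appear in the printed module, which is why CodeGenVM never sees the unhandled op. A minimal sketch, reusing Module from the reproducer above:

import tvm
from tvm import relax

mod = tvm.relax.transform.LegalizeOps()(Module)
# Hedged check of the explanation above: FuseTIR's internal dead-code
# elimination should drop the unused R.tensor_to_shape binding, so it
# should be absent from the module printed below.
mod = relax.transform.FuseTIR()(mod)
mod.show()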
@Lunderberg Thanks for your PR and explanation. I got it! BTW, the initial test case runs correctly with PR #17218!
Here are some more similar bugs I found:
TVMError: CodeGenVM cannot handle this intrinsic now: Op(relax.builtin.stop_lift_params)
TVMError: CodeGenVM cannot handle this intrinsic now: Op(relax.permute_dims)
TVMError: CodeGenVM cannot handle this intrinsic now: Op(relax.add)
TVMError: CodeGenVM cannot handle this intrinsic now: Op(relax.nn.conv2d)
TVMError: CodeGenVM cannot handle this intrinsic now: Op(relax.wrap_param)
TVMError: CodeGenVM cannot handle this intrinsic now: Op(relax.ewise_fma)
TVMError: CodeGenVM cannot handle this intrinsic now: Op(relax.call_tir_with_grad)
The LegalizeOps pass should be able to legalize ops like relax.add.
@yongwww Thanks for your reply. Indeed, in common cases LegalizeOps can handle relax.add. However, in some specialized cases LegalizeOps does not help. One such case is shown below. Can you help me review it? Thanks a lot!
import tvm
from tvm import relax
from tvm.script import ir as I
from tvm.script import relax as R
@I.ir_module
class Module:
    @R.function
    def main_7(t: R.Tuple(R.Tensor, R.Tensor)) -> R.Tensor:
        x: R.Tensor = t[0]
        y: R.Tensor = t[1]
        z: R.Tensor = R.add(x, y)
        w: R.Tensor = R.multiply(z, z)
        return w
mod = Module
mod = relax.transform.LegalizeOps()(mod)
ex = relax.build(mod, target='llvm')
@Cookiee235 In Relax, the R.Tensor annotation represents a tensor with unknown shape and unknown element type. However, TIR requires both the dimensionality and the element type of a T.Buffer to be known. When relax.transform.LegalizeOps encounters a Relax expression that cannot be expressed in TIR, it is left as-is, even if the operator could otherwise be legalized to TIR.
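For comparison, here is a sketch of the same function with fully annotated shapes and dtypes (the (3,) / "float32" annotations are purely illustrative assumptions). Once the dimensionality and element type are known, LegalizeOps should be able to lower R.add and R.multiply to TIR and the build should go through:

import tvm
from tvm import relax
from tvm.script import ir as I
from tvm.script import relax as R

# Hypothetical fully-annotated variant of main_7; the concrete shape and
# dtype are illustrative, not taken from the original report.
@I.ir_module
class AnnotatedModule:
    @R.function
    def main_7(
        t: R.Tuple(R.Tensor((3,), dtype="float32"), R.Tensor((3,), dtype="float32"))
    ) -> R.Tensor((3,), dtype="float32"):
        x: R.Tensor((3,), dtype="float32") = t[0]
        y: R.Tensor((3,), dtype="float32") = t[1]
        z: R.Tensor((3,), dtype="float32") = R.add(x, y)
        w: R.Tensor((3,), dtype="float32") = R.multiply(z, z)
        return w

# With known buffer shapes and element types, LegalizeOps can emit TIR
# PrimFuncs for the element-wise ops, so relax.build no longer hits the
# CodeGenVM error for relax.add.
mod = relax.transform.LegalizeOps()(AnnotatedModule)
ex = relax.build(mod, target="llvm")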
I've experimented a bit with raising an error from LegalizeOps when it encounters something that cannot be legalized, rather than ignoring it altogether. Unfortunately, there currently isn't any way to distinguish between operations that are handled later (e.g. R.call_tir and R.builtin.alloc_tensor) and operations that must be handled during legalization (e.g. R.add), nor to distinguish between applications of LegalizeOps that may leave operations as-is (custom preprocessing prior to relax.build) and applications of LegalizeOps that must legalize all operations (the use inside relax.build). Both of those extra pieces of information will need to be added in order to raise an error for non-legalizable expressions.