
[Relax][Bug] CodeGenVM cannot handle this intrinsic tensor_to_shape

Open Cookiee235 opened this issue 1 year ago • 9 comments

Actual behavior

Traceback (most recent call last):
  File "test.py", line 20, in <module>
    ex = relax.build(mod, target='llvm')
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/software/tvm/python/tvm/relax/vm_build.py", line 340, in build
    mod = _vmcodegen(builder, mod, exec_mode)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/software/tvm/python/tvm/relax/vm_build.py", line 176, in _vmcodegen
    return _ffi_api.VMCodeGen(builder, mod)  # type:ignore
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/software/tvm/python/tvm/_ffi/_ctypes/packed_func.py", line 239, in __call__
    raise_last_ffi_error()
  File "/software/tvm/python/tvm/_ffi/base.py", line 481, in raise_last_ffi_error
    raise py_err
tvm._ffi.base.TVMError: Traceback (most recent call last):
  10: _ZN3tvm7runtime13PackedFuncObj9ExtractorINS0_1
  9: tvm::runtime::TypedPackedFunc<tvm::IRModule (tvm::relax::ExecBuilder, tvm::IRModule)>::AssignTypedLambda<tvm::IRModule (*)(tvm::relax::ExecBuilder, tvm::IRModule)>(tvm::IRModule (*)(tvm::relax::ExecBuilder, tvm::IRModule), std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}::operator()(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*) const
  8: tvm::relax::relax_vm::VMCodeGen(tvm::relax::ExecBuilder, tvm::IRModule)
  7: tvm::relax::relax_vm::CodeGenVM::Run(tvm::relax::ExecBuilder, tvm::IRModule)
  6: tvm::relax::relax_vm::CodeGenVM::Codegen(tvm::relax::Function const&)
  5: tvm::relax::ExprFunctor<tvm::runtime::relax_vm::Instruction::Arg (tvm::RelayExpr const&)>::VisitExpr(tvm::RelayExpr const&)
  4: _ZZN3tvm5relax11ExprFunctorIFNS_7runtime8rel
  3: tvm::relax::relax_vm::CodeGenVM::VisitExpr_(tvm::relax::SeqExprNode const*)
  2: tvm::relax::ExprFunctor<tvm::runtime::relax_vm::Instruction::Arg (tvm::RelayExpr const&)>::VisitExpr(tvm::RelayExpr const&)
  1: _ZZN3tvm5relax11ExprFunctorIFNS_7runtime8rel
  0: tvm::relax::relax_vm::CodeGenVM::VisitExpr_(tvm::relax::CallNode const*)
  File "/software/tvm/src/relax/backend/vm/codegen_vm.cc", line 177
TVMError: CodeGenVM cannot handle this intrinsic now:
Op(relax.tensor_to_shape)

Environment

TVM: 0.17.dev0
OS: Ubuntu 20.04

Steps to reproduce

import tvm
from tvm import relax
from tvm.script import ir as I
from tvm.script import relax as R

@I.ir_module
class Module:
    @R.function
    def main(x: R.Tensor((3,), dtype="int64")) -> R.Tensor((3,), dtype="int64"):
        lv: R.Shape([3]) = R.tensor_to_shape(x)
        lv: R.Shape([3]) = R.call_pure_packed("vm.builtin.tensor_to_shape", x, sinfo_args=(R.Shape([3]),))
        return lv

mod = Module
mod = tvm.relax.transform.LegalizeOps()(mod)
mod.show()

ex = relax.build(mod, target='llvm')
vm = relax.VirtualMachine(ex, tvm.cpu())

Triage

  • needs-triage

cc @junrushao

Cookiee235 avatar Jul 18 '24 15:07 Cookiee235

After my investigation, I found that if we apply relax.transform.FuseTIR() before relax.build, the compilation runs fine. However, I still haven't figured out why the compilation crashes without it, or why the script runs well once the FuseTIR transform is added.

A workaround to avoid the bug

import tvm
from tvm import relax
from tvm.script import ir as I
from tvm.script import relax as R

@I.ir_module
class Module:
    @R.function
    def main(x: R.Tensor((3,), dtype="int64")) -> R.Tensor((3,), dtype="int64"):
        lv: R.Shape([3]) = R.tensor_to_shape(x)
        lv: R.Shape([3]) = R.call_pure_packed("vm.builtin.tensor_to_shape", x, sinfo_args=(R.Shape([3]),))
        return lv

mod = Module
mod = tvm.relax.transform.LegalizeOps()(mod)
mod.show()
mod = relax.transform.FuseTIR()(mod)    # newly added
ex = relax.build(mod, target='llvm')
vm = relax.VirtualMachine(ex, tvm.cpu())

I'm baffled. Do we need to explicitly call the `FuseTIR` transform before compiling any model?

@ysh329 @tqchen @Hzfengsy @junrushao

Cookiee235 avatar Jul 18 '24 15:07 Cookiee235

Two more similar bugs:

Traceback (most recent call last):
  File "demo.py", line 140, in <module>
    ex = relax.build(mod, target='llvm')
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/software/tvm/python/tvm/relax/vm_build.py", line 340, in build
    mod = _vmcodegen(builder, mod, exec_mode)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/software/tvm/python/tvm/relax/vm_build.py", line 176, in _vmcodegen
    return _ffi_api.VMCodeGen(builder, mod)  # type:ignore
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/software/tvm/python/tvm/_ffi/_ctypes/packed_func.py", line 239, in __call__
    raise_last_ffi_error()
  File "/software/tvm/python/tvm/_ffi/base.py", line 481, in raise_last_ffi_error
    raise py_err
tvm._ffi.base.TVMError: Traceback (most recent call last):
  10: _ZN3tvm7runtime13PackedFuncObj9ExtractorINS0_1
  9: tvm::runtime::TypedPackedFunc<tvm::IRModule (tvm::relax::ExecBuilder, tvm::IRModule)>::AssignTypedLambda<tvm::IRModule (*)(tvm::relax::ExecBuilder, tvm::IRModule)>(tvm::IRModule (*)(tvm::relax::ExecBuilder, tvm::IRModule), std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}::operator()(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*) const
  8: tvm::relax::relax_vm::VMCodeGen(tvm::relax::ExecBuilder, tvm::IRModule)
  7: tvm::relax::relax_vm::CodeGenVM::Run(tvm::relax::ExecBuilder, tvm::IRModule)
  6: tvm::relax::relax_vm::CodeGenVM::Codegen(tvm::relax::Function const&)
  5: tvm::relax::ExprFunctor<tvm::runtime::relax_vm::Instruction::Arg (tvm::RelayExpr const&)>::VisitExpr(tvm::RelayExpr const&)
  4: _ZZN3tvm5relax11ExprFunctorIFNS_7runtime8rel
  3: tvm::relax::relax_vm::CodeGenVM::VisitExpr_(tvm::relax::SeqExprNode const*)
  2: tvm::relax::ExprFunctor<tvm::runtime::relax_vm::Instruction::Arg (tvm::RelayExpr const&)>::VisitExpr(tvm::RelayExpr const&)
  1: _ZZN3tvm5relax11ExprFunctorIFNS_7runtime8rel
  0: tvm::relax::relax_vm::CodeGenVM::VisitExpr_(tvm::relax::CallNode const*)
  File "/software/tvm/src/relax/backend/vm/codegen_vm.cc", line 177
TVMError: CodeGenVM cannot handle this intrinsic now:
Op(relax.call_tir_with_grad)

...
TVMError: CodeGenVM cannot handle this intrinsic now:
Op(relax.ewise_fma)

Cookiee235 avatar Jul 26 '24 15:07 Cookiee235

There are a few operators that don't have FLegalize implementations and are expected to be lowered/pattern-matched out prior to building. Unfortunately, this results in very hard-to-interpret error messages when the lowering reaches the CodeGenVM step.

For the initial error case, it should be fixed as a side effect of https://github.com/apache/tvm/pull/17218, as it adds a check for R.tensor_to_shape in VMBuiltinLower.
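
For the other variants listed above, one way to see whether an operator is expected to be legalized at all is to check whether it has a legalization function registered. This is a minimal sketch, assuming the Relax legalization functions are registered under the "FLegalize" op attribute:

import tvm

# Sketch: check which of the failing ops have a registered legalization.
# Ops without "FLegalize" are expected to be lowered or pattern-matched
# out by earlier passes, before relax.build reaches CodeGenVM.
for name in ["relax.add", "relax.tensor_to_shape", "relax.ewise_fma", "relax.call_tir_with_grad"]:
    op = tvm.ir.Op.get(name)
    has_legalize = op.get_attr("FLegalize") is not None
    print(f"{name}: FLegalize registered = {has_legalize}")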

Lunderberg avatar Jul 30 '24 20:07 Lunderberg

I'm baffled. Do we need to explicitly call the `FuseTIR` transform before compiling any model?

Looking at the specific example, it looks like there are two distinct lv variables in the input. One is produced by R.tensor_to_shape, while the other is produced by R.call_pure_packed("vm.builtin.tensor_to_shape", ...). When FuseTIR is called, it internally performs dead-code elimination to remove any values that are no longer required after fusion, along with any no-longer-used PrimFunc implementations. This has the side effect of removing the call to R.tensor_to_shape(x), as its output is entirely unused.
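
If the goal is only to drop that unused binding, running dead-code elimination directly should have the same effect as the FuseTIR workaround above. A minimal sketch, assuming relax.transform.DeadCodeElimination can be called without arguments in this TVM version:

import tvm
from tvm import relax

# Sketch: remove the unused R.tensor_to_shape binding with DCE directly,
# instead of relying on the dead-code elimination that FuseTIR performs
# internally as part of fusion.
mod = Module                                      # the module from the report above
mod = relax.transform.LegalizeOps()(mod)
mod = relax.transform.DeadCodeElimination()(mod)  # drops the dead tensor_to_shape call
ex = relax.build(mod, target="llvm")
vm = relax.VirtualMachine(ex, tvm.cpu())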

Lunderberg avatar Jul 30 '24 20:07 Lunderberg

@Lunderberg Thanks for your PR and explanation. I got it! BTW, the initial test case runs correctly with PR #17218 applied!

Cookiee235 avatar Jul 31 '24 12:07 Cookiee235

Here are some more similar bugs I found:

  • TVMError: CodeGenVM cannot handle this intrinsic now: Op(relax.builtin.stop_lift_params)
  • TVMError: CodeGenVM cannot handle this intrinsic now: Op(relax.permute_dims)
  • TVMError: CodeGenVM cannot handle this intrinsic now: Op(relax.add)
  • TVMError: CodeGenVM cannot handle this intrinsic now: Op(relax.nn.conv2d)
  • TVMError: CodeGenVM cannot handle this intrinsic now: Op(relax.wrap_param)
  • TVMError: CodeGenVM cannot handle this intrinsic now: Op(relax.ewise_fma)
  • TVMError: CodeGenVM cannot handle this intrinsic now: Op(relax.call_tir_with_grad)

Cookiee235 avatar Sep 09 '24 14:09 Cookiee235

Here are some more similar bugs I found:

  • TVMError: CodeGenVM cannot handle this intrinsic now: Op(relax.builtin.stop_lift_params)
  • TVMError: CodeGenVM cannot handle this intrinsic now: Op(relax.permute_dims)
  • TVMError: CodeGenVM cannot handle this intrinsic now: Op(relax.add)
  • TVMError: CodeGenVM cannot handle this intrinsic now: Op(relax.nn.conv2d)
  • TVMError: CodeGenVM cannot handle this intrinsic now: Op(relax.wrap_param)
  • TVMError: CodeGenVM cannot handle this intrinsic now: Op(relax.ewise_fma)
  • TVMError: CodeGenVM cannot handle this intrinsic now: Op(relax.call_tir_with_grad)

The LegalizeOps pass should be able to legalize ops like relax.add.

yongwww avatar Sep 09 '24 17:09 yongwww

@yongwww Thanks for your reply. Indeed, in common cases LegalizeOps can handle relax.add. However, in some specialized cases LegalizeOps does not help. One such case is shown below. Can you help me review it? Thanks a lot!

import tvm
from tvm import relax

from tvm.script import ir as I
from tvm.script import relax as R

@I.ir_module
class Module:
    @R.function
    def main_7(t: R.Tuple(R.Tensor, R.Tensor)) -> R.Tensor:
        x: R.Tensor = t[0]
        y: R.Tensor = t[1]
        z: R.Tensor = R.add(x, y)
        w: R.Tensor = R.multiply(z, z)
        return w

mod = Module
mod = relax.transform.LegalizeOps()(mod)
ex = relax.build(mod, target='llvm')

Cookiee235 avatar Sep 10 '24 01:09 Cookiee235

@Cookiee235 In Relax, the R.Tensor annotation represents a tensor with unknown shape and unknown element type. However, TIR requires both the dimensionality and the element type of a T.Buffer to be known. When relax.transform.LegalizeOps encounters a Relax expression that cannot be expressed in TIR, it is left as-is, even if the operator could otherwise be legalized to TIR.
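
For comparison, the same computation with fully annotated tensors should legalize and build. This is a sketch rather than code from the original report; the static (3,) shape and float32 dtype are illustrative assumptions:

import tvm
from tvm import relax
from tvm.script import ir as I
from tvm.script import relax as R

# Sketch: the same add/multiply chain, but with known shapes and dtypes,
# so LegalizeOps can lower R.add and R.multiply to TIR PrimFuncs.
@I.ir_module
class AnnotatedModule:
    @R.function
    def main(
        t: R.Tuple(R.Tensor((3,), dtype="float32"), R.Tensor((3,), dtype="float32"))
    ) -> R.Tensor((3,), dtype="float32"):
        x: R.Tensor((3,), dtype="float32") = t[0]
        y: R.Tensor((3,), dtype="float32") = t[1]
        z: R.Tensor((3,), dtype="float32") = R.add(x, y)
        w: R.Tensor((3,), dtype="float32") = R.multiply(z, z)
        return w

mod = relax.transform.LegalizeOps()(AnnotatedModule)
ex = relax.build(mod, target="llvm")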

I've experimented a bit with raising an error from LegalizeOps when it encounters something that cannot be legalized, rather than ignoring it altogether. Unfortunately, there currently isn't any way to distinguish between operations that are handled later (e.g. R.call_tir and R.builtin.alloc_tensor) and operations that must be handled during legalization (e.g. R.add), nor to distinguish between applications of LegalizeOps that may leave operations as-is (custom preprocessing prior to relax.build) and applications of LegalizeOps that must legalize all operations (the use inside relax.build). Both of those extra pieces of information will need to be added in order to raise an error for non-legalizable expressions.

Lunderberg avatar Sep 10 '24 13:09 Lunderberg