[Bug][VM] Memory corruption over RPC when simply returning a function's argument

Open slyubomirsky opened this issue 2 years ago • 2 comments

This test case works locally but results in weird behavior over RPC:

from __future__ import annotations
import numpy as np
import tvm
from tvm.contrib import utils
import tvm.script
from tvm import relax, rpc
from tvm.script import relax as R

@tvm.script.ir_module
class ReturnSelf:
    @R.function
    def foo(x: Tensor((1,), "int32")) -> Tensor((1,), "int32"):
        return x

def test_over_rpc():
    target = tvm.target.Target("llvm", host="llvm")
    exec = relax.vm.build(ReturnSelf, target)
    temp = utils.tempdir()
    path = temp.relpath("vm_library.so")
    exec.mod.export_library(path)
                                                                                      
    # Adapted from relay/test_vm.py
    def check_remote(server):
        remote = rpc.connect(server.host, server.port, session_timeout=10)
        # Upload the serialized Executable.
        remote.upload(path)
        # Get a handle to remote Executable.
        rexec = remote.load_module("vm_library.so")

        device = remote.cpu()
        vm = relax.vm.VirtualMachine(exec=rexec, device=device)

        # Stateful API: This version has integer corruption
        vm.set_input("foo", tvm.nd.array([1], device=device))
        ret = vm["foo"]()
        # Stateless API: This version crashes
        # func = vm["foo"]
        # ret = func(tvm.nd.array([1], device=device))
        assert int(ret.numpy()[0]) == 1, ret

    check_remote(rpc.Server("127.0.0.1"))

test_over_rpc()

With the stateful API, the return value is corrupted (a random integer). With the stateless API, I get a crash:

  File "/home/slyubomirsky/code/sandbox/return_self_rpc.py", line 48, in <module>
    test_over_rpc()
  File "/home/slyubomirsky/code/sandbox/return_self_rpc.py", line 46, in test_over_rpc
    check_remote(rpc.Server("127.0.0.1"))
  File "/home/slyubomirsky/code/sandbox/return_self_rpc.py", line 43, in check_remote
    ret = func(tvm.nd.array([1], device=device))
  [...]
  0: tvm::runtime::RPCWrappedFunc::WrapRemoteReturnToValue(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
        at /home/slyubomirsky/code/relax/src/runtime/rpc/rpc_module.cc:289
  File "/home/slyubomirsky/code/relax/src/runtime/rpc/rpc_module.cc", line 289
TVMError: 
---------------------------------------------------------------
An error occurred during the execution of TVM.
For more information, please see: https://tvm.apache.org/docs/errors.html
---------------------------------------------------------------

  Check failed: args.size() == 3 (2 vs. 3) : 

What could possibly be the cause? It's a very simple test case but it's leading to very strange errors.

slyubomirsky avatar Aug 05 '22 05:08 slyubomirsky

For the stateful API, it works if you first bind the array to a variable, a = tvm.nd.array([1], device=device), and then call vm.set_input("foo", a), instead of passing tvm.nd.array([1], device=device) inline to vm.set_input. Not sure why, though. :)

YuchenJin avatar Aug 05 '22 05:08 YuchenJin

Wow!

edit: Maybe it's somehow related to object lifetimes in Python? Maybe having the NDArray inline can result in its being freed somehow? Certainly seems like a bug.
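
To make the lifetime hypothesis concrete, here is a minimal sketch in plain Python, without TVM. FakeArray and FakeVM are hypothetical stand-ins invented for illustration, not TVM classes, and the weak reference only mimics a callee that records a handle without taking ownership. Whether the RPC path actually behaves this way is an assumption, not something confirmed in this thread.

```python
import gc
import weakref


class FakeArray:
    """Hypothetical stand-in for an NDArray; not a TVM class."""

    def __init__(self, values):
        self.values = list(values)


class FakeVM:
    """Hypothetical stand-in for a VM that stores only a non-owning handle."""

    def __init__(self):
        self._input = None

    def set_input(self, arr):
        # Keep a weak reference, mimicking a callee that records a raw
        # handle without taking ownership of the argument.
        self._input = weakref.ref(arr)

    def input_alive(self):
        # True only if the referenced array has not been freed yet.
        return self._input is not None and self._input() is not None


# Inline temporary: nothing on the Python side holds the array after the call.
vm_inline = FakeVM()
vm_inline.set_input(FakeArray([1]))
gc.collect()  # not required on CPython (refcounting); makes the intent explicit
inline_alive = vm_inline.input_alive()

# Named variable: `a` keeps the array alive for as long as it is in scope.
vm_named = FakeVM()
a = FakeArray([1])
vm_named.set_input(a)
gc.collect()
named_alive = vm_named.input_alive()

print(inline_alive, named_alive)
```

On CPython the inline array is gone by the time it is checked, while the named one survives. That matches the observation that binding the array to a variable first avoids the corruption, but it does not prove this is the mechanism in the RPC case.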

slyubomirsky avatar Aug 05 '22 18:08 slyubomirsky

Workaround in #207

tqchen avatar Jan 06 '23 17:01 tqchen