[Bug][VM] Memory corruption over RPC when simply returning a function's argument
This test case works locally but results in weird behavior over RPC:
from __future__ import annotations
import numpy as np
import tvm
from tvm.contrib import utils
import tvm.script
from tvm import relax, rpc
from tvm.script import relax as R

@tvm.script.ir_module
class ReturnSelf:
    @R.function
    def foo(x: Tensor((1,), "int32")) -> Tensor((1,), "int32"):
        return x

def test_over_rpc():
    target = tvm.target.Target("llvm", host="llvm")
    exec = relax.vm.build(ReturnSelf, target)
    temp = utils.tempdir()
    path = temp.relpath("vm_library.so")
    exec.mod.export_library(path)

    # Adapted from relay/test_vm.py
    def check_remote(server):
        remote = rpc.connect(server.host, server.port, session_timeout=10)
        # Upload the serialized Executable.
        remote.upload(path)
        # Get a handle to remote Executable.
        rexec = remote.load_module("vm_library.so")
        device = remote.cpu()
        vm = relax.vm.VirtualMachine(exec=rexec, device=device)

        # Stateful API: This version has integer corruption
        vm.set_input("foo", tvm.nd.array([1], device=device))
        ret = vm["foo"]()

        # Stateless API: This version crashes
        # func = vm["foo"]
        # ret = func(tvm.nd.array([1], device=device))

        assert int(ret.numpy()[0]) == 1, ret

    check_remote(rpc.Server("127.0.0.1"))

test_over_rpc()
The return value is corrupted (a random integer) when I use the stateful API, and I get a crash when I use the stateless API:
  File "/home/slyubomirsky/code/sandbox/return_self_rpc.py", line 48, in <module>
    test_over_rpc()
  File "/home/slyubomirsky/code/sandbox/return_self_rpc.py", line 46, in test_over_rpc
    check_remote(rpc.Server("127.0.0.1"))
  File "/home/slyubomirsky/code/sandbox/return_self_rpc.py", line 43, in check_remote
    ret = func(tvm.nd.array([1], device=device))
[...]
  0: tvm::runtime::RPCWrappedFunc::WrapRemoteReturnToValue(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
        at /home/slyubomirsky/code/relax/src/runtime/rpc/rpc_module.cc:289
File "/home/slyubomirsky/code/relax/src/runtime/rpc/rpc_module.cc", line 289
TVMError:
---------------------------------------------------------------
An error occurred during the execution of TVM.
For more information, please see: https://tvm.apache.org/docs/errors.html
---------------------------------------------------------------
Check failed: args.size() == 3 (2 vs. 3) :
What could possibly be the cause? It's a very simple test case but it's leading to very strange errors.
For the stateful API, if instead of calling vm.set_input("foo", tvm.nd.array([1], device=device)) with the array created inline, you first bind it to a variable:
a = tvm.nd.array([1], device=device)
and then call vm.set_input("foo", a), it works. Not sure why, though. :)
Wow!
edit: Maybe it's somehow related to object lifetimes in Python? Maybe having the NDArray inline can result in its being freed somehow? Certainly seems like a bug.
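To make the lifetime hypothesis concrete, here is a minimal pure-Python sketch. It assumes (this is not confirmed anywhere in the thread) that the stateful path keeps only a non-owning reference to the argument passed to `set_input`. `FakeArray` and `BorrowingVM` are hypothetical stand-ins, not TVM classes, and a weak reference is used to model the non-owning handle. An inline temporary loses its last strong reference as soon as `set_input` returns, while a named variable keeps the object alive, matching the observed workaround.

```python
import gc
import weakref


class FakeArray:
    """Hypothetical stand-in for an NDArray; tvm.nd.NDArray is not used here."""

    def __init__(self, value):
        self.value = value


class BorrowingVM:
    """Hypothetical VM whose set_input keeps only a weak (non-owning) reference.

    This models the suspected bug; it is NOT TVM's actual implementation.
    """

    def __init__(self):
        self._inputs = {}

    def set_input(self, name, arr):
        # Store a weak reference: the VM does not keep the argument alive.
        self._inputs[name] = weakref.ref(arr)

    def invoke(self, name):
        arr = self._inputs[name]()
        # If the caller dropped its last strong reference, the weakref is dead.
        return None if arr is None else arr.value


vm = BorrowingVM()

# Inline temporary: its only strong reference disappears when set_input returns.
vm.set_input("foo", FakeArray(1))
gc.collect()
inline_result = vm.invoke("foo")

# Bound variable: `a` keeps the object alive until invoke runs.
a = FakeArray(1)
vm.set_input("foo", a)
bound_result = vm.invoke("foo")

print(inline_result, bound_result)
```

On CPython this prints `None 1`. In a C++ runtime, a dangling non-owning pointer would not read back as `None`; the freed memory could be reused, which would look like the "random integer" reported above. The actual root cause and the change in #207 are not shown here.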
Workaround in #207.