[Bug] Runtime Cross Compile Broken After The FFI Refactor
At Arm China, we are migrating our older work to the latest code. The recent FFI refactor introduced many changes to the very low-level infrastructure; now only the Cython path is supported and there is no ctypes fallback. When cross-compiling the AArch64 TVM runtime for the RPC server on an x86 machine, I found:
- Setting USE_LIBBACKTRACE to OFF in config.cmake doesn't work
- core.cpython-xxx.so isn't built automatically
- Executing the "python3 setup.py build_ext" command manually creates an x86 core.cpython-xxx.so
Is there currently any way to cross-compile core.cpython-xxx.so?
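One quick way to confirm which architecture a built extension targets is to read the e_machine field of its ELF header. The sketch below is illustrative only and not part of TVM; the function names are hypothetical, and only a few common machine codes are listed.

```python
import struct

# A few e_machine values from the ELF specification.
ELF_MACHINES = {3: "x86", 40: "arm", 62: "x86_64", 183: "aarch64"}


def elf_machine(header: bytes) -> str:
    """Return the architecture name encoded in the first bytes of an ELF file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # Byte 5 (EI_DATA): 1 means little-endian, 2 means big-endian.
    byte_order = "<" if header[5] == 1 else ">"
    # e_machine is a 16-bit field at offset 18 in both ELF32 and ELF64.
    (machine,) = struct.unpack_from(byte_order + "H", header, 18)
    return ELF_MACHINES.get(machine, f"unknown ({machine})")


def elf_machine_of_file(path: str) -> str:
    """Read just enough of the file at `path` to identify its architecture."""
    with open(path, "rb") as f:
        return elf_machine(f.read(20))
```

Calling elf_machine_of_file on the manually built core.cpython-xxx.so should report an x86 variant for the host-built file described above, and "aarch64" for a correctly cross-compiled one.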
The new flag is TVM_FFI_USE_LIBBACKTRACE, which can be turned off. We will need to run the Cython build part only on the aarch64 machine.
@tqchen Thanks for the reply. I had already found and used the new flag yesterday; I was just reporting the behavior. I don't know whether both flags are still used and have different effects. If the old one is deprecated, I think we should update config.cmake.
As for the cross build of tvm_cython, we can invoke the compile commands manually for now, and later investigate how to do it more automatically through CMake.
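As a rough sketch of the manual route: on Unix, setuptools/distutils consult the CC, CXX and LDSHARED environment variables when compiling extensions, so a build_ext run can be pointed at a cross toolchain. The toolchain prefix below is an assumption (a common GNU naming convention) and the helper name is hypothetical. This alone may not be sufficient, since the output file suffix and the Python headers still come from the host interpreter that runs setup.py.

```python
import os
import sysconfig


def cross_build_env(toolchain_prefix="aarch64-linux-gnu-"):
    """Return a copy of the environment with compiler variables pointed
    at a cross toolchain.

    The default prefix is an assumption; substitute your toolchain's prefix.
    """
    env = dict(os.environ)
    env["CC"] = toolchain_prefix + "gcc"
    env["CXX"] = toolchain_prefix + "g++"
    # LDSHARED is the command used to link the extension module.
    env["LDSHARED"] = toolchain_prefix + "gcc -shared"
    return env


if __name__ == "__main__":
    # What the interpreter running setup.py reports about itself (the host).
    print("host platform:", sysconfig.get_platform())
    print("host EXT_SUFFIX:", sysconfig.get_config_var("EXT_SUFFIX"))
    print("CC =", cross_build_env()["CC"])
```

The returned dictionary could then be passed as env= to subprocess.run when invoking "python3 setup.py build_ext"; the resulting file may need to be renamed to match the suffix the aarch64 interpreter expects.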
Another thing that seems wrong is that the tvm_cython target depends only on the tvm target. I think it should be a post-build step for both the tvm and tvm_runtime targets. When deploying a Python RPC server, we only need to build tvm_runtime but still need the output of tvm_cython; in that scenario the tvm target shouldn't have to be built.
By the way, does the new FFI still not support passing a runtime::Array to the RPC server?
Agreed that we can make the Cython build a post-op; please send a PR. The new RPC still has the same limited behavior as the older version, but one thing we can do now is have remote opaque objects, so in theory we can call a remote constructor to construct an array that can then be passed.
Sorry, I have had a lot of customer work these days; maybe next week I will have some time to send a patch.
After two weeks of hard work, we have finished merging the latest upstream code. There is still a lot of work left to migrate all of our Relay graph-level work, but we think it is worth it: Relax is well designed and faster than Relay, and the new FFI is powerful, so we can now use Array in more places. It will certainly still take time to study the new FFI in more detail.
About Array not being passable through RPC: I understand the method you described. If the construction is handled by RPC automatically, that is good; if it needs to be handled by the user, I think users will prefer the current unpack style. After going through the code and thinking about it more, I understand how containers like Array and Map differ from the objects that can be passed through RPC.
The last thing I want to report is that an NDArray becomes a DLTensor* when it is passed to the RPC server. This causes the arguments of my runtime module functions to differ between the RPC run scenario and the local run scenario. After going through the code, I understand the details of each step in the RPC run scenario. In theory, could we construct an unowned NDArray when the RPC server receives an NDArray created on the client side?