[Bug] RPC Server Python Exception Can't Be Sent to RPC Client

Open Johnson9009 opened this issue 1 year ago • 6 comments

After upgrading to v0.15.0, we found a bug in RPC: if the RPC server raises an exception, the error message is not visible in the RPC client, as shown below.

image

After some investigation, I found that the issue occurs only when the exception is raised by a Python packed function. In the experiment below, the C++ exception is propagated to the Python side correctly, but when Python raises it again the error message is lost: the RPC client still receives the exception, but its error message is empty.

image

image

If the exception occurs in a pure C++ remote packed function, the error message is sent back to the RPC client correctly.

How to reproduce: hard-code a raise "xxxxx" at the beginning of load_module in python/tvm/rpc/server.py, or in any other Python packed function, and then call it from the RPC client.

Johnson9009, Mar 12 '24 12:03

@tqchen @Lunderberg Could you help check whether this is related to the changes in https://github.com/apache/tvm/pull/15596? Fixing this issue is important for us because our Q1 release is coming. Thanks.

Johnson9009, Mar 12 '24 12:03

image

Is it related to this change?

Johnson9009, Mar 12 '24 13:03

This may indeed be related to #15596. @Lunderberg, it seems we need to stringify Python errors if they are caught by RPC.
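To make "stringify Python errors" concrete, here is a minimal standalone sketch that uses only the standard library's traceback module. It is not TVM code and not the proposed fix: stringify_exception and failing_packed_func are hypothetical names, and the point is only that the message and stack trace can be flattened into a plain string before crossing a process boundary.

```python
import traceback


def stringify_exception(err: BaseException) -> str:
    """Flatten an exception (type, message, traceback) into one string."""
    return "".join(
        traceback.format_exception(type(err), err, err.__traceback__)
    )


def failing_packed_func():
    # Hypothetical stand-in for a Python packed function on the RPC server.
    raise RuntimeError("xxxxx")


try:
    failing_packed_func()
except RuntimeError as err:
    # A plain string like this is what a transport layer could send back.
    payload = stringify_exception(err)

print(payload)
```

Whether the RPC layer should send a flat string like this or a structured form is a design choice that this sketch does not settle.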

tqchen, Mar 12 '24 13:03

Agreed. The full stack trace is in the Python object, so we should be able to serialize it for RPC use. Prior to #15596, the full stack trace was embedded into the string error message, which worked with RPC transfers, but made it quite difficult to track nested levels of error messages.

I'm thinking that RPC's serialization will be the reverse of the parsing that occurs here, so that the stack trace objects can be rebuilt when received. (Though the rebuilt objects would be missing the full Python variable information available within a non-RPC err.__traceback__.)
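The serialize-then-rebuild idea can be sketched in plain Python as below. This is an illustration under stated assumptions, not TVM's wire format or parser: the JSON layout and the names RemoteFrame, serialize_error, deserialize_error, and load_module_stub are all hypothetical. It also shows the limitation noted above: only file, line, and function name survive the round trip, not the local variables of a live traceback.

```python
import json
import traceback
from typing import List, NamedTuple, Tuple


class RemoteFrame(NamedTuple):
    """One rebuilt stack frame; carries no local variables."""
    filename: str
    lineno: int
    name: str


def serialize_error(err: BaseException) -> str:
    """Server side: encode type, message, and frames as JSON (assumed format)."""
    frames = [
        {"filename": f.filename, "lineno": f.lineno, "name": f.name}
        for f in traceback.extract_tb(err.__traceback__)
    ]
    return json.dumps(
        {"type": type(err).__name__, "message": str(err), "frames": frames}
    )


def deserialize_error(payload: str) -> Tuple[str, str, List[RemoteFrame]]:
    """Client side: reverse of serialize_error."""
    data = json.loads(payload)
    frames = [RemoteFrame(**frame) for frame in data["frames"]]
    return data["type"], data["message"], frames


def load_module_stub():
    # Hypothetical stand-in for a failing Python packed function.
    raise RuntimeError("xxxxx")


try:
    load_module_stub()
except RuntimeError as err:
    wire = serialize_error(err)

kind, message, frames = deserialize_error(wire)
print(kind, message, [frame.name for frame in frames])
```

With a structured payload like this, the client keeps the message and can still show nested frames separately instead of parsing them out of one long string.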

Lunderberg, Mar 12 '24 16:03

@Lunderberg can you help with this one?

tqchen, Mar 19 '24 13:03

Thank you for the ping, and I probably can in a few weeks, but am currently low on available time.

Lunderberg, Mar 27 '24 13:03