dotnet-sdk icon indicating copy to clipboard operation
dotnet-sdk copied to clipboard

Exception from Actor's method is not surfaced to the caller

Open artursouza opened this issue 2 years ago • 8 comments

Expected Behavior

Exception's error message (not stacktrace) from Actor's method is surfaced to the caller.

Actual Behavior

image

Steps to Reproduce the Problem

See example from actor exception.

Release Note

RELEASE NOTE: FIXED issue that was not surfacing actor error to the caller's app

artursouza avatar Jul 08 '22 00:07 artursouza

@amulyavarote @halspang Please, consider ramifications of this change to the runtime as well. We might need to agree on a metadata/header to propagate the error message. In addition, consider truncating error message from the Actor SDK to avoid header size limit issues, as pointed out by @berndverst.

artursouza avatar Jul 11 '22 20:07 artursouza

As an alternative approach, would something like additional logging at the time of error be sufficient here? It seems like the issue is we raise the exception and it's tossed across enough areas that we lose the stack. However, we could just dump the stack trace when we catch the first exception and then it requires no change to the runtime. Thoughts @artursouza?

halspang avatar Jul 11 '22 21:07 halspang

@artursouza can you provide a more detailed explanation? It's not super clear to me from the screenshot what is supposed to be different.

If this is about serializing .NET exception types over the wire, we deliberately avoid that for security reasons. It's very unsafe.

If this is about serializing .NET call stacks over the wire, we deliberately also avoid that for security reasons.

We should make sure the call stacks and exception info are being logged visibly inside the process that throws the exception.

rynowak avatar Jul 11 '22 23:07 rynowak

@rynowak I think the request is essentially that an ExceptionName and Message are returned to Dapr Actor Runtime so that the callee can see what error occurred in the Actor Method. Feels a bit weird, but that is my understanding.

In JSON I sometimes see things like:

{
 "status": "200",
 "errormsg": "Error text here or empty"
 "response": "Response text on success only"
}

Of course changing what the .NET SDK returns upon encountering an exception in the Actor case would break some people.. so this doesn't seem like a good idea.

As long as we enforce a reasonably small message size limit (e.g. 256 characters) it might be ok to simply return the error message via gRPC metadata / http header.

At the end of the day, this all still feels wrong to me. Proper error logging on the actor service implementations should be added, and the error logs should be examined that way.

berndverst avatar Jul 15 '22 23:07 berndverst

Hey @berndverst - thanks for the clarification.

The .NET SDK has two different RPCs for Actors.

  • Using strongly-typed .NET interfaces for your client will use a proprietary C#-only RPC protocol that doesn't work for interop with non-.NET actors (most users do this).
  • Using the weakly-typed client uses the standard Dapr actors protocol and JSON serialization. Most users don't use this, but it is easy to interop in actors written in another language.

The .NET-only RPC protocol already includes the Exception Type and Name. This is specific to .NET and the Dapr .NET SDK provides both ends of the pipe. The RPC and how we transmit these details are effectively an implementation detail of the SDK.

The interoperable protocol leaves this up to the user and we don't have a super-well-defined behavior today. The user can use ASP.NET Core's exception handling features to implement what they like, which isn't great but not blocking either.

For something like what you're suggesting, the error format should be standardized at the Dapr level and designed for interoperability across languages. We try not layer extra opinions on top via the SDK, because if Dapr wants to standardize this later we'd have a painful migration to deal with.

The .NET-only RPC protocol is grandfathered in. It's so clearly specific to .NET that it seems appropriate for us to be opinionated.

rynowak avatar Jul 18 '22 19:07 rynowak

Before we get too far into making a solution, I want to make sure we're covering what the initial issue is.

From the initial conversation, it seemed to me like what they wanted was for the actual line number/stack trace to be given when the called actor has an exception instead of what we see here on the caller side.

@artursouza - Let me know if you think I'm interpreting this incorrectly.

Does this solution offer that? From what Ryan is saying, it sounds like we don't want to send over that much information due to security concerns (which makes sense). Is that correct, @rynowak?

@berndverst - In your solution are you saying we'll just include the method name/exception name in order to avoid sending over secure data? Though I think this could be helpful, I do want to make sure that we are solving what the original ask was.

I know we want to tend towards a more comprehensive solution, but given priorities here, would it not be valid to just log the exception/stack trace on the callee side after the failure? This does have the potential to have the log on a different pod from the caller, but in the case of debugging, that's not hard to track down. If that's OK, we can alleviate the pain quickly and then take more time to plan out the generic (SDK agnostic) solution to actor errors.

halspang avatar Aug 01 '22 23:08 halspang

We should not return stack trace. We should return more info about the error not the trace. If possible, I would include line number and file name too.

artursouza avatar Aug 01 '22 23:08 artursouza

I'm not saying returning, I'm saying printing at the callee side.

Regardless, as long as we have the line number/area of the code that it failed in, I think we're OK. I just wanted to make sure that was being covered. Though I do worry we're rushing into a notoriously finicky codebase.

halspang avatar Aug 01 '22 23:08 halspang