Remoting and an exception as a payload message
Akka.Remote and Akka.Cluster implement a transparent IPC system. The supervisor system relies on exception types as its message protocol. This works fine within a single process, but in combination with remote deployment it leads to problems.
Not all exceptions and their fields are serializable, and an exception type is not necessarily known to every cluster node, so an exception cannot always be transmitted as a message. Moreover, neither the sender nor the receiver knows in advance which exception types will be generated or can be handled.
To resolve this conflict, I see the following options:
Option A
Transform every exception handled by the supervisor strategy into a well-known generic "ErrorMessage" and/or "ReasonMessage", and add an extension method that transforms an ErrorMessage back into an exception in user code. This acts as a substitution: the exception reference itself can be held internally, but it is never serialized.
// Proposed API: ErrorReason, Reason and AkkaReasonException do not exist in Akka.NET today.
// Proposed default implementation in the actor base class:
protected virtual void PostRestart(Akka.ErrorReason error)
{
    // Reason contains a short message as a string and an ErrorId as a Guid
    Akka.Reason reason = error.Reason;
    // The error instance can hold the exception instance, or it can TRY to lazily deserialize it
    if (error.TryGetException(out Exception ex))
        PostRestart(ex);
    else
        PostRestart(new AkkaReasonException(reason)); // generic error
}
With this default implementation, PostRestart(Exception error) is still called and nothing should break.
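A minimal, self-contained sketch of the Option A substitution, assuming hypothetical types ErrorReason, Reason and AkkaReasonException (none of them exist in Akka.NET). Only the message and the ErrorId are meant to travel; the original exception reference is kept in memory on the node that created it, and a copy rebuilt on another node from its Reason alone falls back to a generic exception. An instance method stands in for the proposed extension method.

```csharp
using System;

// Hypothetical wire-safe reason: only these two values are meant to be serialized.
public sealed class Reason
{
    public Reason(string message, Guid errorId)
    {
        Message = message;
        ErrorId = errorId;
    }

    public string Message { get; }
    public Guid ErrorId { get; }
}

// Hypothetical generic exception used when the original cannot be recovered.
public sealed class AkkaReasonException : Exception
{
    public AkkaReasonException(Reason reason) : base(reason.Message)
    {
        Reason = reason;
    }

    public Reason Reason { get; }
}

// Hypothetical substitute that PostRestart would receive instead of the raw exception.
public sealed class ErrorReason
{
    // In-memory only: deliberately excluded from the wire format.
    [NonSerialized] private readonly Exception _exception;

    // Created on the node where the exception was thrown.
    public ErrorReason(Exception exception)
    {
        _exception = exception;
        Reason = new Reason(exception.Message, Guid.NewGuid());
    }

    // Rebuilt on a remote node from the serialized Reason only.
    public ErrorReason(Reason reason)
    {
        Reason = reason;
    }

    public Reason Reason { get; }

    public bool TryGetException(out Exception exception)
    {
        exception = _exception;
        return exception != null;
    }

    // The original exception when available locally, otherwise a generic substitute.
    public Exception ToException() =>
        TryGetException(out var ex) ? ex : new AkkaReasonException(Reason);
}
```

Because the fallback still yields an Exception, an existing PostRestart(Exception) override keeps working unchanged on both the local and the remote side.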
Option B
Akka.Remote treats an exception as a special message. The receiving side only tries to deserialize it; if that fails, it generates a GenericException instead.
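A sketch of the Option B fallback, assuming a hypothetical ExceptionEnvelope that carries only the exception's assembly-qualified type name and message (GenericException is hypothetical too). The receiver rebuilds the original type only if it can be resolved locally and has a single-string constructor; otherwise it substitutes a generic exception instead of failing the association. A real implementation would need per-type handling, for example because ArgumentNullException's single-string constructor takes a parameter name rather than a message.

```csharp
using System;

// Hypothetical wire format: plain strings that every node can deserialize.
public sealed class ExceptionEnvelope
{
    public ExceptionEnvelope(string typeName, string message)
    {
        TypeName = typeName;
        Message = message;
    }

    public string TypeName { get; }
    public string Message { get; }

    public static ExceptionEnvelope From(Exception ex) =>
        new ExceptionEnvelope(ex.GetType().AssemblyQualifiedName, ex.Message);
}

// Hypothetical stand-in for exception types the receiver cannot load.
public sealed class GenericException : Exception
{
    public GenericException(string originalTypeName, string message) : base(message)
    {
        OriginalTypeName = originalTypeName;
    }

    public string OriginalTypeName { get; }
}

public static class ExceptionEnvelopeReader
{
    public static Exception ToException(ExceptionEnvelope envelope)
    {
        try
        {
            // Returns null (instead of throwing) when the type or its assembly is unknown here.
            var type = Type.GetType(envelope.TypeName, throwOnError: false);
            var ctor = type != null && typeof(Exception).IsAssignableFrom(type)
                ? type.GetConstructor(new[] { typeof(string) })
                : null;
            if (ctor != null)
                return (Exception)ctor.Invoke(new object[] { envelope.Message });
        }
        catch (Exception)
        {
            // Assembly load failures or a throwing constructor: use the fallback below.
        }
        return new GenericException(envelope.TypeName, envelope.Message);
    }
}
```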
Related: https://github.com/akkadotnet/akka.net/issues/3811
Agreed, this is a real issue that needs addressing. Slated it for the v1.4.0 release.
So I stumbled on this issue because of a problem I encountered when trying to return a Status.Failure with an exception from a remote actor. After some investigation I found that Newtonsoft.Json is quite capable of serializing and deserializing an exception on its own, but somewhere in Akka some value-type properties of an Exception are serialized as objects.
Regular serialization
"RemoteStackIndex":0,
"HResult":-2146233088,
Akka produces
"RemoteStackIndex":{
"$":"I0"
},
"HResult":{
"$":"I-2146233088"
},
I don't recognize the JSON produced by Akka. Is this some special value-type encoding that Akka uses when sending, which the receiving side doesn't understand when it tries to deserialize?
https://github.com/akkadotnet/akka.net/blob/523e15c43c3927a8e14c34bd4647afdb7bf14dac/src/core/Akka/Serialization/NewtonSoftJsonSerializer.cs#L237
Looks like this bit of code is causing havoc on exception (de)serialization. Is there any way to bypass it for exceptions?
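The `{"$":"I0"}` shape looks like a type-tagged encoding of a boxed number: a one-letter type prefix followed by the invariant-culture value, presumably so that a number typed as `object` (as values in an exception's ISerializable data are) comes back as the same CLR type instead of a long or double. The sketch below illustrates such an encoding, assuming the prefixes I (int), F (float) and M (decimal); TaggedNumber is a hypothetical name and this is not Akka's actual code.

```csharp
using System;
using System.Globalization;

public static class TaggedNumber
{
    // Encodes a boxed number as "<tag><invariant text>", e.g. an int 0 becomes "I0".
    public static string Encode(object value)
    {
        switch (value)
        {
            case int i: return "I" + i.ToString(CultureInfo.InvariantCulture);
            case float f: return "F" + f.ToString("R", CultureInfo.InvariantCulture);
            case decimal m: return "M" + m.ToString(CultureInfo.InvariantCulture);
            default: throw new NotSupportedException($"No tag for {value?.GetType()}");
        }
    }

    // Restores the original CLR type from the tag, so an int is not read back as a long.
    public static object Decode(string tagged)
    {
        if (string.IsNullOrEmpty(tagged))
            throw new FormatException("Empty tagged value");
        var text = tagged.Substring(1);
        switch (tagged[0])
        {
            case 'I': return int.Parse(text, CultureInfo.InvariantCulture);
            case 'F': return float.Parse(text, CultureInfo.InvariantCulture);
            case 'M': return decimal.Parse(text, CultureInfo.InvariantCulture);
            default: throw new FormatException($"Unknown tag '{tagged[0]}'");
        }
    }
}
```

A deserializer that does not apply the matching decode step sees a JSON object where it expects a number, which would be consistent with the failures reported in this thread.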
I also stumbled on this. We want to send messages that encapsulate exceptions to a remote node in failure cases, and this issue seems to be the root cause of the failure.
Before we think about workarounds: is this still on the roadmap for a 1.4.x release (using the NewtonSoftJsonSerializer)? Thanks for any short feedback.
The issue has nothing to do with proper serialization/deserialization.
If cluster nodes do not have the same binaries and one node sends back an exception type the other node does not know, the cluster will disintegrate.
That's not true: even a standard ArgumentNullException doesn't make it across. I've fixed it in the past by creating my own serializer. Although nowadays I would say maybe you shouldn't send exceptions over the wire... but a user is kind of 'forced' to by Akka.NET, because the Status.Failure class asks for an exception as its payload.
> you shouldn't send exceptions over the wire
This is exactly the issue and we need to change it.
Many libraries made this mistake; System.Runtime.Remoting is one of them.
> The issue has nothing to do with proper serialization/deserialization. If cluster nodes do not have the same binaries and one node tries to send an unknown exception type back then the cluster will disintegrate.
>
> That's not true, even a standard argument null exception doesn't make it across.
Exactly.
I can see some issues with sending exceptions over the wire, but I would not say that it is an anti-pattern for all applications. Otherwise, I would be interested in the reasoning behind that.
It is the combination of:
- polymorphism of exceptions
- the unknown type of the raised exception
- the strict requirement of Akka.Remote to deserialize every object
Example:
- NodeA has user libs and forcefully includes the "EF Core 3.1" libs, only for the exceptions
- NodeB has user libs and uses "EF Core 3.1" to query something at NodeA's request
- Some exception is thrown internally in "EF Core 3.1" and sent back to NodeA
- NodeA has the "EF Core 3.1" libs forcefully included and can deserialize the EF exception...
- NodeB updates to "EF Core 5.0"
- NodeA makes a request as usual
- NodeB throws a new exception "BadIndexException" (not real)
- NodeA tries to deserialize it, can't, and disassociates from NodeB => NodeB needs to be restarted
Thanks. Yes, I also had the "dependency/inner exception issue" of exceptions in mind.
Nevertheless, I think sending a well-defined set of (application-defined) exceptions over the wire is still a valid use case.
Also, as already mentioned in this issue, Akka's own Status.Failure message takes an exception and can potentially be sent to a remote node, so it should be technically workable, IMHO.
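One way to support a well-defined set of application-defined exceptions, sketched under assumptions: a hypothetical WireExceptionPolicy lists the exception types that every node is known to share, and anything outside that list is replaced on the sending side by a generic exception that carries only strings, so the receiver never has to resolve an unknown type. WireExceptionPolicy, OrderNotFoundException and RemoteFailureException are illustrative names, not Akka.NET APIs.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical application-defined exception shipped in an assembly shared by all nodes.
public sealed class OrderNotFoundException : Exception
{
    public OrderNotFoundException(string message) : base(message) { }
}

// Hypothetical substitute for exceptions that are not on the allow-list.
public sealed class RemoteFailureException : Exception
{
    public RemoteFailureException(string originalType, string message)
        : base($"{originalType}: {message}")
    {
        OriginalType = originalType;
    }

    public string OriginalType { get; }
}

public sealed class WireExceptionPolicy
{
    private readonly HashSet<Type> _allowed = new HashSet<Type>();

    public WireExceptionPolicy Allow<T>() where T : Exception
    {
        _allowed.Add(typeof(T));
        return this;
    }

    // Applied on the sending node, before the exception is put into a message.
    // Exact type match: a subclass of an allowed type may live in an assembly
    // the receiver does not have.
    public Exception ForWire(Exception ex) =>
        _allowed.Contains(ex.GetType())
            ? ex
            : new RemoteFailureException(ex.GetType().FullName, ex.Message);
}
```

Note that an allowed exception is passed through with its InnerException unchanged, so a real policy would also have to vet or strip inner exceptions, which is exactly where the dependency problem from the EF Core example can reappear.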
> https://github.com/akkadotnet/akka.net/blob/523e15c43c3927a8e14c34bd4647afdb7bf14dac/src/core/Akka/Serialization/NewtonSoftJsonSerializer.cs#L237
> Looks like this bit of code is causing havoc on exception (de)serialization. Any way to bypass it for exceptions?
You can always subclass the NewtonSoftJsonSerializer to bypass it. Maybe we can expose a Setup that allows users to pass in custom converters and other programmatic configuration elements for the default JSON serializer. We already allow a SerializerSetup in Akka.NET, which can specify serialization bindings programmatically.
Yes, such a possibility to inject custom converters would be great!
Maybe worth mentioning: this doesn't seem to be a problem with Hyperion.
> Yes, such a possibility to inject custom converters would be great!
Alright - this will also allow us to add some programmatic filtering for security purposes in our default polymorphic serializers (Hyperion, JSON.NET) while we're at it. We'll look into it.
The Newtonsoft.Json change is now available in the most recent Akka.NET nightly builds and will be released as part of Akka.NET v1.4.19, per https://github.com/akkadotnet/akka.net/issues/4877
Do we have a DTO for errors/failures?
Internally, I am currently using the types below.
There are helper extension methods, ToError(this PNetReason) and ToReason(this Exception), to map from Exception to Reason and back.
Internally, an IMemoryCache is used to store the exception instance and map back to it.
The node where the exception is thrown tags it with an ErrorId and logs it. Afterwards, only the Reason is sent back to the caller. If the caller is on the same node and requires the original exception, it can be resolved via the ErrorId.
The Reason can be used in failure responses, e.g. "NotFound", "Timeout", ...
var reason = exception.ToReason();
// This would be a welcome change (a proposed Status.Failure constructor, not an existing one):
var status = new Status.Failure(status: "some_operation", reason: reason);
using System;
using System.Collections.Generic;

// PNetNames, PNetIdentity and the ToError()/ToReason() extensions are defined elsewhere in the PNet code base
public sealed class PNetReason : IEquatable<PNetReason>
{
    public static implicit operator Exception(PNetReason reason)
    {
        return reason.ToError();
    }

    public static bool operator ==(PNetReason left, PNetReason right)
    {
        return EqualityComparer<PNetReason>.Default.Equals(left, right);
    }

    public static bool operator !=(PNetReason left, PNetReason right)
    {
        return !(left == right);
    }

    public static readonly PNetReason Timeout = new PNetReason("Timeout");
    public static readonly PNetReason None = new PNetReason(PNetNames.None);
    public static readonly PNetReason Unknown = new PNetReason("Unknown");
    public static readonly PNetReason Cancelled = new PNetReason("Cancelled");
    public static readonly PNetReason NotSupported = new PNetReason("NotSupported");

    public string Message { get; private set; }
    public PNetErrorIdentity Error { get; private set; }

    public PNetReason()
    {
    }

    public PNetReason(string message)
    {
        Message = message;
    }

    public PNetReason(PNetErrorIdentity error)
    {
        Error = error;
    }

    public PNetReason(string message, PNetErrorIdentity error)
    {
        Message = message;
        Error = error;
    }

    public override string ToString()
    {
        if (!string.IsNullOrEmpty(Message))
            return Message;
        if (Error != null)
            return Error.ToString();
        return string.Empty;
    }

    public override bool Equals(object obj)
    {
        return obj is PNetReason v && Equals(v);
    }

    public bool Equals(PNetReason other)
    {
        return other != null && Error == other.Error
            && StringComparer.OrdinalIgnoreCase.Equals(Message, other.Message);
    }

    public static bool Equals(PNetReason a, PNetReason b)
    {
        return (a ?? None).Equals(b);
    }

    public static PNetReason ItemNotFound(string name, object value)
    {
        //maybe move message generation out
        return new PNetReason($"{name} '{value}' not found");
    }

    public static PNetReason ItemUnknown(string name, object value)
    {
        //maybe move message generation out
        return new PNetReason($"{name} '{value}' unknown");
    }

    public static PNetReason InvalidOperation(object state)
    {
        //maybe move message generation out
        return new PNetReason($"Operation invalid for state '{state}'");
    }

    public override int GetHashCode()
    {
        // Hash Message case-insensitively so that GetHashCode stays consistent with Equals
        return HashCode.Combine(
            Message == null ? 0 : StringComparer.OrdinalIgnoreCase.GetHashCode(Message),
            Error);
    }
}
public sealed class PNetErrorIdentity : PNetIdentity
{
    public static new readonly PNetErrorIdentity None = new PNetErrorIdentity
    {
        Id = Guid.Empty,
        //Type = PNetNames.None
    };
}
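The ErrorId scheme described in this comment (tag the exception, keep it locally, send only the reason, resolve it again on the same node) can be sketched roughly as follows. This is an illustration, not the PNet implementation: it swaps IMemoryCache for a plain ConcurrentDictionary without expiration to stay dependency-free, and the names ErrorRegistry and WireReason are hypothetical.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical wire-safe reason: a short message plus the ErrorId, nothing else.
public sealed class WireReason
{
    public WireReason(string message, Guid errorId)
    {
        Message = message;
        ErrorId = errorId;
    }

    public string Message { get; }
    public Guid ErrorId { get; }
}

// Node-local store. The original uses IMemoryCache, which would also evict old entries;
// this dictionary keeps everything until the process exits.
public sealed class ErrorRegistry
{
    private readonly ConcurrentDictionary<Guid, Exception> _errors =
        new ConcurrentDictionary<Guid, Exception>();

    // Called on the node where the exception is thrown: tag it, keep it, log it.
    public WireReason Register(Exception ex, Action<Guid, Exception> log)
    {
        var id = Guid.NewGuid();
        _errors[id] = ex;
        log(id, ex);
        return new WireReason(ex.Message, id);
    }

    // Succeeds only on the node that registered the error.
    public bool TryResolve(WireReason reason, out Exception ex) =>
        _errors.TryGetValue(reason.ErrorId, out ex);
}
```

Only WireReason ever crosses a node boundary, so the receiver needs no knowledge of the thrown exception type, while the logged ErrorId still lets an operator correlate the remote failure with the full exception on the originating node.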
> Do we have a DTO for errors/failures?
Not yet - still need a serialization solution for that.
Looking into this per @Zetanova's comments on https://github.com/akkadotnet/akka.net/pull/6294#discussion_r1042100923
It looks like we already have remote Exception serialization support for remotely deployed actors via ExceptionData and the ExceptionSupport class in Akka.Remote:
https://github.com/akkadotnet/akka.net/blob/1b96294db2de1a85e670e3829c7c93a1f7a93687/src/core/Akka.Remote/Serialization/ExceptionSupport.cs#L21-L155
I'll check the test suite to see if we have tests covering this scenario for remote deployments. We'll also add support for Status.Failure and Status.Success, provided that those are feasible.