rabbitmq-dotnet-client icon indicating copy to clipboard operation
rabbitmq-dotnet-client copied to clipboard

AutorecoveringConnection prevents AppDomain.Unload() of enclosing AppDomain

Open ghost opened this issue 5 years ago • 10 comments
trafficstars

Abstract

An active RabbitMQ.Client.Framing.Impl.AutorecoveringConnection instance violently opposes any attempt by its enclosing System.AppDomain to unload itself, resulting an a thrown System.CannotUnloadDomainException and a subsequent crash of the containing OS process.

Details

Closer analysis shows that the RabbitMQ.Client.Framing.Impl.Connection instance embedded in an AutorecoveringConnection will experience the call to AppDomain.Unload(...) as a System.Threading.ThreadAbortException inside its MainLoop method, specifically in the call to InboundFrame inboundFrame = _frameHandler.ReadFrame(). Because AutorecoveringConnection generally tries to recover from Thread aborts, it will proceed to enter its automatic recovery procedure. Thus, an active Connection will be restored, and in all likelyhood the persisting reference to an unmanaged resource - namely a network Socket - is what prevents the AppDomain from unloading in an orderly manner.

All of the above was observed while running RabbitMQ.Client 6.0.0 in the .NET 4.6.1 variant inside a WebService in Microsoft Internet Information Server on Windows Server 2016.

Apparently, previous versions of RabbitMQ.Client exposed a means of subscribing Connection instances to the AppDomain.DomainUnload Event via the Connection.HandleDomainUnload(...) method, but this method is no longer publicly accessible.

Relevance

The observed behavior is a relevant problem because IIS is in the habit of maintaining multiple AppDomain instances inside an Application Pool and recycling these AppDomains on various occasions, e.g. changes to the web.config, which will currently result in a crash of the associated Application Pool. This hurts scenarios where IIS is used as a bridging technology from WebServices to RabbitMQ.

Please consider modifying the behavior of the AutorecoveringConnection to explicitly handle AppDomain unload scenarios.

Workaround

In the present state of affairs, the System.CannotUnloadDomainException can be avoided by disposing all active AutorecoveringConnection instances from within an explicit AppDomain.DomainUnload Event handler.

Repro

The code snippet attached below will raise the CannotUnloadDomainException on AppDomain.DomainUnload(...) unless called with the useWorkaround parameter.

using System;
using System.Net;
using System.Reflection;
using System.Threading;
using RabbitMQ.Client;

namespace RabbitMQClientCrashOnAppDomainUnload
{
    internal class Program
    {
        // May have to change this to relevant local port.
        private const int MyAmqpsPort = 5674;

        public class Worker : MarshalByRefObject
        {
            public void Execute(string virtualHost, string userName, string password, bool useWorkaround)
            {
                var connectionFactory = new ConnectionFactory() { UserName = userName, Password = password, VirtualHost = virtualHost };

                var hostName = Dns.GetHostEntry(string.Empty).HostName;

                Console.WriteLine($"Connecting to VirtualHost '{virtualHost}' on host '{hostName}' as user '{userName}'");
                var connection = connectionFactory.CreateConnection(new[] { new AmqpTcpEndpoint(hostName, MyAmqpsPort) });

                // BEGIN Workaround
                if (useWorkaround)
                    AppDomain.CurrentDomain.DomainUnload += (_, __) => connection.Dispose();
                // END Workaround

                while (true)
                {
                    Console.Write(".");
                    Thread.Sleep(TimeSpan.FromMilliseconds(500));
                }
            }
        }

        private static void Main(string[] args)
        {
            if (args.Length < 3 || args.Length > 4)
            {
                Console.Error.WriteLine($"Usage: {Assembly.GetExecutingAssembly().GetName().Name} <virtualHost> <userName> <password> [\"true\"]");
                return;
            }

            var (virtualHost, userName, password) = (args[0], args[1], args[2]);
            var useWorkaround = args.Length == 4 && bool.Parse(args[3]);

            var childDomain = AppDomain.CreateDomain("Child");

            var workerProxy = (Worker)childDomain.CreateInstanceAndUnwrap(Assembly.GetExecutingAssembly().FullName, $"{typeof(Worker).FullName}");

            var workerThread = new Thread(() => workerProxy.Execute(virtualHost, userName, password, useWorkaround));
            workerThread.Start();

            Thread.Sleep(TimeSpan.FromSeconds(2));

            try
            {
                Console.Out.WriteLine("\n\nUnloading child AppDomain - this may take a couple of seconds");

                AppDomain.Unload(childDomain);

                Console.Out.WriteLine("\n\nUnloaded child AppDomain without error\n");
            }
            catch (CannotUnloadAppDomainException ex)
            {
                Console.Error.WriteLine($"\n\nUnexpectedly received {nameof(CannotUnloadAppDomainException)}: {ex.Message}\n{ex.StackTrace}\n");
            }

            Console.Out.WriteLine("\nPress any key to finish.");
            Console.ReadLine();
        }
    }
}

ghost avatar May 08 '20 13:05 ghost

What would you recommend as the desired behavior? How can autorecovering connections detect a condition where they should stop performing recoveries?

michaelklishin avatar May 08 '20 13:05 michaelklishin

Well, in my opinion an object should not resist its parent AppDomain being unloaded - except for very specific reasons - which I don't see present here.

So I would recommend that AutorecoveringConnection instances should subscribe to the AppDomain.DomainUnload Event of their containing AppDomain and preferrably close and dispose their delegate Connection instance at the time it gets triggered. At the very least they should cease their recovering effort and leave the rest to the Thread cleanup mechanics executed by the AppDomain.

Does that make sense to you?

ghost avatar May 08 '20 14:05 ghost

Yes, it does. I just wanted this conversation to move to a specific suggestion. Other clients sometimes limit connection recovery attempts e.g. when they haven't succeeded for a certain number of times. Reacting to domain shutdown, if we can do it easily, makes perfect sense.

michaelklishin avatar May 08 '20 14:05 michaelklishin

As a note, AppDomain is not fully implemented on .NET core -

https://docs.microsoft.com/en-us/dotnet/core/porting/net-framework-tech-unavailable

lukebakken avatar May 08 '20 14:05 lukebakken

Nothing fancy needed at our end. We have not tried to be clever with configuring AutorecoveringConnection behavior so far. If only we could host RabbitMQ.Client in ASP.NET WebServices and be spared those annoying AppPool crashes we'd be perfectly happy.

Ah yes, and things could be entirely different on ASP.NET Core, so it would be very helpful if you could look into the corresponding situation on that platform, too.

ghost avatar May 08 '20 15:05 ghost

@bording if you have a second, I would be very interested in your opinion here. It seems odd for this library to interact with an AppDomain event directly (DomainUnload), but maybe that's an acceptable practice?

lukebakken avatar May 14 '20 15:05 lukebakken

Wouldn't it make more sense for the user to hook up closing the AutorecoveringConnection on AppDomain.Unload, since the event is never raised in the default AppDomain?

https://docs.microsoft.com/en-us/dotnet/api/system.appdomain.domainunload?view=netframework-4.8

stebet avatar May 14 '20 18:05 stebet

Doing some quick research, I do see some historical discussion on this topic, and I see the now-vestigial Connection.HandleDomainUnload method.

I agree that it's odd for this library to directly be interacting with AppDomain APIs, If someone was creating custom AppDomains themselves as the repro in the initial post demonstrates, then it would be reasonable to say that they need to also wire up the DomainUnload event.

However, you can often be inside a non-default AppDomain without even realizing it, so I'm not sure I'd want the answer here to be "it's up to the user to wire it up themselves."

If we take a step back, why do we need to worry about AppDomains unloading at all? If the unload shows up to the Connection class as a ThreadAbortException, why it is just swallowing it up in a way that makes AutoRecoveringConnection try and reconnect?

It seems like making a ThreadAbortException a case where the connection is shut down and no recovery is attempted would be enough to prevent the domain from unloading. it's hard to imagine that would have negative impact in any other scenario.

bording avatar May 14 '20 21:05 bording

Good point. I did not, by any means, intend to suggest that a solution should directly follow the approach of my repro snippet. What I would like to emphasize is that being

inside a non-default AppDomain without even realizing it

is the typical situation any IIS-hosted ASP.NET WebService will find itself in, so it could be quite common in the Windows ecosystem.

ghost avatar May 15 '20 06:05 ghost

This does not feel like a "must have" for 6.1.0. Punting for now.

lukebakken avatar May 19 '20 23:05 lukebakken