NServiceBus
Memory leak due to resources not being cleaned up if start partially fails
Problem
Start can fail partway through: a receive component has already been initialized and started, which creates a transport-specific IPushMessage instance. If the remaining receiving parts then fail to start, these resources are not cleaned up, which results in a memory leak.
Why is this really a problem?
This is problematic in environments where a failed Start does not cause the host to be stopped, gracefully or ungracefully, for example when the host has logic that simply retries Start in a loop until it succeeds. One scenario is an IIS-hosted solution where the broker is partially down due to maintenance. The application should keep running even if the messaging component cannot connect because the host happened to start while the broker was down. Normally the connection to the broker would be recovered if the endpoint instance had already started successfully.
If start fails, we should tear down and dispose any resources that may have been created during start.
Solution spike
The following spike tries to stop most resources, based on the assumption that most resources will not throw when .Stop(..) is invoked. The calls are also wrapped in a try..catch.
- Spike: https://github.com/Particular/NServiceBus/pull/6566
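To make the idea concrete, here is a minimal sketch of a start sequence that rolls back on failure. It is not the code from the spike: `IStartable` and `GuardedStartup` are hypothetical names invented for this illustration and are not NServiceBus types. It relies on the same assumption as the spike, namely that calling Stop on a component that never fully started is safe.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical stand-in for a startable component; not an NServiceBus type.
public interface IStartable
{
    Task Start();
    Task Stop();
}

public static class GuardedStartup
{
    // Starts components in order. If one fails, stops everything that was
    // touched so far (in reverse order) and rethrows the original exception.
    public static async Task StartAll(IReadOnlyList<IStartable> components)
    {
        var touched = new Stack<IStartable>();
        try
        {
            foreach (var component in components)
            {
                // Push before starting so a partially started component is stopped too.
                touched.Push(component);
                await component.Start();
            }
        }
        catch
        {
            while (touched.Count > 0)
            {
                var component = touched.Pop();
                try
                {
                    await component.Stop();
                }
                catch (Exception stopException)
                {
                    // Best effort: a failing Stop must not hide the start failure.
                    Console.Error.WriteLine($"Stop failed during rollback: {stopException.Message}");
                }
            }

            throw;
        }
    }
}
```

With this shape, a host that retries Start in a loop begins every attempt from a clean state, because each failed attempt has already stopped whatever it managed to create.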
Solution suggestion
However, this might not handle every case correctly. Likely the best way forward is to create a resource (DI) container in which all closable/disposable resources are registered, so that on a Start failure all these resources can be closed/disposed properly.
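A rough sketch of what such a registry could look like follows. `StartupResources` and `Track` are hypothetical names, not part of the NServiceBus API, and the sketch assumes that every tracked resource implements `IAsyncDisposable` (available from .NET Core 3.0 / .NET Standard 2.1). A real implementation would probably also need to handle `IDisposable` and Stop-style resources.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical registry for resources created during Start; not an NServiceBus type.
public sealed class StartupResources : IAsyncDisposable
{
    readonly Stack<IAsyncDisposable> resources = new Stack<IAsyncDisposable>();

    // Register a resource as soon as it is created; returns it so calls can be chained.
    public T Track<T>(T resource) where T : IAsyncDisposable
    {
        resources.Push(resource);
        return resource;
    }

    // Disposes in reverse creation order. Failures are collected so that one
    // throwing resource does not prevent the remaining ones from being disposed.
    public async ValueTask DisposeAsync()
    {
        var failures = new List<Exception>();
        while (resources.Count > 0)
        {
            try
            {
                await resources.Pop().DisposeAsync();
            }
            catch (Exception exception)
            {
                failures.Add(exception);
            }
        }

        if (failures.Count > 0)
        {
            throw new AggregateException(failures);
        }
    }
}
```

The start path would wrap each newly created resource in `Track(...)` and call `DisposeAsync()` on the registry from its failure handler. Once Start succeeds, the same registry could be kept and disposed when the endpoint is stopped normally.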
Workaround?
A workaround does not really exist. The only option is to redesign the system so that the host AppDomain is ALWAYS closed when Start fails. This usually means that the hosting process is terminated. This is only a valid (and better) choice for non-interactive / async messaging processes.