
Endpoint startup is dependent on transport availability

Open kbaley opened this issue 3 years ago • 2 comments

Users sometimes run into issues when the transport infrastructure isn't reachable at endpoint startup. Once the endpoint has started, connection losses can be handled and connections re-established by the transport. During startup, however, the endpoint fails to start, typically surfacing an exception in the application that the user has to deal with.

Sometimes this isn't a big issue, especially when using hosting environments like Windows Services, which can easily be configured to restart the service and so ride out infrastructure problems. However, this seems to be becoming a less common hosting option, and users often end up wrapping their host logic in try/catch and retry logic to ensure that the application can be deployed reliably.

The main reason the transport is required at startup seems to be to ensure that we can throw an exception right away if something is misconfigured (e.g. wrong connection string, missing permissions, etc.) and surface that more visibly to the user, rather than having them figure out that a running process isn't really working because of misconfiguration.

A simple solution might be to retry transport/persistence startup internally as part of Endpoint.Start and only give up and throw after multiple attempts. This would make it more resilient to short outages and service startup order issues while keeping immediate misconfiguration feedback.
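To make the idea concrete, here is a minimal, language-neutral sketch of bounded startup retries, written in Python rather than against the real NServiceBus API. Every name in it (`start_with_retry`, `TransientTransportError`, `MisconfigurationError`, the `start` callable) is hypothetical. It also assumes the transport can tell "unreachable" apart from "misconfigured", which the issue doesn't establish; without that distinction, misconfiguration feedback would be delayed until the retries are exhausted.

```python
import time


class TransientTransportError(Exception):
    """Hypothetical: the transport could not be reached (e.g. broker still starting)."""


class MisconfigurationError(Exception):
    """Hypothetical: retrying cannot help (e.g. malformed connection string)."""


def start_with_retry(start, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call start() until it succeeds or max_attempts is exhausted.

    Only TransientTransportError is retried, with exponential backoff.
    Any other exception (including MisconfigurationError) propagates
    immediately, so misconfiguration is still reported on the first attempt.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    for attempt in range(1, max_attempts + 1):
        try:
            return start()
        except TransientTransportError:
            if attempt == max_attempts:
                # Out of attempts: give up and surface the last failure.
                raise
            # Wait 1x, 2x, 4x, ... base_delay between attempts.
            sleep(base_delay * 2 ** (attempt - 1))
```

A host would wrap its existing startup call, for example `start_with_retry(lambda: start_endpoint(config))`, where `start_endpoint` stands in for whatever actually starts the endpoint. The `sleep` parameter is injectable only so the backoff can be tested without waiting.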

kbaley, Oct 10 '22 14:10

If someone is looking to pick this up as a regular platform enhancement, it's possible it may take longer than we typically allot for those. Either way, some initial analysis is still helpful even if the work can't be completed in that time. And if it can be done as a platform enhancement, a quick reminder that it's okay if this is the only thing included in the release.

kbaley, Oct 10 '22 15:10

I remember discussing this with @danielmarbach some time ago in the context of some work we were doing with the SqlTransport. One issue connected to allowing the application to bootstrap even if the transport is not available is that it creates an unusable message session / endpoint instance.

For example, a web application might start and handle web requests, but no controller action that needs the messaging infrastructure can succeed. It's true that the same can happen at any point during the web application's lifetime; however, that's supposed to be a transient failure. Starting with no infrastructure available, on the other hand, sounded off at the time of the discussion.

mauroservienti, Oct 12 '22 08:10