go-agent
go-agent copied to clipboard
Provide context-propogated methods for WaitForConnection and Shutdown
Summary
Provide the following methods on Application:
ShutdownContext(ctx context.Context) error
WaitForConnectionContext(ctx context.Context) error
Desired Behaviour
Instead of passing a timeout directly, use the cancellation channel on the context to abort shutdown/waiting for conncetions.
Additional context
We handle all timeouts with contexts per Go best practice. Many of the contexts don't have deadlines but are cancellable (and get cancelled by the caller). Right now there's no way for us to pass a timeout signal in those cases. I even took a look at the internals and notices that it uses channels internally, so should be pretty easy to add.
Just to be sure, can we get an understanding of your use case for these? This may be valuable insight because generally speaking, Shutdown
is not necessary to call at all, and most applications should not call WaitForConnection
. Unless there's a specific need to manage that yourself for a very short-lived application, it's best to let the Go Agent manage the connection behind the scenes on its own.
Our services operate in a restricted network environment. It's fairly common to forget to declare the correct outbounds/use the correct proxy for traffic that egresses the network. All the other components of our services provide similiar Start(ctx) error
methods that the service uses to determine whether or not the component came up healthy.
These responses are used to determine if we believe that we will inevitably recover or we need to immediately panic and crash.
Since New Relic starts on NewApplication()
, that slightly violates our lifecycle spec, but we can use WaitForConnection with a context-propogated timeout to achieve identical functionality (and know if our service didn't start up correctly).
I'm astonished to hear Shutdown is not necessary as the docs say:
Shutdown flushes data to New Relic's servers
I consider proper finalization and flushing to be an hard requirement for any operational monitoring system–otherwise you have no insight into a failure scenario–especially if your paradigm is crash-only failures.
To add some extra context, here's some Go psuedo-ish code that illustrations the lifecycle of our services–we have a set of interfaces that are used by the lifecycle manager to gracefully start and shutdown the service, as well as handling final flushes for anything that needs that behavior.
func main() {
shutdown := make(chan struct{}, 1)
lifecycle := lifecycle.New(
Discovery(),
NewRelicWrapper(newrelic.Application),
LogPusher(),
CommandAndControl(shutdown),
ServiceAPI(),
)
lifecycle.Start(context.TODO())
defer lifecycle.Finalize(context.TODO())
<-shutdown
lifecycle.Stop(context.TODO())
}
Hi @v3n , thanks for giving us all that context! I'll make sure our Product Manager sees this so they can prioritize this in our product roadmap.