sentry-go Go routines with Sentry

I am trying to integrate sentry for go routines. Based on this blog post, https://blog.sentry.io/2019/08/15/introducing-sentrys-unified-go-sdk , it seems that each go routine will have to have its own sentry integration using Hub to capture events, panics etc. Is there an example where I can integrate sentry in one go routine to capture events, panics etc in other go routines? Basically, I don't want to force every go routine to have sentry integration and am trying to see if there is a more elegant solution.

Nov 18 '19 18:11 parasssh

it seems that each go routine will have to have their own sentry integration

Kinda, but it's only a Hub and Scope, which is basically an empty struct containing some metadata, it's very lightweight.

Basically, I dont want to force every go routine to have sentry integration and trying to see if there is a more elegant solution

You can still use Sentry inside goroutines as you'd do outside, however, you won't get guarantees about the scope data attached to the event (user data, extras, tags, breadcrumbs etc.). If you want to capture "just" errors and panics, you can use sentry.CaptureException directly, without cloning the hub.

Just a tad more than in the blog post: https://docs.sentry.io/platforms/go/goroutines/

Nov 19 '19 10:11 kamilogorek

Thanks for the reply. Your answer still requires me to find every go routine in my application and add the Sentry integration (with or without Hub) for each one of them. Basically, I need O(n) integration steps for "n" go routines.

I, however, am looking to see if there is a way to integrate Sentry once, i.e. O(1), to capture panics that may happen in any other go routines in the application. That way I don't have to worry about integration with every go routine as they come and go.

Jan 28 '20 21:01 parasssh

Hi @parasssh, thanks for using Sentry!

I'll give a shot at explaining, please bear with me.

By the way, if you may share, what kind of program are you working on (command-line, web server, etc)? We'd be happy to share more specific advice based on your use case.

Hubs, Scopes and Goroutines

requires me to find every go routine in my application and add the Sentry integration (with or without Hub) for each one of them. Basically, I need O(n) integration steps for "n" go routines.

Short answer:

The number of goroutines in a program is disconnected from the number of Hubs and Scopes. Some programs are okay with a single Hub/Scope, and in that case they use shortcut methods like sentry.Capture*.

Longer answer:

Sentry allows you to provide useful metadata along with any events you capture in a program. The metadata is attached to what we call a Scope.

During the lifetime of a program it can have one or more scopes, as many as designed.

Let's consider two examples:

First, consider a short-lived command-line application.

All events belong to a single logical scope.

In that case, the SDK makes it convenient to use shortcut methods like sentry.Capture*, which obviates the need to deal with Hubs and Scopes. That holds regardless of how many goroutines your program has, whether you start them explicitly or they are started behind the scenes.

Second, consider an HTTP server. In Go, servers built on top of the standard net/http package start a new goroutine to handle each client request they accept.

In this type of program, every client request represents an independent logical Scope.

The Sentry Go SDK will automatically populate the http.Request.Context with a new Hub/Scope containing request information that enriches the events reported to Sentry through that Hub.

Note that if within an HTTP Handler the program creates new goroutines, they may all share the Hub/Scope.

Conclusion

The program needs a new Hub/Scope for each new logical part of the program, as it is intended to be reported to Sentry. More Scopes means more specialized metadata local to events. Metadata eventually help you understand the conditions under which the event/error happened, reproduce and fix errors.

Goroutines are often used to encapsulate computations that belong to a logical part of the program. They may or may not have their own Hub/Scope. Multiple Goroutines may cooperate and belong to a single logical part of a program.

Capturing panics

capture panics that may happen in any other go routines in the application

In short, there is no mechanism to "capture panics that may happen in any goroutine" in the Go language.

I like to refer people to https://github.com/golang/go/issues/20161, though the topic has been discussed before in other places, such as the official forum: https://groups.google.com/forum/#!topic/golang-nuts.

Note that panics in Go are not meant as a primary control flow mechanism like exceptions in other languages. Typically code calls panic as a last resort, with the intention of crashing the program.

The way around supervising programs that crash is to move out of process. We have a tracking issue for that, #107.

Please do let us know if something is still not clear. Thank you!

Jan 29 '20 16:01 rhcarvalho

Thanks for the explanation. The application is a distributed graph-db (dgraph) which consists of worker nodes and manager nodes.

My immediate requirement is to capture all panics in any go routine. From your comments above, it seems Go doesn't have any such thing. And the https://github.com/getsentry/sentry-go/issues/107 is exactly what I need. I guess in the meantime, I will have to resort to a wrapper program outlined here https://github.com/golang/go/issues/20161#issuecomment-561560657 and prohibit the "go" operator, yes?

Feb 04 '20 23:02 parasssh

My immediate requirement is to capture all panics in any go routine. From your comments above, it seems Go doesn't have any such thing.

Correct.

I guess in the meantime, I will have to resort to a wrapper program outlined here golang/go#20161 (comment) and prohibit the "go" operator, yes?

What I would suggest is, as per the philosophy in https://github.com/dgraph-io/dgraph/blob/master/CONTRIBUTING.md, start simple by using a single hub and scope for the entire application. That means you only need to call functions in the sentry namespace.

Integration with Sentry: Stage 1

In the entrypoint to dgraph, https://github.com/dgraph-io/dgraph/blob/master/dgraph/main.go, set up the Sentry SDK as demonstrated in https://github.com/getsentry/sentry-go/blob/master/example/basic/main.go.

At a minimum all you need is sentry.Init. Defer a call to sentry.Flush to flush events to Sentry before the program terminates.
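One Go detail matters for the deferred flush: deferred calls do not run when the program leaves through os.Exit (or log.Fatal, which calls it). The following is a minimal sketch of a main layout that keeps a deferred flush effective. `flushEvents` is a hypothetical stand-in for the SDK's flush call and `run` is an illustrative name; neither comes from the thread or from the SDK.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// flushed records whether the stand-in flush ran; it exists only for this sketch.
var flushed bool

// flushEvents is a hypothetical stand-in for the call that sends buffered
// events before exit (sentry.Flush in the thread). Here it only sets a flag.
func flushEvents() { flushed = true }

// run holds the program logic and returns an exit code instead of calling
// os.Exit, so its deferred calls always execute before the process ends.
func run(args []string) int {
	defer flushEvents()

	if len(args) > 0 && args[0] == "fail" {
		err := errors.New("simulated failure")
		// This is the point where the error would be reported.
		fmt.Fprintln(os.Stderr, "error:", err)
		return 1
	}
	return 0
}

func main() {
	// os.Exit skips deferred functions, so it is only called after run has
	// returned and its deferred flush has already happened.
	if code := run(os.Args[1:]); code != 0 {
		os.Exit(code)
	}
}
```

With this layout, error paths return a code from `run` instead of exiting in place, so the flush is not skipped.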

Now, anywhere else in the code base you can have calls to sentry.CaptureException whenever you want to report an error to Sentry.

After this step you'll start to see reported errors in Sentry. :fireworks: :tada:

Stage 2

This is when you start adding more context to help you fix errors.

We support a few ways to enrich the reported events, and the SDK ships with built-in integrations that aim to provide information that helps debugging:

After this step you will probably have fixed problems identified in Stage 1, and will have improved the reporting of errors with contextual information. :100:

Stage 3

Panics. Let's tackle them. Panics are typically not used for control flow in Go.

Let's look specifically at how many places explicitly start new Goroutines:

dgraph ❯❯❯ rg --iglob='*.go' --iglob='!*_test.go' '^\s*go ' --no-heading | wc -l
113

And how many places call panic:

dgraph ❯❯❯ rg --iglob='*.go' --iglob='!*_test.go' 'panic\(' --no-heading | wc -l
31

A quick look at the panic sites suggests that the vast majority are in test/benchmark code, initialization, or unreachable code. It wouldn't be an enormous effort to take the few spots that would panic and report an error to Sentry alongside that.

Look at which goroutines are expected to panic, if any, and wrap them with sentry.Recover or sentry.RecoverWithContext. More at https://docs.sentry.io/platforms/go/panics/.

Focus on the main goroutine and whatever goroutines are spawned frequently to do complex work, e.g. background tasks.

Stage 4

Here we cover the likely small percentage of cases that cause the program to crash without a chance to report to Sentry so far. Stay in touch for news.

Let us know if you need anything, we're here to help.

Cheers!

Feb 05 '20 16:02 rhcarvalho

I did exactly that. And events are flowing in now. Will continue to explore.

For Stage 4, I see there are three ways.

1. For each "interesting" go routine, call sentry.Recover() -OR-
2. https://github.com/golang/go/issues/20161#issuecomment-561560657 -OR-
3. https://github.com/getsentry/sentry-go/issues/107#issuecomment-584482787 that you pointed to in #107

I chose (3) above as it is a catch-all wrapper for any manual or runtime panics and least maintenance. All future go routines also benefit without doing anything special.

== Quick question:

I am still confused whether to use one Sentry Hub/Scope and use ConfigureScope / WithScope throughout the application or use Cloned Hubs each with its own Hub/Scope. Any best practice here?

Feb 11 '20 05:02 parasssh

@parasssh happy to read your positive report!

From what I saw of dgraph, (3) sounds like a good choice.

If I may comment, I think the suggestion in (2) doesn't work in the general case. Fundamentally, you cannot control goroutines in third-party dependencies and their dependencies -- unless you explicitly vet them (and re-vet when bumping versions). And even then you are not covering all ways a program can crash. You're getting a false sense of "safety" or "coverage".

There has been a discussion (and possibly more than one) in golang-nuts about running clean up code when the program is exiting -- this would be a "desirable place" to hook up Sentry. I'd like to quote answers from Russ Cox, Ian Lance Taylor and minux:

is there an atexit?

Russ:

Atexit may make sense in single-threaded, short-lived programs, but I am skeptical that it has a place in a long-running multi-threaded server. I've seen many C++ programs that hang on exit because they're running global destructors that don't really need to run, and those destructors are cleaning up and freeing memory that would be reclaimed by the operating system anyway, if only the program could get to the exit system call. Compared to all that pain, needing to call Flush when you're done with a buffer seems entirely reasonable and is necessary anyway for correct execution of long-running programs.

Even ignoring that problem, atexit introduces even more threads of control, and you have to answer questions like do all the other goroutines stop before the atexit handlers run? If not, how do they avoid interfering? If so, what if one holds a lock that the handler needs? And on and on.

I'm not at all inclined to add Atexit.

Ian:

The only fully reliable mechanism is a wrapper program that invokes the real program and does the cleanup when the real program completes. That is true in any language, not just Go.

In my somewhat unformed opinion, os.AtExit is not a great idea. It is an unstructured facility that causes stuff to happen at program exit time in an unpredictable order. It leads to weird scenarios like programs that take a long time just to exit, an operation that should be very fast. It also leads to weird functions like the C function _exit, which more or less means exit-but-don't-run-atexit-functions.

That said, I think a special exit function corresponding to the init function is an interesting idea. It would have the structure that os.AtExit lacks (namely, exit functions are run in reverse order of when init functions are run).

But exit functions won't help you if your program gets killed by the kernel, or crashes because you call some C code that gets a segmentation violation.

minux:

Personally, I prefer the style where program exit is handled exactly same as program crash. I believe no matter how hard you try, your program can still crash under some unforeseen situations; for example, memory shortage can bring any well-behave Go program to a crash, and there is nothing you can do about it; so it's better to design for them. If you follow this, you won't feel the need for atexit to clean up (because when your program crash, atexit won't work, so you simply can't depend on it).

Feb 11 '20 09:02 rhcarvalho

@parasssh one thing to keep in mind and test is that introducing an extra process may affect how other software interacts with dgraph. In particular, make sure that signal handling works satisfactorily for the supported operating systems. See https://github.com/mitchellh/panicwrap/issues.

More details in https://groups.google.com/forum/#!msg/golang-nuts/XLXXKkRqO1c/_SDPCIHvhaoJ, quoted below.

A global panic handler for crash reporting

I’d like to do crash-reporting for programs that run in environments I don’t control (e.g. your laptop). The behavior I want is similar to what many production-grade desktop applications do when they crash: capture process state information, optionally prompt the user for permission, and send the crash report to a secure server.

How one would implement such a function for Go programs is tricky without cooperation from the runtime. The options I’m considering:

[...] 4. Fork immediately after startup and use the parent process to monitor the child for exit code 2 and a panic traceback on stderr. This is the approach taken by panicwrap[0] which is known to work, but has two issues. Dealing with signals becomes especially tricky. Any number of supervisor programs and system administration tools rely on sending signals to manipulate processes in production. The crash-handling parent process would need to handle these signals appropriately. Should it forward them to the children? Or rely on the signaling process to signal the whole process tree? Signal handling behavior is not consistent across platforms, which makes this difficult to get right. For example, Windows apparently sends CTRL+BREAK to the whole tree, but not CTRL+C. As a final point, this approach also fails on systems that disallow spawning additional processes (NaCl, maybe AppEngine, I’m unsure).

(emphasis mine)
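To make the quoted option 4 concrete, here is a rough sketch of a supervising parent process. It is not panicwrap's implementation: it runs a separate program instead of re-executing itself, and it forwards only os.Interrupt, whereas a real wrapper needs a per-platform signal list. The names `supervise` and `looksLikeGoPanic` are hypothetical. The detection heuristic is the one from the quote (exit code 2 plus a traceback on stderr), extended here to lines starting with "fatal error: ", which the Go runtime also pairs with exit code 2.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
	"os"
	"os/exec"
	"os/signal"
	"strings"
)

// looksLikeGoPanic reports whether an exit code and captured stderr match
// what an unrecovered panic or fatal runtime error in a Go program produces.
func looksLikeGoPanic(exitCode int, stderr string) bool {
	if exitCode != 2 {
		return false
	}
	for _, line := range strings.Split(stderr, "\n") {
		if strings.HasPrefix(line, "panic: ") || strings.HasPrefix(line, "fatal error: ") {
			return true
		}
	}
	return false
}

// supervise runs cmd, mirrors its stderr while keeping a copy, and forwards
// interrupts received by the parent to the child.
func supervise(cmd *exec.Cmd) (exitCode int, crashed bool, err error) {
	var captured bytes.Buffer
	cmd.Stdout = os.Stdout
	cmd.Stderr = io.MultiWriter(os.Stderr, &captured)
	if err := cmd.Start(); err != nil {
		return -1, false, err
	}

	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, os.Interrupt)
	defer signal.Stop(sigs)
	done := make(chan struct{})
	defer close(done)
	go func() {
		for {
			select {
			case s := <-sigs:
				_ = cmd.Process.Signal(s) // best effort; not supported everywhere
			case <-done:
				return
			}
		}
	}()

	waitErr := cmd.Wait()
	var exitErr *exec.ExitError
	switch {
	case waitErr == nil:
		exitCode = 0
	case errors.As(waitErr, &exitErr):
		exitCode = exitErr.ExitCode()
	default:
		return -1, false, waitErr
	}
	return exitCode, looksLikeGoPanic(exitCode, captured.String()), nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: supervisor <program> [args...]")
		return
	}
	code, crashed, err := supervise(exec.Command(os.Args[1], os.Args[2:]...))
	if err != nil {
		fmt.Fprintln(os.Stderr, "supervisor:", err)
		os.Exit(1)
	}
	if crashed {
		// The crash report would be sent from here, in the parent process.
		fmt.Fprintln(os.Stderr, "supervisor: child appears to have panicked")
	}
	if code != 0 {
		os.Exit(code)
	}
}
```

Even in this small form the signal question from the quote shows up: whether the parent should forward a signal, and which ones, depends on how the process tree is signaled on each platform.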

Feb 11 '20 10:02 rhcarvalho

Quick question:

I am still confused whether to use one Sentry Hub/Scope and use ConfigureScope / WithScope throughout the application or use Cloned Hubs each with its own Hub/Scope. Any best practice here?

tl;dr Use Hub.Clone() when changing scope from concurrent goroutines.

Calling ConfigureScope / WithScope in the global hub (sentry.ConfigureScope / sentry.WithScope) changes global state. ConfigureScope is for permanent changes, WithScope for temporary changes. Docs: https://docs.sentry.io/enriching-error-data/scopes/?platform=go.

Since those mutate global state, if the program captures events concurrently from different goroutines, you probably don't want to use sentry.ConfigureScope / sentry.WithScope as it would affect any event sent on the global hub.

For those cases, clone the global hub (or any other hub) before mutating the scope (with hub.ConfigureScope / hub.WithScope), such that those changes will apply only to events captured from that hub (use hub.Capture* methods).
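The difference between mutating a shared scope and mutating a clone can be shown with a toy model. The `Hub` and `Scope` types below are written for this sketch only and are not the SDK's types; they just mimic the idea that a clone carries its own copy of the scope data, so changes made by one goroutine appear only on events captured through that goroutine's hub.

```go
package main

import (
	"fmt"
	"sync"
)

// Scope and Hub are toy models for this sketch, not the SDK's types.
type Scope struct {
	tags map[string]string
}

type Hub struct {
	mu    sync.Mutex
	scope Scope
}

// NewHub returns a hub with an empty scope.
func NewHub() *Hub {
	return &Hub{scope: Scope{tags: map[string]string{}}}
}

// Clone returns a new hub holding its own copy of the current scope data.
func (h *Hub) Clone() *Hub {
	h.mu.Lock()
	defer h.mu.Unlock()
	c := NewHub()
	for k, v := range h.scope.tags {
		c.scope.tags[k] = v
	}
	return c
}

// SetTag changes the scope of this hub only.
func (h *Hub) SetTag(key, value string) {
	h.mu.Lock()
	defer h.mu.Unlock()
	h.scope.tags[key] = value
}

// Capture returns the "event" that would be sent: the message plus a
// snapshot of the tags currently on this hub's scope.
func (h *Hub) Capture(msg string) map[string]string {
	h.mu.Lock()
	defer h.mu.Unlock()
	event := map[string]string{"message": msg}
	for k, v := range h.scope.tags {
		event[k] = v
	}
	return event
}

func main() {
	global := NewHub()
	global.SetTag("service", "worker")

	var wg sync.WaitGroup
	events := make([]map[string]string, 3)
	for i := range events {
		wg.Add(1)
		// The clone is taken when the goroutine is started, so each
		// goroutine owns its hub from the beginning.
		go func(i int, local *Hub) {
			defer wg.Done()
			local.SetTag("task", fmt.Sprint(i))
			events[i] = local.Capture("task failed")
		}(i, global.Clone())
	}
	wg.Wait()

	for _, e := range events {
		fmt.Println(e)
	}
	// The global hub never received a "task" tag.
	fmt.Println(global.Capture("from the global hub"))
}
```

Had the goroutines called `SetTag` on `global` instead, every event would carry whichever task tag happened to be written last.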

Feb 11 '20 10:02 rhcarvalho

@parasssh one thing to keep in mind and test is that introducing an extra process may affect how other software interacts with dgraph. In particular, make sure that signal handling works satisfactorily for the supported operating systems. See https://github.com/mitchellh/panicwrap/issues.

More details in https://groups.google.com/forum/#!msg/golang-nuts/XLXXKkRqO1c/_SDPCIHvhaoJ, quoted below.

Thanks for the tip. Yes, signal handling was troublesome. I fixed it in my application to handle it correctly.

Mar 23 '20 17:03 parasssh

The number of goroutines in a program is disconnected from the number of Hubs and Scopes. Some programs are okay with a single Hub/Scope, and in that case they use shortcut methods like sentry.Capture*.

Hi, I'm using v0.5.1 (and v0.6.0). Capture* code is definitely not concurrent safe:

func (hub *Hub) CaptureException(exception error) *EventID {
	client, scope := hub.Client(), hub.Scope()
	if client == nil || scope == nil {
		return nil
	}
	eventID := client.CaptureException(exception, &EventHint{OriginalException: exception}, scope)
	if eventID != nil {
		hub.lastEventID = *eventID // <--- hub state modification
	} else {
		hub.lastEventID = "" // <--- hub state modification
	}
	return eventID
}

It seems that the sentry-go API should be revisited. That API can't be safely used with goroutines, e.g. if sentry is a component of a platform library used by other apps.

Apr 21 '20 01:04 gruzovator

It would be great to have a concurrency-safe call like sentry.ReportError(err error, metadata SomeMetadataType).

That kind of API is clean (no overhead of concepts like scopes, contexts, etc)

Apr 21 '20 01:04 gruzovator

Dear developers,

I have to pitch in a discussion and support @gruzovator's comment.

To be honest, I was caught off guard when I found out from my colleague that using sentry is not concurrently safe.

In our application we use echo http and grpc servers. In echo sentry library concurrency was safe (because of wrap-ups). Unfortunately, I can't say that about grpc server. I had to fix it with hub cloning. That technique seems to load more unnecessary overhead on the application than needed. The code looks even more hideous (sorry for harsh words :).

It seems to me that these days, when almost all apps use goroutines in some way, directly or not, it is important to make such a library concurrency-safe out of the box. Users shouldn't have to worry about that and invent workarounds to solve the problem.

In addition, our benchmarks show a 35% performance improvement when using just the underlying client to report errors instead of cloning hubs.

Could you please revise such an important part of your API?

We would be very grateful for your feedback!

Apr 22 '20 07:04 mariaefi29

I completely agree with @mariaefi29. The Hub ~and Scope~ abstractions do not fit in well with the go concurrency model. Especially when you try to integrate sentry in an application that does not fit the '1 goroutine per connection' model of HTTP servers.

My suggestion would be to introduce a new 'ScopedClient' type. A common EventCapturer interface that would plaster over clients, hubs and 'scoped clients'. This 'ScopedClient' could have a WithScope method that would produce an independent instance to be modified and passed as a dependency down the stack. Keeping track of the scope stack in Hubs is problematic for concurrency.

Apr 17 '21 07:04 alxarch

This issue has gone three weeks without activity. In another week, I will close it.

But! If you comment or otherwise update it, I will reset the clock, and if you label it Status: Backlog or Status: In Progress, I will leave it alone ... forever!

"A weed is but an unloved flower." ― Ella Wheeler Wilcox 🥀

Dec 07 '22 09:12 github-actions[bot]
