When processes fail, provide a stacktrace to the Resque failure queue

Open Jberlinsky opened this issue 6 years ago • 2 comments

Currently, when a goworker worker fails due to an uncaught error, tracking down the root cause is difficult. Only the error message itself is passed to the Resque failure queue, unlike implementations in other languages (e.g. Ruby), which pass the entire stack trace.

Go's errors do not carry stack traces out of the box. The go-errors package makes it possible to attach stack trace information to an error either when it is created or when it is wrapped.

This pull request takes advantage of the idempotency of go-errors's Wrap() function: a worker can either return a go-errors error object itself, or return a standard Go error and have goworker wrap it automatically when it observes the failure. A worker that returns a standard error will see the same information currently provided to the failure queue, plus a stack trace pointing to goworker/worker.go as the starting point for investigation. A worker that wraps its standard error with go-errors will see that same information, plus a full stack trace to the point where the error was created or wrapped.

Jberlinsky avatar Mar 30 '18 20:03 Jberlinsky

This bothers me as well about goworker :clap:.

Have you considered the errors library? I assumed it was the de facto standard for this problem, and GitHub stars seem to agree.

mingan avatar Apr 14 '18 12:04 mingan

Thanks for the contribution - I think this is a great idea.

My concerns for this would be:

  • If possible, I would prefer to use the more widely used library.
  • Stack traces can be pretty large. It looks like go-errors/errors has a default max depth of 50 frames, but even so it could add 5-10 KB per error to the Redis queue, which could be problematic with high error rates. Can we provide something like the backtrace config for Sidekiq?
  • How does the schema for the failure object change when it's stored in Redis? Does resque-web show the error and backtrace properly? Ideally we'd use the same format as Resque or Sidekiq implementations. Users may rely on the specific type/format for the Error field.

benmanns avatar Apr 14 '18 18:04 benmanns