Error Handling and dead letter queues for targets

Open MeltyBot opened this issue 4 years ago • 7 comments

Migrated from GitLab: https://gitlab.com/meltano/sdk/-/issues/134

Originally created by @vischous on 2021-05-26 17:40:34


Following up on our office hours today. Not sure if we want this to be target-only or not; your call, @aaronsteers.

Error handling, especially with SaaS-style targets, gets pretty interesting. Here are errors you'll hit at some point (ones that I can think of off the top of my head; there are tons more, everything you can imagine when you run this stuff at scale):

  1. Connection issues. For HTTP requests: 500 responses, timeouts in every way you can imagine (hopefully your libraries have sane defaults for connection timeouts and read timeouts; targets will need to change these at times), "Server Busy", "Internal Error", etc.
  2. Data issues. For HTTP you'll get response codes all over the place depending on the API, but generally something like 406, 403, 404, 400, etc.: "User already exists", "Name is invalid (over char limit)", "Unknown Error occured", "Cannot disable user due to them having xyz permissions".

Each of these errors needs to be handled slightly differently. For some, a simple retry with exponential backoff fixes your problem.
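To make the distinction concrete, here is a minimal sketch of retrying only transient failures with exponential backoff plus jitter. Everything in it is an assumption for illustration: `RetryableError`, `DataError`, and `send_with_backoff` are hypothetical names, not part of the SDK, and the delay values are arbitrary.

```python
import random
import time


class RetryableError(Exception):
    """Hypothetical: a transient failure such as a timeout or "Server Busy"."""


class DataError(Exception):
    """Hypothetical: the API rejected the record; retrying will not help."""


def send_with_backoff(send, record, max_attempts=5, base_delay=0.5,
                      max_delay=30.0, sleep=time.sleep):
    """Call send(record), retrying RetryableError with exponential backoff.

    DataError (and any other exception) is not caught here, so it
    propagates to the caller on the first attempt.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return send(record)
        except RetryableError:
            if attempt == max_attempts:
                raise  # out of attempts, let the caller decide what to do
            # Delay doubles each attempt (0.5s, 1s, 2s, ...) up to max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Random jitter so parallel workers do not retry in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
```

The `sleep` parameter is only there so the wait can be swapped out in tests. The hard part in practice is the classification itself, since some APIs report data errors as 500s.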

Data issues are something you can't get away from, and for a lot of SaaS APIs (lots are not HTTP-based, by the way; see Active Directory and more) you'll get data errors that are masked as things like 500 errors.

Functionality that's probably needed:

  1. Error handling strategy for "hard" or "soft" errors. One record failing out of 1000 should still output something to stderr / stdout, and the target process should return an exit code other than 0, but it's nowhere near as critical as all 1000 records failing, which would need an exit code of 1.
  2. Configuration for changing thresholds by users of targets. Everyone has different use cases. Thresholds could be percentage-based or a hard-coded number of rows, like >10 rows is a "hard" failure.
  3. Retry logic
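As a rough sketch of items 1 and 2, the function below maps a failure count to an exit code using user-configurable thresholds. The names `FailureThresholds` and `exit_code_for` are hypothetical, and using exit code 2 for a "soft" failure is purely an assumption; the only values given above are 0 for success and 1 for a hard failure.

```python
from dataclasses import dataclass
from typing import Optional

EXIT_OK = 0
EXIT_HARD_FAILURE = 1
EXIT_SOFT_FAILURE = 2  # assumption: any code other than 0 and 1 would do


@dataclass
class FailureThresholds:
    """Hypothetical user-facing settings; None disables a threshold."""
    max_failed_rows: Optional[int] = 10          # more than this is "hard"
    max_failed_fraction: Optional[float] = None  # e.g. 0.05 for 5%


def exit_code_for(total: int, failed: int, thresholds: FailureThresholds) -> int:
    """Classify a finished run as OK, a soft failure, or a hard failure."""
    if failed == 0:
        return EXIT_OK
    # Absolute threshold: too many failed rows is a hard failure.
    if thresholds.max_failed_rows is not None and failed > thresholds.max_failed_rows:
        return EXIT_HARD_FAILURE
    # Percentage-based threshold, skipped when no rows were processed.
    if (thresholds.max_failed_fraction is not None and total > 0
            and failed / total > thresholds.max_failed_fraction):
        return EXIT_HARD_FAILURE
    # Some records failed, but within the configured tolerance.
    return EXIT_SOFT_FAILURE
```

With the default thresholds, one failed record out of 1000 yields the soft-failure code, while all 1000 failing yields 1.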

Some of this (maybe all?) could be handled by a dead letter queue of some sort.
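One possible shape for that, sketched under heavy assumptions: failed records are appended as JSON Lines to a file-like "dead letter" sink together with the error message, and the run carries on. `load_with_dead_letters` is a hypothetical helper, not an SDK API, and a real queue could just as well be a table or a message broker.

```python
import json
from typing import Any, Callable, Dict, Iterable, TextIO


def load_with_dead_letters(records: Iterable[Dict[str, Any]],
                           send: Callable[[Dict[str, Any]], Any],
                           dead_letters: TextIO) -> int:
    """Send each record; divert failures to dead_letters instead of aborting.

    Returns the number of failed records so the caller can decide how
    serious the run's outcome is.
    """
    failed = 0
    for record in records:
        try:
            send(record)
        except Exception as exc:  # a real target would catch narrower errors
            failed += 1
            # One JSON object per line: the original record plus the reason.
            entry = {"record": record, "error": str(exc)}
            dead_letters.write(json.dumps(entry) + "\n")
    return failed
```

Keeping the original record next to the error means the dead letters can be inspected, fixed, and replayed later, and the returned count can feed whatever failure threshold the user has configured.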

Use cases that I know about today:

  • https://github.com/AutoIDM/tap-googleads/pull/19/files
  • https://github.com/MeltanoLabs/tap-github/issues/16
  • https://gitlab.com/meltano/sdk/-/issues/282

MeltyBot avatar May 26 '21 17:05 MeltyBot

What is the status of this feature? Seems like a pretty useful use case.

louis-vines avatar Mar 24 '23 09:03 louis-vines

CC @visch @tayloramurphy @aaronsteers

WillDaSilva avatar Mar 24 '23 14:03 WillDaSilva

@louis-vines a first pass for us would likely be this issue:

  • https://github.com/meltano/sdk/issues/1409

With better exit codes for SDK-based connectors we can start to handle each error better overall. Likely we need to break this issue up into specific proposals and make progress on those. cc @aaronsteers

tayloramurphy avatar Mar 25 '23 02:03 tayloramurphy

This has been marked as stale because it is unassigned, and has not had recent activity. It will be closed after 21 days if no further activity occurs. If this should never go stale, please add the evergreen label, or request that it be added.

stale[bot] avatar Jul 23 '23 04:07 stale[bot]

Still relevant

tayloramurphy avatar Jul 24 '23 14:07 tayloramurphy

This has been marked as stale because it is unassigned, and has not had recent activity. It will be closed after 21 days if no further activity occurs. If this should never go stale, please add the evergreen label, or request that it be added.

stale[bot] avatar Jul 23 '24 16:07 stale[bot]