dbt-core icon indicating copy to clipboard operation
dbt-core copied to clipboard

[CT-1268] [Spike] Integrate dbt exceptions into structured logging system

Open gshank opened this issue 3 years ago • 6 comments

Right now we have structured logging events for the majority of our logging, except for exceptions. In some but not all cases we have a separate logging event. Ideally all exceptions should be logged automatically. This ticket is labeled a "spike" because in order to scope this work we need to identify a mechanism/policies for doing this.

gshank avatar Sep 28 '22 14:09 gshank

Copying from Nate's comment in https://github.com/dbt-labs/dbt-core/issues/4836#issuecomment-1104038063, because the underlying thrust of it continues to feel important:

whenever possible, we want to be able to programmatically distinguish when the root cause of an error is a warehouse issue, an adapter issue, or a core issue. This will be a big improvement in the process for debugging failed runs.

Basically, when a dbt run fails, how can we tell whether it was user error, database error, or internal dbt error?

In a Pythonic world, we'd expect to do this via class inheritance and type introspection. Given that our log events are now proto-composed, we probably need a field on each exception-log. Something like exception_cause or exception_category. Categories I can imagine:

  • parsing / validation error (problem in user code)
  • disallowed behavior, e.g. using deprecated functionality or a version of dbt-core that's disallowed by require-dbt-version
  • database error due to connection timeout
  • database error due to malformed query
  • internal / unhandled error
  • ... more? ...

jtcohen6 avatar Nov 15 '22 18:11 jtcohen6

This can probably be accomplished with an enum that defines all the categories. Without verifying the syntax I think something like this would work:

enum ExceptionCause {
{
        PARSING = 0;
        DISALLOWED = 1;
        DATABASE_CONNECTION = 2;
        DATABASE_QUERY = 3;
        UNHANDLED = 4;
        ...
 }

And then within the message add

message SomeException {
   EventInfo info = 1;
   repeated ExceptionCause exception_cause = 2;
   ...
}

This also allows the flexibility to add more types to the end of the enum which I'm sure we'll end up needing.

emmyoop avatar Nov 18 '22 17:11 emmyoop

Blocking question we need to answer: Should we have separate LogCategory versus ExceptionCategory, or combine the two into a single field?

(Idea with LogCategory is to formally serve the role that the event codes' starting letters have been informally playing: this is parsing-related, deps-related, adapter-related, ...)

jtcohen6 avatar Nov 23 '22 16:11 jtcohen6

One of the unknowns here is exactly what would be useful to consumers to get logged. We likely don't want to log all exceptions, just specific ones. Is it best to fire an event as part of those exceptions or where the exception is raised?

Another option would be to add more structured information to MainEncounteredError. This may cover a lot of use cases and then just give a single event that would need to be subscribed to for exception details. The exception could contain the error category and then the MainEncounteredError event could pass the category (connection error, etc) in the proto message.

Minimum: Easily differentiate between errors in the users code or errors related to dbt itself

Better: Categorize Errors

  • dbt code error (sql syntax or jinja, md, yml, etc.)
  • dbt project config error (something in project.yml not being done properly)
  • permissions error (to warehouse or git provider)
  • and more

emmyoop avatar Jan 18 '23 20:01 emmyoop

Another option would be to add more structured information to MainEncounteredError. This may cover a lot of use cases and then just give a single event that would need to be subscribed to for exception details. The exception could contain the error category and then the MainEncounteredError event could pass the category (connection error, etc) in the proto message.

After chatting about this with @isaac-taylor, this might get us a lot of what we need, at least in the medium-term. It would be a quality-of-life improvement over parsing the string exc from MainEncounteredError.

Even better if that catch all error had some sort of error code that we can use to categorize the error to either our system, or the user's code (like connection problems)

jtcohen6 avatar Mar 28 '23 17:03 jtcohen6

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

github-actions[bot] avatar Feb 22 '24 01:02 github-actions[bot]