dlt icon indicating copy to clipboard operation
dlt copied to clipboard

show full HTTP response on failure (Feat/1605)

Open willi-mueller opened this issue 1 year ago • 4 comments

Description

This PR implements error logging of the full HTTP response on error when the dlt.sources.helpers.requests.Session is used.

Unless a custom session with different configuration is used, all HTTP responses with an error code 4xx will log an error.

This includes:

  1. the RESTClient
  2. each call made by the dlt.sources.rest_api

Related Issues

  • Resolves #1605

willi-mueller avatar Sep 28 '24 18:09 willi-mueller

Deploy Preview for dlt-hub-docs ready!

Name Link
Latest commit 9e066d80dd8da245206cada535c14599c4e7bfb7
Latest deploy log https://app.netlify.com/sites/dlt-hub-docs/deploys/6734c5996d93be0008e82824
Deploy Preview https://deploy-preview-1895--dlt-hub-docs.netlify.app
Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

netlify[bot] avatar Sep 28 '24 18:09 netlify[bot]

I propose an alternative solution to logging the full HTTP response on error: we can use a context manager catch_http_error that captures HTTP errors and allows for flexible and controlled error handling / debugging.

Example:

import dlt
from dlt.sources.helpers.rest_client import paginate, catch_http_error


@dlt.resource(
    table_name="issues",
    write_disposition="replace",
    primary_key="id",
)
def get_issues():
    for page in paginate(
        "https://api.github.com/repos/dlt-hub/dlt/issues",
        headers={"Authorization": "Bearer invalid_token_here"},
    ):
        yield page


pipeline = dlt.pipeline(
    pipeline_name="github_issues_merge",
    destination="duckdb",
)

with catch_http_error() as http_error:
    load_info = pipeline.run(get_issues)

if http_error:
    print(f"HTTP {http_error.response.status_code}: {http_error.response.content}")

Pros:

  • improved flexibility: the use can interact with captured HTTPError
  • cleaner and safer logs
  • selective usage: no need to alter the session

The rationale is that in my opinion accessing the HTTPError should be done on the higher level of abstraction than Requests's session as when the user is using the request/response directly they already have easy access to HTTP errors. So the idea of this change is to provide some way to extract HTTPError from the dlt exceptions.

burnash avatar Nov 11 '24 09:11 burnash

+1 this is needed for vibe coding

adrianbr avatar Apr 08 '25 15:04 adrianbr

I think what is in the PR is pretty good. We should just generalize the catcher to be able to find any exception in the tree and move it to utils. but my take here is that we do not solve typical user problems which is a better readability of exceptions coming from Pipeline. my take:

  1. Pipeline class always raises PipelineStepFailed exception with a trace of a failed step. We can pretty print this.
  2. We could use get_exception_trace_chain or get_exception_traces to get traces of the inner exceptions and print them. Those traces contain additional info like load ids, job ids, pipeline names etc. that we could pretty print
  3. with log level on DEBUG or INFO we could print verbose information ie. for HTTPExceptions? WDYT? @adrianbr @burnash if you agree I can push someone to implement it

rudolfix avatar Apr 08 '25 16:04 rudolfix

Hey @willi-mueller, thanks so much for this PR! I'm going to close it for now: initially I suggested and took a different approach here, but after consideration, I've decided to implement something similar to your suggestion in a different PR (#2867). Sorry about the confusion, and thank you again for your contribution!

burnash avatar Jul 15 '25 09:07 burnash