dlt
dlt copied to clipboard
show full HTTP response on failure (Feat/1605)
Description
This PR implements error logging of the full HTTP response on error when the dlt.sources.helpers.requests.Session is used.
Unless a custom session with different configuration is used, all HTTP responses with an error code 4xx will log an error.
This includes:
- the
RESTClient - each call made by the
dlt.sources.rest_api
Related Issues
- Resolves #1605
Deploy Preview for dlt-hub-docs ready!
| Name | Link |
|---|---|
| Latest commit | 9e066d80dd8da245206cada535c14599c4e7bfb7 |
| Latest deploy log | https://app.netlify.com/sites/dlt-hub-docs/deploys/6734c5996d93be0008e82824 |
| Deploy Preview | https://deploy-preview-1895--dlt-hub-docs.netlify.app |
| Preview on mobile | Toggle QR Code...Use your smartphone camera to open QR code link. |
To edit notification comments on pull requests, go to your Netlify site configuration.
I propose an alternative solution to logging the full HTTP response on error: we can use a context manager catch_http_error that captures HTTP errors and allows for flexible and controlled error handling / debugging.
Example:
import dlt
from dlt.sources.helpers.rest_client import paginate, catch_http_error
@dlt.resource(
table_name="issues",
write_disposition="replace",
primary_key="id",
)
def get_issues():
for page in paginate(
"https://api.github.com/repos/dlt-hub/dlt/issues",
headers={"Authorization": "Bearer invalid_token_here"},
):
yield page
pipeline = dlt.pipeline(
pipeline_name="github_issues_merge",
destination="duckdb",
)
with catch_http_error() as http_error:
load_info = pipeline.run(get_issues)
if http_error:
print(f"HTTP {http_error.response.status_code}: {http_error.response.content}")
Pros:
- improved flexibility: the use can interact with captured HTTPError
- cleaner and safer logs
- selective usage: no need to alter the session
The rationale is that in my opinion accessing the HTTPError should be done on the higher level of abstraction than Requests's session as when the user is using the request/response directly they already have easy access to HTTP errors. So the idea of this change is to provide some way to extract HTTPError from the dlt exceptions.
+1 this is needed for vibe coding
I think what is in the PR is pretty good. We should just generalize the catcher to be able to find any exception in the tree and move it to utils. but my take here is that we do not solve typical user problems which is a better readability of exceptions coming from Pipeline. my take:
- Pipeline class always raises
PipelineStepFailedexception with a trace of a failed step. We can pretty print this. - We could use
get_exception_trace_chainorget_exception_tracesto get traces of the inner exceptions and print them. Those traces contain additional info like load ids, job ids, pipeline names etc. that we could pretty print - with log level on DEBUG or INFO we could print verbose information ie. for HTTPExceptions? WDYT? @adrianbr @burnash if you agree I can push someone to implement it
Hey @willi-mueller, thanks so much for this PR! I'm going to close it for now: initially I suggested and took a different approach here, but after consideration, I've decided to implement something similar to your suggestion in a different PR (#2867). Sorry about the confusion, and thank you again for your contribution!