dlt
Pipeline crashes when a single record has a None value at the incremental cursor path
dlt version
0.5.1
Describe the problem
Loading the following data incrementally with cursor_path="created_at" crashes the pipeline:
data = [
{"id": 1, "created_at": 1},
{"id": 2, "created_at": None},
{"id": 3, "created_at": 2},
]
I have not been able to isolate it yet, but in version 0.4.12 the second row, where created_at is None,
was skipped under some unknown conditions and not loaded into the destination.
Expected behavior
- [x] Allow the user to specify whether the incremental transformer raises an error or accepts the row:
inc_0 = dlt.sources.incremental(cursor_path="updated_at", on_cursor_value_none="include")
inc_1 = dlt.sources.incremental(cursor_path="updated_at", on_cursor_value_none="raise")
- [x] Documentation on how to set a default cursor_path in case the value is None.
- [x] Clarify in the exception whether the cursor path is missing or the value at the cursor path is None.
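To make the requested semantics concrete, here is a minimal pure-Python sketch of the behavior described in the checklist above. It does not use dlt and is not how dlt implements incremental loading. The helper names filter_incremental and with_default_cursor, the last_value parameter, and the exception types are assumptions made for illustration; only the on_cursor_value_none values "raise" and "include" come from this issue.

```python
from typing import Any, Dict, Iterable, Iterator, Optional

# Sentinel that distinguishes "key absent" from "key present with value None".
_MISSING = object()


def filter_incremental(
    rows: Iterable[Dict[str, Any]],
    cursor_path: str,
    last_value: Optional[int] = None,
    on_cursor_value_none: str = "raise",
) -> Iterator[Dict[str, Any]]:
    """Hypothetical stand-in for an incremental filter (not dlt's code)."""
    if on_cursor_value_none not in ("raise", "include"):
        raise ValueError(f"unsupported on_cursor_value_none: {on_cursor_value_none!r}")
    for row in rows:
        value = row.get(cursor_path, _MISSING)
        if value is _MISSING:
            # Case 1: the cursor path does not exist in the row at all.
            raise KeyError(f"cursor path {cursor_path!r} is missing in row {row!r}")
        if value is None:
            # Case 2: the path exists but holds None; the user picks the policy.
            if on_cursor_value_none == "raise":
                raise ValueError(
                    f"value at cursor path {cursor_path!r} is None in row {row!r}"
                )
            yield row  # "include": pass the row through without comparing
            continue
        # Regular rows: keep only values newer than the last seen cursor value.
        if last_value is None or value > last_value:
            yield row


def with_default_cursor(
    rows: Iterable[Dict[str, Any]], cursor_path: str, default: Any
) -> Iterator[Dict[str, Any]]:
    """Replace a None cursor value with a default before filtering."""
    for row in rows:
        if cursor_path in row and row[cursor_path] is None:
            yield {**row, cursor_path: default}
        else:
            yield row


data = [
    {"id": 1, "created_at": 1},
    {"id": 2, "created_at": None},
    {"id": 3, "created_at": 2},
]

included = list(filter_incremental(data, "created_at", on_cursor_value_none="include"))
defaulted = list(filter_incremental(with_default_cursor(data, "created_at", 0), "created_at"))
```

With "include", all three rows from the example data pass through. With the default "raise", the second row triggers a ValueError whose message says the value is None, while a row without the key raises a KeyError saying the path is missing, so the two cases can be told apart.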
Steps to reproduce
See test suite in https://github.com/dlt-hub/dlt/pull/1576
Operating system
macOS
Runtime environment
Local
Python version
3.11