
Pipeline crashes when single record has None value at incremental cursor path

Open willi-mueller opened this issue 7 months ago • 9 comments

dlt version

0.5.1

Describe the problem

Loading the following data incrementally with cursor_path="created_at" crashes the pipeline:

    data = [
        {"id": 1, "created_at": 1},
        {"id": 2, "created_at": None},
        {"id": 3, "created_at": 2},
    ]

I have not been able to isolate it yet, but in version 0.4.12 the second row, where created_at is None, was under some unknown conditions skipped and not loaded into the destination.
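To illustrate the kind of failure involved, here is a minimal pure-Python sketch. It does not use dlt and is not taken from dlt's source; `naive_last_value` is a hypothetical helper that tracks the highest cursor value without checking for None, which is only an assumption about how such a crash could arise.

```python
data = [
    {"id": 1, "created_at": 1},
    {"id": 2, "created_at": None},
    {"id": 3, "created_at": 2},
]


def naive_last_value(rows, cursor_path):
    """Hypothetical helper: track the highest cursor value seen so far."""
    last_value = None
    for row in rows:
        value = row[cursor_path]
        # No None check here: in Python 3, ordering an int against None
        # raises TypeError, so the second row aborts the whole loop.
        last_value = value if last_value is None else max(last_value, value)
    return last_value


try:
    naive_last_value(data, "created_at")
    error = None
except TypeError as exc:
    error = exc
    print(f"crashed: {exc}")
```

A single None at the cursor path is enough to abort the run, even though the other rows are valid.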

Expected behavior

  • [x] Allow the user to specify whether the incremental transformer raises an error or accepts the row:

    inc_0 = dlt.sources.incremental(cursor_path="updated_at", on_cursor_value_none="include")
    inc_1 = dlt.sources.incremental(cursor_path="updated_at", on_cursor_value_none="raise")

  • [x] Documentation on how to set a default cursor_path in case the value is None.
  • [x] Make the exception state clearly whether the cursor path is missing or the value at the cursor path is None.
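The requested behavior can be sketched in plain Python. This is an illustration of the proposal, not dlt's implementation: `filter_rows`, `CursorPathMissingError`, and `CursorValueNoneError` are hypothetical names, the `on_cursor_value_none` parameter mirrors the option proposed above, and the cursor path is assumed to be a flat dictionary key rather than a full path expression.

```python
from typing import Any, Dict, Iterable, Iterator, Optional


class CursorPathMissingError(Exception):
    """Hypothetical error: the row has no key at the cursor path."""


class CursorValueNoneError(Exception):
    """Hypothetical error: the key exists but its value is None."""


def filter_rows(
    rows: Iterable[Dict[str, Any]],
    cursor_path: str,
    initial_value: Optional[Any] = None,
    on_cursor_value_none: str = "raise",
) -> Iterator[Dict[str, Any]]:
    """Yield rows whose cursor value is at or after initial_value."""
    if on_cursor_value_none not in ("raise", "include"):
        raise ValueError(
            f"unsupported on_cursor_value_none: {on_cursor_value_none!r}"
        )
    for row in rows:
        # Distinguish a missing key from a key that holds None, so the
        # error tells the user which of the two cases occurred.
        if cursor_path not in row:
            raise CursorPathMissingError(
                f"cursor path {cursor_path!r} is missing in row {row!r}"
            )
        value = row[cursor_path]
        if value is None:
            if on_cursor_value_none == "include":
                yield row  # accept the row without comparing its cursor value
                continue
            raise CursorValueNoneError(
                f"value at cursor path {cursor_path!r} is None in row {row!r}"
            )
        if initial_value is None or value >= initial_value:
            yield row


data = [
    {"id": 1, "created_at": 1},
    {"id": 2, "created_at": None},
    {"id": 3, "created_at": 2},
]

included_ids = [
    row["id"]
    for row in filter_rows(data, "created_at", on_cursor_value_none="include")
]
print(included_ids)
```

With "include", the row whose cursor value is None passes through unfiltered; with "raise", iteration stops at that row with an error that names the None value rather than a missing path.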

Steps to reproduce

See test suite in https://github.com/dlt-hub/dlt/pull/1576

Operating system

macOS

Runtime environment

Local

Python version

3.11

willi-mueller, Jul 09 '24 17:07