
PyArrow backend fails when MySQL column contains invalid datetime values

Open yeldeeb opened this issue 3 months ago • 0 comments

dlt version

1.17.1

Describe the problem

When loading data from a MySQL table using the pyarrow backend, the pipeline fails if the table contains invalid datetime values such as 0001-01-01 00:00:00.

With the SQLAlchemy backend, these values are automatically converted to NULL, and extraction and normalization work as expected.

However, with the PyArrow backend, the process fails with a conversion error.

Code to reproduce:

import dlt
from dlt.sources.sql_database import sql_table

t = sql_table(
    credentials="mysql+pymysql://localhost:3306/db",
    table="invalid_date_table",
    backend="pyarrow",
    incremental=dlt.sources.incremental(
        cursor_path="updated_at", initial_value=None, end_value=None, primary_key="id"
    ),
)

# pipeline is a dlt.Pipeline created elsewhere
pipeline.run(t, write_disposition="merge")
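As a possible workaround until the pyarrow backend handles this, the SELECT can be rewritten so the sentinel datetimes become NULL before Arrow ever sees them. The sketch below assumes the `query_adapter_callback` parameter that dlt's `sql_table` accepts; the function name and the sentinel value are illustrative assumptions, not dlt code:

```python
import sqlalchemy as sa

# Assumed sentinel observed in this table; adjust to whatever invalid
# values your data actually contains.
SENTINEL = "0001-01-01 00:00:00"

def nullify_sentinel_dates(query, table):
    """Sketch of a query_adapter_callback: wrap every DATETIME column
    in NULLIF(col, SENTINEL) so invalid values arrive as NULL.

    Rebuilds the SELECT from the reflected table rather than mutating
    the incoming query.
    """
    columns = [
        sa.func.nullif(col, SENTINEL).label(col.name)
        if isinstance(col.type, sa.DateTime)
        else col
        for col in table.columns
    ]
    return sa.select(*columns)

# Hypothetical usage:
# t = sql_table(..., backend="pyarrow",
#               query_adapter_callback=nullify_sentinel_dates)
```

This pushes the cleanup into MySQL itself, so the fix applies regardless of which extraction backend is used.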

Error Trace:

dlt.extract.exception.ResourceExtractionError: In processing pipe 'invalid_date_table' : extraction of resource 'invalid_date_table' in 'generator' 'table_rows' caused an exception: Conversion to arrow failed for field 'item_datetime' with dlt hint 'data_type=None' and 'inferred_arrow_type=None' dlt failed to encode values to an Arrow-compatible type.

Expected behavior

Invalid datetime values should be handled gracefully, ideally converted to NULL, just as the SQLAlchemy backend does.

Steps to reproduce

Create and run a pipeline against a MySQL table that contains invalid dates.

Operating system

macOS

Runtime environment

Local

Python version

3.13

dlt data source

MySQL

dlt destination

No response

Other deployment details

No response

Additional information

No response

yeldeeb · Oct 08 '25