PyArrow backend fails when MySQL column contains invalid datetime values
dlt version
1.17.1
Describe the problem
When loading data from a MySQL table using the pyarrow backend, the pipeline fails if the table contains invalid datetime values such as 0001-01-01 00:00:00.
With the SQLAlchemy backend, these values are automatically converted to NULL, and extraction and normalization work as expected.
However, with the PyArrow backend, the process fails with a conversion error.
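For context, a table matching the names in the trace below can be created like this. This is only a sketch, not the original setup: the schema, the 0001-01-01 value, and the relaxed sql_mode are assumptions.

import sqlalchemy as sa

engine = sa.create_engine("mysql+pymysql://localhost:3306/db")
with engine.begin() as conn:
    # Relax strict mode so MySQL accepts an out-of-range datetime at all;
    # depending on sql_mode the server may store the value as given or
    # coerce it to the zero date - either way the column ends up holding
    # an invalid datetime of the kind described above.
    conn.execute(sa.text("SET SESSION sql_mode = ''"))
    conn.execute(sa.text(
        "CREATE TABLE invalid_date_table ("
        " id INT PRIMARY KEY,"
        " item_datetime DATETIME,"
        " updated_at DATETIME)"
    ))
    conn.execute(sa.text(
        "INSERT INTO invalid_date_table VALUES"
        " (1, '0001-01-01 00:00:00', '2024-01-01 00:00:00')"
    ))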
Steps to Reproduce:
import dlt
from dlt.sources.sql_database import sql_table

t = sql_table(
    credentials="mysql+pymysql://localhost:3306/db",
    table="invalid_date_table",
    backend="pyarrow",
    incremental=dlt.sources.incremental(
        cursor_path="updated_at", initial_value=None, end_value=None, primary_key="id"
    ),
)

# `pipeline` is a dlt.pipeline created earlier; the destination is not relevant here
pipeline.run(t, write_disposition="merge")
Error Trace:
dlt.extract.exception.ResourceExtractionError: In processing pipe 'invalid_date_table' : extraction of resource 'invalid_date_table' in 'generator' 'table_rows' caused an exception: Conversion to arrow failed for field 'item_datetime' with dlt hint 'data_type=None' and 'inferred_arrow_type=None' dlt failed to encode values to an Arrow-compatible type.
Expected behavior
Invalid datetime values should be handled gracefully - ideally converted to NULL, just as the SQLAlchemy backend does.
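As an interim workaround, it seems possible to NULL out such values before they reach the PyArrow backend. The sketch below assumes the query_adapter_callback parameter of sql_table and the item_datetime column named in the trace; the 1970-01-01 cutoff is arbitrary.

from datetime import datetime

import sqlalchemy as sa
from dlt.sources.sql_database import sql_table


def null_invalid_dates(query, table):
    # Replace item_datetime with a CASE expression that yields NULL for
    # values below the cutoff; all other columns pass through unchanged.
    cutoff = datetime(1970, 1, 1)  # arbitrary "smallest sane" value
    cols = [
        sa.case((c < cutoff, sa.null()), else_=c).label(c.name)
        if c.name == "item_datetime" else c
        for c in table.c
    ]
    return query.with_only_columns(*cols)


t = sql_table(
    credentials="mysql+pymysql://localhost:3306/db",
    table="invalid_date_table",
    backend="pyarrow",
    query_adapter_callback=null_invalid_dates,
    # incremental=... as in the snippet above
)

This only rewrites the SELECT statement and leaves the rest of the pipeline unchanged, so a fix in dlt itself would still be preferable.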
Steps to reproduce
Create and run a pipeline against a MySQL table containing invalid dates (see the snippets above).
Operating system
macOS
Runtime environment
Local
Python version
3.13
dlt data source
MySQL
dlt destination
No response
Other deployment details
No response
Additional information
No response