eland
Ignore broken datetime strings on Elasticsearch
For me, this fixes an error caused by a single timestamp on the ES side with a broken value (1-01-01 00:00:00), which pandas cannot convert. I don't see a disadvantage in excluding such data points from the conversion.
Stacktrace
Traceback (most recent call last):
File "/mypath/env/lib/python3.9/site-packages/eland/common.py", line 135, in elasticsearch_date_to_pandas_date
return pd.to_datetime(
File "/mypath/env/lib/python3.9/site-packages/pandas/core/tools/datetimes.py", line 1102, in to_datetime
result = convert_listlike(np.array([arg]), format)[0]
File "/mypath/env/lib/python3.9/site-packages/pandas/core/tools/datetimes.py", line 393, in _convert_listlike_datetimes
return _to_datetime_with_unit(arg, unit, name, tz, errors)
File "/mypath/env/lib/python3.9/site-packages/pandas/core/tools/datetimes.py", line 557, in _to_datetime_with_unit
arr, tz_parsed = tslib.array_with_unit_to_datetime(arg, unit, errors=errors)
File "pandas/_libs/tslib.pyx", line 364, in pandas._libs.tslib.array_with_unit_to_datetime
ValueError: non convertible value 0001-01-01T00:00:00+00:00 with the unit 'ms'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/mypath/scripts/score_imputer.py", line 19, in <module>
korro_data = query_data_from_elastic(use_cache=True)
File "/mypath/daprod_health_data/korro_data.py", line 39, in query_data_from_elastic
df = ed.eland_to_pandas(elastic_df)
File "/mypath/env/lib/python3.9/site-packages/eland/etl.py", line 292, in eland_to_pandas
return ed_df.to_pandas(show_progress=show_progress)
File "/mypath/env/lib/python3.9/site-packages/eland/dataframe.py", line 1351, in to_pandas
return self._query_compiler.to_pandas(show_progress=show_progress)
File "/mypath/env/lib/python3.9/site-packages/eland/query_compiler.py", line 506, in to_pandas
return self._operations.to_pandas(self, show_progress)
File "/mypath/env/lib/python3.9/site-packages/eland/operations.py", line 1226, in to_pandas
for df in self.search_yield_pandas_dataframes(query_compiler=query_compiler):
File "/mypath/env/lib/python3.9/site-packages/eland/operations.py", line 1278, in search_yield_pandas_dataframes
df = query_compiler._es_results_to_pandas(hits)
File "/mypath/env/lib/python3.9/site-packages/eland/query_compiler.py", line 268, in _es_results_to_pandas
rows.append(self._flatten_dict(row, field_mapping_cache))
File "/mypath/env/lib/python3.9/site-packages/eland/query_compiler.py", line 348, in _flatten_dict
flatten(y)
File "/mypath/env/lib/python3.9/site-packages/eland/query_compiler.py", line 312, in flatten
flatten(x[a], name + a + ".")
File "/mypath/env/lib/python3.9/site-packages/eland/query_compiler.py", line 322, in flatten
x = elasticsearch_date_to_pandas_date(
File "/mypath/env/lib/python3.9/site-packages/eland/common.py", line 139, in elasticsearch_date_to_pandas_date
return pd.to_datetime(value)
File "/mypath/env/lib/python3.9/site-packages/pandas/core/tools/datetimes.py", line 1102, in to_datetime
result = convert_listlike(np.array([arg]), format)[0]
File "/mypath/env/lib/python3.9/site-packages/pandas/core/tools/datetimes.py", line 438, in _convert_listlike_datetimes
result, tz_parsed = objects_to_datetime64ns(
File "/mypath/env/lib/python3.9/site-packages/pandas/core/arrays/datetimes.py", line 2177, in objects_to_datetime64ns
result, tz_parsed = tslib.array_to_datetime(
File "pandas/_libs/tslib.pyx", line 427, in pandas._libs.tslib.array_to_datetime
File "pandas/_libs/tslib.pyx", line 678, in pandas._libs.tslib.array_to_datetime
File "pandas/_libs/tslib.pyx", line 674, in pandas._libs.tslib.array_to_datetime
File "pandas/_libs/tslib.pyx", line 649, in pandas._libs.tslib.array_to_datetime
File "pandas/_libs/tslibs/np_datetime.pyx", line 212, in pandas._libs.tslibs.np_datetime.check_dts_bounds
pandas._libs.tslibs.np_datetime.OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1-01-01 00:00:00 present at position 0
Process finished with exit code 1
❌ Author of the following commits did not sign a Contributor Agreement: 521cf6f12041b77fb4b7051d710fc716e0a4070d
Please, read and sign the above mentioned agreement if you want to contribute to this project
Yes, this silently ignores errors in the ES data. For me that sounds OK. I agree that adding an errors parameter would be better. Do you plan this for the near future?
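
To make the errors-parameter idea concrete, here is a minimal sketch of what an opt-in switch could look like. The helper name `to_pandas_date` and its `errors` argument are hypothetical and are not eland's actual API; the sketch only assumes that `pd.to_datetime` raises a `ValueError` (or a subclass such as `OutOfBoundsDatetime`) for values it cannot convert.

```python
import pandas as pd


def to_pandas_date(value, errors="raise"):
    """Hypothetical helper (not part of eland) that converts one value.

    errors="raise":  propagate the conversion error (strict behaviour)
    errors="coerce": return pd.NaT for values pandas cannot convert
    """
    if errors not in ("raise", "coerce"):
        raise ValueError(f"unsupported errors mode: {errors!r}")
    try:
        return pd.to_datetime(value)
    except ValueError:
        # OutOfBoundsDatetime and date parsing errors are subclasses of
        # ValueError, so both unparsable and out-of-range values land here.
        if errors == "coerce":
            return pd.NaT
        raise


# A valid timestamp is converted as usual.
print(to_pandas_date("2021-06-01T12:00:00+00:00"))

# An unparsable value becomes NaT only when the caller opts in.
print(to_pandas_date("not a date", errors="coerce"))

# The value from the stack trace; whether it raises depends on the installed
# pandas version, so with errors="coerce" the result is NaT or a timestamp.
print(to_pandas_date("0001-01-01T00:00:00+00:00", errors="coerce"))
```

Keeping `"raise"` as the default would preserve the current strict behaviour, and callers who prefer to drop broken data points could pass `errors="coerce"` explicitly.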