Ignore broken datetime strings on Elasticsearch

Open weidenka opened this issue 1 year ago • 2 comments

For me, this fixes an error caused by a single timestamp with a wrong format (1-01-01 00:00:00) on the ES side. I don't see a disadvantage in excluding such data points from the conversion.

Stacktrace

Traceback (most recent call last):
  File "/mypath/env/lib/python3.9/site-packages/eland/common.py", line 135, in elasticsearch_date_to_pandas_date
    return pd.to_datetime(
  File "/mypath/env/lib/python3.9/site-packages/pandas/core/tools/datetimes.py", line 1102, in to_datetime
    result = convert_listlike(np.array([arg]), format)[0]
  File "/mypath/env/lib/python3.9/site-packages/pandas/core/tools/datetimes.py", line 393, in _convert_listlike_datetimes
    return _to_datetime_with_unit(arg, unit, name, tz, errors)
  File "/mypath/env/lib/python3.9/site-packages/pandas/core/tools/datetimes.py", line 557, in _to_datetime_with_unit
    arr, tz_parsed = tslib.array_with_unit_to_datetime(arg, unit, errors=errors)
  File "pandas/_libs/tslib.pyx", line 364, in pandas._libs.tslib.array_with_unit_to_datetime
ValueError: non convertible value 0001-01-01T00:00:00+00:00 with the unit 'ms'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/mypath/scripts/score_imputer.py", line 19, in <module>
    korro_data = query_data_from_elastic(use_cache=True)
  File "/mypath/daprod_health_data/korro_data.py", line 39, in query_data_from_elastic
    df = ed.eland_to_pandas(elastic_df)
  File "/mypath/env/lib/python3.9/site-packages/eland/etl.py", line 292, in eland_to_pandas
    return ed_df.to_pandas(show_progress=show_progress)
  File "/mypath/env/lib/python3.9/site-packages/eland/dataframe.py", line 1351, in to_pandas
    return self._query_compiler.to_pandas(show_progress=show_progress)
  File "/mypath/env/lib/python3.9/site-packages/eland/query_compiler.py", line 506, in to_pandas
    return self._operations.to_pandas(self, show_progress)
  File "/mypath/env/lib/python3.9/site-packages/eland/operations.py", line 1226, in to_pandas
    for df in self.search_yield_pandas_dataframes(query_compiler=query_compiler):
  File "/mypath/env/lib/python3.9/site-packages/eland/operations.py", line 1278, in search_yield_pandas_dataframes
    df = query_compiler._es_results_to_pandas(hits)
  File "/mypath/env/lib/python3.9/site-packages/eland/query_compiler.py", line 268, in _es_results_to_pandas
    rows.append(self._flatten_dict(row, field_mapping_cache))
  File "/mypath/env/lib/python3.9/site-packages/eland/query_compiler.py", line 348, in _flatten_dict
    flatten(y)
  File "/mypath/env/lib/python3.9/site-packages/eland/query_compiler.py", line 312, in flatten
    flatten(x[a], name + a + ".")
  File "/mypath/env/lib/python3.9/site-packages/eland/query_compiler.py", line 322, in flatten
    x = elasticsearch_date_to_pandas_date(
  File "/mypath/env/lib/python3.9/site-packages/eland/common.py", line 139, in elasticsearch_date_to_pandas_date
    return pd.to_datetime(value)
  File "/mypath/env/lib/python3.9/site-packages/pandas/core/tools/datetimes.py", line 1102, in to_datetime
    result = convert_listlike(np.array([arg]), format)[0]
  File "/mypath/env/lib/python3.9/site-packages/pandas/core/tools/datetimes.py", line 438, in _convert_listlike_datetimes
    result, tz_parsed = objects_to_datetime64ns(
  File "/mypath/env/lib/python3.9/site-packages/pandas/core/arrays/datetimes.py", line 2177, in objects_to_datetime64ns
    result, tz_parsed = tslib.array_to_datetime(
  File "pandas/_libs/tslib.pyx", line 427, in pandas._libs.tslib.array_to_datetime
  File "pandas/_libs/tslib.pyx", line 678, in pandas._libs.tslib.array_to_datetime
  File "pandas/_libs/tslib.pyx", line 674, in pandas._libs.tslib.array_to_datetime
  File "pandas/_libs/tslib.pyx", line 649, in pandas._libs.tslib.array_to_datetime
  File "pandas/_libs/tslibs/np_datetime.pyx", line 212, in pandas._libs.tslibs.np_datetime.check_dts_bounds
pandas._libs.tslibs.np_datetime.OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1-01-01 00:00:00 present at position 0

Process finished with exit code 1
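
The final `OutOfBoundsDatetime` in the trace is consistent with a nanosecond timestamp being stored as a signed 64-bit count from the Unix epoch, which only covers roughly the years 1677 to 2262. The following is a minimal standard-library sketch of that range check. The helper name `fits_in_int64_nanoseconds` is hypothetical, and the bounds are truncated to microseconds, so they are slightly narrower than the true limits.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT64_MAX = 2**63 - 1  # largest signed 64-bit integer

# Range of an int64 nanosecond counter, truncated to microseconds
# (the finest resolution that datetime supports).
NS_MAX = EPOCH + timedelta(microseconds=INT64_MAX // 1000)
NS_MIN = EPOCH - timedelta(microseconds=INT64_MAX // 1000)


def fits_in_int64_nanoseconds(iso_string: str) -> bool:
    """Hypothetical helper: True if the timestamp lies inside the int64 ns range."""
    value = datetime.fromisoformat(iso_string)
    if value.tzinfo is None:
        # Assumption for this sketch: naive timestamps are treated as UTC.
        value = value.replace(tzinfo=timezone.utc)
    return NS_MIN <= value <= NS_MAX


print(NS_MIN.date(), NS_MAX.date())  # 1677-09-21 2262-04-11
print(fits_in_int64_nanoseconds("0001-01-01T00:00:00+00:00"))  # False
print(fits_in_int64_nanoseconds("2023-11-01T08:11:00+00:00"))  # True
```

The value from the trace, `0001-01-01T00:00:00+00:00`, falls far below the lower bound, which is why it cannot be represented at nanosecond resolution.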

weidenka, Nov 01 '23 08:11

❌ Author of the following commits did not sign a Contributor Agreement: 521cf6f12041b77fb4b7051d710fc716e0a4070d

Please read and sign the above-mentioned agreement if you want to contribute to this project.

You silently ignore errors in ES, yes. For me that sounds ok. Adding an error parameter would be better, I agree. Do you plan this for the near future?
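
One way to picture the error-parameter idea is a thin wrapper that forwards an `errors` argument to `pd.to_datetime`, so the caller decides between failing loudly and dropping the value. This is only a sketch: the name `convert_es_date` and its signature are hypothetical and not part of eland.

```python
import pandas as pd


def convert_es_date(value, errors="raise"):
    """Hypothetical wrapper, not eland's API.

    errors="raise"  propagates parsing failures to the caller.
    errors="coerce" turns values that cannot be converted into pd.NaT.
    """
    return pd.to_datetime(value, errors=errors)


ok = convert_es_date("2023-11-01T08:11:00+00:00")
dropped = convert_es_date("not-a-date", errors="coerce")
print(ok, pd.isna(dropped))
```

On pandas versions that parse strings at nanosecond resolution, an out-of-range value such as `0001-01-01T00:00:00+00:00` should likewise become `NaT` with `errors="coerce"`, while the default `errors="raise"` keeps the current failing behavior visible.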

weidenka, Nov 06 '23 08:11