timesketch
Importer doesn't accept dates before 1677 or after 2262
Describe the bug Not sure if this is a real bug, but I want to report it in any case.
If the data you want to import has a date before 1677 (in our case, some Windows logs that default to 1601-01-01T00:00:00Z) or after 2262, the importer with pandas support will fail with:
ERROR:timesketch_importer.importer:Unable to change datetime, is it badly formed?
Traceback (most recent call last):
File "/envs/mte/lib/python3.9/site-packages/pandas/core/arrays/datetimes.py", line 2085, in objects_to_datetime64ns
values, tz_parsed = conversion.datetime_to_datetime64(data)
File "pandas/_libs/tslibs/conversion.pyx", line 350, in pandas._libs.tslibs.conversion.datetime_to_datetime64
TypeError: Unrecognized value type: <class 'str'>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/envs/mte/lib/python3.9/site-packages/timesketch_import_client/importer.py", line 186, in _fix_data_frame
date = pandas.to_datetime(data_frame['datetime'], utc=True)
File "/envs/mte/lib/python3.9/site-packages/pandas/core/tools/datetimes.py", line 801, in to_datetime
cache_array = _maybe_cache(arg, format, cache, convert_listlike)
File "/envs/mte/lib/python3.9/site-packages/pandas/core/tools/datetimes.py", line 178, in _maybe_cache
cache_dates = convert_listlike(unique_dates, format)
File "/envs/mte/lib/python3.9/site-packages/pandas/core/tools/datetimes.py", line 465, in _convert_listlike_datetimes
result, tz_parsed = objects_to_datetime64ns(
File "/envs/mte/lib/python3.9/site-packages/pandas/core/arrays/datetimes.py", line 2090, in objects_to_datetime64ns
raise e
File "/envs/mte/lib/python3.9/site-packages/pandas/core/arrays/datetimes.py", line 2075, in objects_to_datetime64ns
result, tz_parsed = tslib.array_to_datetime(
File "pandas/_libs/tslib.pyx", line 364, in pandas._libs.tslib.array_to_datetime
File "pandas/_libs/tslib.pyx", line 586, in pandas._libs.tslib.array_to_datetime
File "pandas/_libs/tslib.pyx", line 582, in pandas._libs.tslib.array_to_datetime
File "pandas/_libs/tslib.pyx", line 558, in pandas._libs.tslib.array_to_datetime
File "pandas/_libs/tslibs/np_datetime.pyx", line 113, in pandas._libs.tslibs.np_datetime.check_dts_bounds
pandas._libs.tslibs.np_datetime.OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1601-01-01 00:00:05
It seems the valid range is:
>>> pd.Timestamp.min
Timestamp('1677-09-22 00:12:43.145225')
>>> pd.Timestamp.max
Timestamp('2262-04-11 23:47:16.854775807')
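These limits follow from how pandas stores `datetime64[ns]` values: as a signed 64-bit count of nanoseconds since the Unix epoch. A rough sketch of that arithmetic using only the standard library is below. It truncates to microseconds because Python's `datetime` has no nanosecond field, and the variable names are illustrative, not taken from pandas or the importer.

```python
from datetime import datetime, timedelta, timezone

# Unix epoch, the zero point of a datetime64[ns] value.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Largest nanosecond count that fits in a signed 64-bit integer.
MAX_NS = 2**63 - 1

# Python datetimes only resolve microseconds, so truncate the span inward.
SPAN = timedelta(microseconds=MAX_NS // 1000)

earliest = EPOCH - SPAN
latest = EPOCH + SPAN

# The Windows default timestamp from the report falls below the lower bound.
windows_default = datetime(1601, 1, 1, tzinfo=timezone.utc)

print("earliest:", earliest.isoformat())
print("latest:  ", latest.isoformat())
print("1601-01-01 representable:", earliest <= windows_default <= latest)
```

So roughly 292 years on either side of 1970 are representable, which is why both the 1601 default and anything after 2262 are rejected.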
Expected behavior Default the line to the oldest supported value, or skip the line with a warning.
Libraries used for import: timesketch-api-client==20210205, timesketch-import-client==20210215
I opt for skipping with a warning.
Also take #1534 into consideration, since that will change how data is ingested as well.
Since that will use the pandas library for ingestion, it may have similar effects when the web UI is used to import the data (pandas is already used in the importer, which is the reason for this).
So, to be fair, it's not a bug per se, as these are clearly invalid dates. We just need TS ingestion to handle them more gracefully so that the ingestion can still complete, perhaps with these bad dates either filtered out or their time set to zero.
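One way the "skip with a warning" option could look is sketched below. This is not the importer's actual code: `parse_utc`, `split_importable`, and the logger name are hypothetical helpers, and the sketch assumes rows are dicts with an ISO 8601 string under a `datetime` key. It filters rows before they would be handed to pandas, using bounds derived from the signed 64-bit nanosecond range.

```python
import logging
from datetime import datetime, timedelta, timezone

logger = logging.getLogger("import_sketch")  # hypothetical logger name

# Approximate datetime64[ns] bounds, truncated inward to microseconds.
_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
_SPAN = timedelta(microseconds=(2**63 - 1) // 1000)
EARLIEST, LATEST = _EPOCH - _SPAN, _EPOCH + _SPAN


def parse_utc(value):
    """Parse an ISO 8601 string, treating a trailing 'Z' or no offset as UTC."""
    text = str(value).strip()
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    parsed = datetime.fromisoformat(text)  # raises ValueError on bad input
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def split_importable(rows, key="datetime"):
    """Return (kept, skipped) rows; skipped rows are logged as warnings."""
    kept, skipped = [], []
    for index, row in enumerate(rows):
        try:
            when = parse_utc(row[key])
        except (KeyError, ValueError):
            logger.warning("Skipping row %d: missing or unparsable %r", index, key)
            skipped.append(row)
            continue
        if EARLIEST <= when <= LATEST:
            kept.append(row)
        else:
            logger.warning("Skipping row %d: %s is outside the supported range",
                           index, row[key])
            skipped.append(row)
    return kept, skipped


rows = [
    {"datetime": "1601-01-01T00:00:05Z", "message": "windows default"},
    {"datetime": "2021-02-15T10:00:00Z", "message": "ok"},
    {"datetime": "2300-01-01T00:00:00Z", "message": "too late"},
    {"datetime": "not a date", "message": "garbage"},
]
kept, skipped = split_importable(rows)
print(len(kept), "kept,", len(skipped), "skipped")
```

Only the kept rows would then go on to the existing `pandas.to_datetime(..., utc=True)` call, so one bad timestamp no longer aborts the whole import. Setting the time to zero instead of skipping would be a small change in the out-of-range branch.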
Now that #1534 has been merged, this becomes even more important to fix.
@kiddinn are you planning to work on this bug? Otherwise I'd opt to free it up so that someone else can look into it.