
Importer doesn't accept dates before 1677 or after 2262

Open dadokkio opened this issue 4 years ago • 6 comments

Describe the bug: Not sure if this is a real bug, but I want to report it in any case.

If the data you want to import has a date before 1677 (in our case, some Windows logs that default to 1601-01-01T00:00:00Z) or after 2262, the importer with pandas support fails with:

ERROR:timesketch_importer.importer:Unable to change datetime, is it badly formed?
Traceback (most recent call last):
  File "/envs/mte/lib/python3.9/site-packages/pandas/core/arrays/datetimes.py", line 2085, in objects_to_datetime64ns
    values, tz_parsed = conversion.datetime_to_datetime64(data)
  File "pandas/_libs/tslibs/conversion.pyx", line 350, in pandas._libs.tslibs.conversion.datetime_to_datetime64
TypeError: Unrecognized value type: <class 'str'>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/envs/mte/lib/python3.9/site-packages/timesketch_import_client/importer.py", line 186, in _fix_data_frame
    date = pandas.to_datetime(data_frame['datetime'], utc=True)
  File "/envs/mte/lib/python3.9/site-packages/pandas/core/tools/datetimes.py", line 801, in to_datetime
    cache_array = _maybe_cache(arg, format, cache, convert_listlike)
  File "/envs/mte/lib/python3.9/site-packages/pandas/core/tools/datetimes.py", line 178, in _maybe_cache
    cache_dates = convert_listlike(unique_dates, format)
  File "/envs/mte/lib/python3.9/site-packages/pandas/core/tools/datetimes.py", line 465, in _convert_listlike_datetimes
    result, tz_parsed = objects_to_datetime64ns(
  File "/envs/mte/lib/python3.9/site-packages/pandas/core/arrays/datetimes.py", line 2090, in objects_to_datetime64ns
    raise e
  File "/envs/mte/lib/python3.9/site-packages/pandas/core/arrays/datetimes.py", line 2075, in objects_to_datetime64ns
    result, tz_parsed = tslib.array_to_datetime(
  File "pandas/_libs/tslib.pyx", line 364, in pandas._libs.tslib.array_to_datetime
  File "pandas/_libs/tslib.pyx", line 586, in pandas._libs.tslib.array_to_datetime
  File "pandas/_libs/tslib.pyx", line 582, in pandas._libs.tslib.array_to_datetime
  File "pandas/_libs/tslib.pyx", line 558, in pandas._libs.tslib.array_to_datetime
  File "pandas/_libs/tslibs/np_datetime.pyx", line 113, in pandas._libs.tslibs.np_datetime.check_dts_bounds
pandas._libs.tslibs.np_datetime.OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1601-01-01 00:00:05

It seems the valid range is:

>> pd.Timestamp.min
Timestamp('1677-09-22 00:12:43.145225')
>> pd.Timestamp.max
Timestamp('2262-04-11 23:47:16.854775807')
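The traceback mentions a "nanosecond timestamp", and the upper bound is consistent with a signed 64-bit count of nanoseconds since the Unix epoch. A minimal sketch of that arithmetic, using only the standard library (so it resolves to microseconds rather than nanoseconds); `EPOCH`, `INT64_MAX_NS` and `upper` are illustrative names, not pandas identifiers:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT64_MAX_NS = 2**63 - 1  # largest signed 64-bit nanosecond count

# datetime only has microsecond resolution, so floor-divide ns to us.
upper = EPOCH + timedelta(microseconds=INT64_MAX_NS // 1000)
print(upper.isoformat())  # 2262-04-11T23:47:16.854775+00:00
```

The result matches the `pd.Timestamp.max` value above up to the microsecond.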

Expected behavior: Default the line to the oldest supported value, or skip the line with a warning.
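Both options could be sketched as a pre-filter that runs before the rows reach `pandas.to_datetime`. This is only an illustration under stated assumptions: `filter_or_clamp` is a hypothetical helper (it is not part of `timesketch_import_client`), rows are assumed to be dicts with an ISO 8601 `datetime` string, naive values are assumed to be UTC, and the bounds are copied from the output above, truncated to microseconds.

```python
import logging
from datetime import datetime, timezone

logger = logging.getLogger("date_bounds_sketch")

# Bounds copied from the pd.Timestamp.min / pd.Timestamp.max output above,
# truncated to microseconds (the resolution of datetime).
LOWER = datetime(1677, 9, 22, 0, 12, 43, 145225, tzinfo=timezone.utc)
UPPER = datetime(2262, 4, 11, 23, 47, 16, 854775, tzinfo=timezone.utc)


def filter_or_clamp(rows, clamp=False):
    """Keep in-range rows; skip (with a warning) or clamp the rest.

    Hypothetical helper, not part of timesketch_import_client.
    """
    kept = []
    for row in rows:
        # fromisoformat() on Python 3.9 does not accept a trailing "Z".
        value = datetime.fromisoformat(row["datetime"].replace("Z", "+00:00"))
        if value.tzinfo is None:
            # Assumption: naive timestamps are treated as UTC.
            value = value.replace(tzinfo=timezone.utc)
        if LOWER <= value <= UPPER:
            kept.append(row)
        elif clamp:
            # Replace the value with the nearest supported bound.
            bound = LOWER if value < LOWER else UPPER
            kept.append({**row, "datetime": bound.isoformat()})
        else:
            logger.warning(
                "Skipping row with out-of-range datetime: %s", row["datetime"]
            )
    return kept


rows = [
    {"message": "default Windows timestamp", "datetime": "1601-01-01T00:00:05Z"},
    {"message": "regular event", "datetime": "2021-02-16T11:02:00Z"},
]
skipped = filter_or_clamp(rows)              # drops the 1601 row, logs a warning
clamped = filter_or_clamp(rows, clamp=True)  # keeps both, 1601 row moved to LOWER
```

Skipping keeps the timeline free of artificial timestamps, while clamping keeps every event at the cost of a misleading date, so a warning is probably worth emitting in either case.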

Libraries used for import: timesketch-api-client==20210205, timesketch-import-client==20210215

dadokkio avatar Feb 16 '21 11:02 dadokkio

I opt for skipping with a warning.

kiddinn avatar Feb 16 '21 12:02 kiddinn

Also take #1534 into consideration, since that will also change how data is ingested.

Since it will use the pandas library for ingestion, it may have similar effects when the web UI is used to import the data (pandas is already used in the importer, which is the reason for this issue).

kiddinn avatar Feb 18 '21 21:02 kiddinn

To be fair, it's not a bug per se, since these are clearly invalid dates. We just need TS ingestion to handle them more gracefully so that the ingestion can still complete, perhaps with these bad dates either filtered out or their time set to zero.

Now that #1534 has been merged in, this becomes even more important to fix.

kiddinn avatar Feb 26 '21 10:02 kiddinn

@kiddinn are you planning to work on this bug? Otherwise I'd opt to free it up so that someone else can look into it.

jaegeral avatar Jul 01 '21 12:07 jaegeral