
Add fast backend for ISO to ticktock input

Open · balarsen opened this issue 5 years ago · 6 comments

Feature request to look at after #390. I would love to see a fast backend (C, numba, etc.) for converting ISO date-time strings to Ticktock. Pandas' to_datetime() is good at this, and we could either use theirs or re-implement it. Most likely we would re-implement something specific to SpacePy that goes straight into TAI.
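To make the request a bit more concrete, here is a minimal sketch of what a non-pandas fast path could look like: a pure-NumPy parser for strings that all share the fixed layout `YYYY-MM-DDTHH:MM:SS.ffffff`. The helper name `iso_to_unix_us` is hypothetical, it assumes that exact 26-character layout, and it stops at UTC microseconds since 1970 with no leap seconds; going "straight into TAI" would still need a leap-second table applied on top.

```python
import numpy as np

ISO_LEN = 26  # len("YYYY-MM-DDTHH:MM:SS.ffffff")


def iso_to_unix_us(strings):
    """Vectorized parse of fixed-layout ISO strings to int64 microseconds since 1970-01-01.

    Hypothetical helper: assumes every string is exactly YYYY-MM-DDTHH:MM:SS.ffffff
    (UTC, no leap seconds). Separators are not validated.
    """
    arr = np.asarray(strings)
    if np.any(np.char.str_len(arr) != ISO_LEN):
        raise ValueError("expected YYYY-MM-DDTHH:MM:SS.ffffff strings")
    # One row of 26 ASCII codes per timestamp; subtracting '0' turns digit characters into values.
    codes = arr.astype("S%d" % ISO_LEN).view(np.uint8).reshape(-1, ISO_LEN)
    digits = codes.astype(np.int64) - ord("0")

    def field(start, stop):
        # Fold the digit columns [start, stop) into one integer per row.
        value = np.zeros(len(digits), dtype=np.int64)
        for col in range(start, stop):
            value = value * 10 + digits[:, col]
        return value

    year, month, day = field(0, 4), field(5, 7), field(8, 10)
    hour, minute, second = field(11, 13), field(14, 16), field(17, 19)
    micro = field(20, 26)

    # Days since 1970-01-01 in the proleptic Gregorian calendar, using the
    # integer-only "days from civil" algorithm (years start in March here).
    y = year - (month <= 2)
    era = y // 400
    yoe = y - era * 400
    doy = (153 * ((month + 9) % 12) + 2) // 5 + day - 1
    doe = yoe * 365 + yoe // 4 - yoe // 100 + doy
    days = era * 146097 + doe - 719468

    seconds = ((days * 24 + hour) * 60 + minute) * 60 + second
    return seconds * 1_000_000 + micro
```

Each field costs only a few whole-array operations, which is the same fixed-width digit extraction a C or numba loop would do per string; the catch is that any string not matching the layout has to be routed to a slower, more general parser.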

balarsen, Aug 10 '20 23:08

Benchmark to start the conversation, @jtniehof. Run it in IPython (it uses `%timeit`). The pandas version is the standard of comparison: each printed number is that method's mean time divided by pandas' mean time.

```python
# Run in IPython: %timeit is an IPython magic, not plain Python.
import datetime
import spacepy.toolbox as tb
import dateutil.parser  # older dateutil versions do not load the parser submodule on `import dateutil`
import numpy as np
import spacepy.time as spt
import pandas as pd

# 140000 evenly spaced datetimes spanning one day
dt = tb.linspace(datetime.datetime(2012, 1, 1), datetime.datetime(2012, 1, 2), 140000)
# Make the array of ISO strings to convert back to datetime
dts = np.asarray([v.isoformat() for v in dt])
ticktock_time = %timeit -q -o spt.Ticktock(dts, 'ISO').UTC
pandas_time = %timeit -q -o pd.to_datetime(dts).to_pydatetime()
dateutil_parse_time = %timeit -q -o np.asarray([dateutil.parser.parse(v) for v in dts])
dateutil_isoparse_time = %timeit -q -o np.asarray([dateutil.parser.isoparse(v) for v in dts])
# The first and last entries fall on whole seconds, so isoformat() omits the
# fractional part and "%f" would not match; they are skipped here.
datetime_strptime = %timeit -q -o np.asarray([datetime.datetime.strptime(v, "%Y-%m-%dT%H:%M:%S.%f") for v in dts[1:-1]])

# Ratios of mean run times: > 1 means slower than pandas
print('dateutil.parse', dateutil_parse_time.average/pandas_time.average)
print('dateutil.isoparse', dateutil_isoparse_time.average/pandas_time.average)
print('ticktock', ticktock_time.average/pandas_time.average)
print('datetime.strptime', datetime_strptime.average/pandas_time.average)
```
```
dateutil.parse 90.90662835654132
dateutil.isoparse 11.827291535495116
ticktock 103.34264754764324
datetime.strptime 14.093081268847655
```
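The `%timeit` lines above need IPython. For running the same kind of comparison as an ordinary script, a small harness on the standard-library `timeit` module can compute equivalent ratios (mean time per call divided by the baseline's mean, like `.average` above). `relative_timings` is a hypothetical helper name, and the example compares two stdlib parsers so it runs without SpacePy or pandas installed.

```python
import statistics
import timeit


def relative_timings(candidates, baseline, number=1, repeat=5):
    """Mean time per call of each zero-argument callable, divided by the baseline's mean."""
    means = {}
    for name, func in candidates.items():
        runs = timeit.repeat(func, number=number, repeat=repeat)
        means[name] = statistics.mean(runs) / number
    return {name: t / means[baseline] for name, t in means.items()}


if __name__ == "__main__":
    from datetime import datetime, timedelta

    base = datetime(2012, 1, 1)
    # timespec="microseconds" keeps the ".ffffff" part even on whole seconds
    stamps = [(base + timedelta(milliseconds=617 * i)).isoformat(timespec="microseconds")
              for i in range(20000)]
    fmt = "%Y-%m-%dT%H:%M:%S.%f"
    ratios = relative_timings(
        {
            "strptime": lambda: [datetime.strptime(s, fmt) for s in stamps],
            "fromisoformat": lambda: [datetime.fromisoformat(s) for s in stamps],
        },
        baseline="fromisoformat",
    )
    for name, ratio in ratios.items():
        print(name, ratio)
```

Swapping in `lambda: spt.Ticktock(dts, 'ISO').UTC` and `lambda: pd.to_datetime(dts).to_pydatetime()` as candidates would reproduce the original table outside IPython.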

I have not studied how it works, but the pandas version is at https://github.com/pandas-dev/pandas/blob/master/pandas/core/tools/datetimes.py, and I think the actual work is done here: https://github.com/pandas-dev/pandas/blob/83807088329b2a7e6422e0d0ba460870a265d3d2/pandas/_libs/tslibs/conversion.pyx#L376

balarsen, Aug 11 '20 02:08

Yeah, that's useful. Ticktock does do a lot more work than just parsing to datetime, but once the overhaul's done we can profile and see where the real problem is.

jtniehof, Aug 11 '20 20:08

At this point basically all the time goes into strptime calls when the user passes in consistently formatted strings along with a format string, so that would be the place to focus.
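A minimal sketch of checking where the time goes with the standard-library profiler. `parse_all` is a hypothetical stand-in for a per-element strptime loop so the snippet runs on its own; to profile SpacePy itself, the commented `runcall` line shows how the same profiler would wrap `spt.Ticktock(dts, 'ISO')`.

```python
import cProfile
import io
import pstats
from datetime import datetime, timedelta

FMT = "%Y-%m-%dT%H:%M:%S.%f"


def parse_all(strings):
    # Hypothetical stand-in for a per-element parse loop: one strptime call per string.
    return [datetime.strptime(s, FMT) for s in strings]


base = datetime(2012, 1, 1)
stamps = [(base + timedelta(milliseconds=617 * i)).isoformat(timespec="microseconds")
          for i in range(20000)]

profiler = cProfile.Profile()
# To profile SpacePy instead: profiler.runcall(spt.Ticktock, dts, 'ISO')
profiler.runcall(parse_all, stamps)

report = io.StringIO()
stats = pstats.Stats(profiler, stream=report).sort_stats("cumulative")
stats.print_stats(8)  # top 8 entries by cumulative time
print(report.getvalue())
```

If strptime dominates the cumulative column, shaving per-call overhead elsewhere will not help much; the parse itself has to get cheaper or be vectorized.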

jtniehof, Aug 21 '20 15:08

It'll only help folks on Python 3.7 or later, but datetime has added an ISO 8601 parser, `datetime.fromisoformat`, that's apparently much faster (before Python 3.11 it only accepts the layouts produced by `isoformat()`): https://docs.python.org/3/library/datetime.html#datetime.datetime.fromisoformat. We could use this if available and fall back to strptime otherwise.
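A minimal sketch of "use fromisoformat if available, otherwise strptime". `parse_iso` and the `ISO_FORMATS` tuple are hypothetical; the fallback list covers only three common layouts and would need extending for any other formats the parser must accept.

```python
from datetime import datetime

# Hypothetical fallback layouts, tried in order with strptime
ISO_FORMATS = ("%Y-%m-%dT%H:%M:%S.%f", "%Y-%m-%dT%H:%M:%S", "%Y-%m-%d")


def parse_iso(strings):
    """Parse ISO 8601 strings, preferring datetime.fromisoformat when it exists."""
    fromiso = getattr(datetime, "fromisoformat", None)  # absent before Python 3.7
    out = []
    for s in strings:
        if fromiso is not None:
            try:
                out.append(fromiso(s))
                continue
            except ValueError:
                # e.g. a fraction with other than 3 or 6 digits before Python 3.11
                pass
        for fmt in ISO_FORMATS:
            try:
                out.append(datetime.strptime(s, fmt))
                break
            except ValueError:
                continue
        else:
            raise ValueError(f"unrecognized ISO 8601 string: {s!r}")
    return out
```

The fast path is tried first for every string, so inputs that fromisoformat handles never pay for the strptime attempts.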

drsteve, Aug 21 '20 16:08

We have quite a few fallback cases so it's worth checking out for speed.

jtniehof, Aug 21 '20 16:08

Per #471, there is an easy gain here too.

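Added to the benchmark above:

```python
# Let NumPy parse the ISO strings, then convert to Python datetime objects
numpy_parse = %timeit -q -o dts.astype(np.datetime64).astype(datetime.datetime)
print('datetime64', numpy_parse.average/pandas_time.average)
```

```
datetime64 0.8865719603268274
```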
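Since the datetime64 route already comes in slightly faster than pandas, one way to follow the "straight into TAI" idea from the original request is to stay in NumPy and produce elapsed seconds rather than round-tripping through Python datetime objects. A sketch, assuming a hypothetical helper `iso_to_seconds` and an arbitrary 2000-01-01 reference epoch; because datetime64 ignores leap seconds, a real TAI conversion would still need a leap-second correction, which is not shown.

```python
import numpy as np


def iso_to_seconds(strings, epoch="2000-01-01T00:00:00"):
    """Seconds elapsed since `epoch`, parsed entirely inside NumPy.

    datetime64 has no leap seconds, so this is a UTC-like count; a TAI
    scale would still need a leap-second table applied on top.
    """
    times = np.asarray(strings).astype("datetime64[us]")
    delta = times - np.datetime64(epoch, "us")
    # Dividing two timedelta64 values gives a plain float64 array
    return delta / np.timedelta64(1, "s")
```

Keeping everything as numeric arrays avoids creating 140000 Python objects, which the `.astype(datetime.datetime)` step in the benchmark still does.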

balarsen, Jan 05 '21 00:01