Add fast backend for ISO to ticktock input
Feature request to look at after #390. I would love to see a fast backend (C, numba, etc.) for converting ISO date-time strings to Ticktock. Pandas' to_datetime() is good at this, so we could either use theirs or re-implement it. Likely re-implement something specific to spacepy that goes straight to TAI.
Benchmark to start the conversation, @jtniehof
Run in IPython (it uses the %timeit magic).
All timings are reported as ratios to the pandas version, which is the standard of comparison.
import datetime
import spacepy.toolbox as tb
import dateutil.parser  # import the submodule explicitly; a bare "import dateutil" may not load it
import numpy as np
import spacepy.time as spt
import pandas as pd
dt = tb.linspace(datetime.datetime(2012, 1, 1), datetime.datetime(2012, 1, 2), 140000)
# make the array to convert to datetime
dts = np.asarray([v.isoformat() for v in dt])
ticktock_time = %timeit -q -o spt.Ticktock(dts, 'ISO').UTC
pandas_time = %timeit -q -o pd.to_datetime(dts).to_pydatetime()
dateutil_parse_time = %timeit -q -o np.asarray([dateutil.parser.parse(v) for v in dts])
dateutil_isoparse_time = %timeit -q -o np.asarray([dateutil.parser.isoparse(v) for v in dts])
# skip the endpoints: their isoformat() output has no fractional seconds, so it does not match %f
datetime_strptime = %timeit -q -o np.asarray([datetime.datetime.strptime(v, "%Y-%m-%dT%H:%M:%S.%f") for v in dts[1:-1]])
print('dateutil.parse', dateutil_parse_time.average/pandas_time.average)
print('dateutil.isoparse', dateutil_isoparse_time.average/pandas_time.average)
print('ticktock', ticktock_time.average/pandas_time.average)
print('datetime.strptime', datetime_strptime.average/pandas_time.average)
dateutil.parse 90.90662835654132
dateutil.isoparse 11.827291535495116
ticktock 103.34264754764324
datetime.strptime 14.093081268847655
I have not studied how it works, but the pandas version is at https://github.com/pandas-dev/pandas/blob/master/pandas/core/tools/datetimes.py. I think the actual work is done here: https://github.com/pandas-dev/pandas/blob/83807088329b2a7e6422e0d0ba460870a265d3d2/pandas/_libs/tslibs/conversion.pyx#L376
Yeah, that's useful. Ticktock does do a lot more work than just parsing to datetime, but once the overhaul's done we can profile and see where the real problem is.
At this point basically all the time goes into strptime calls if the user passes in a consistent format and format string, so that would be the place to focus.
It'll only help folks on Python 3.7 or later, but datetime has added an ISO8601 parser that's apparently much faster.
https://docs.python.org/3/library/datetime.html#datetime.datetime.fromisoformat
We could use this if available, otherwise strptime...
We have quite a few fallback cases so it's worth checking out for speed.
Per #471, there is a bit of easy speedup available here too.
Added to the benchmark:
numpy_parse = %timeit -q -o dts.astype(np.datetime64).astype(datetime.datetime)
print('datetime64', numpy_parse.average/pandas_time.average)
datetime64 0.8865719603268274
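As a rough sketch of how the datetime64 route above could be wrapped up (the function name `iso_to_datetimes` is hypothetical, not part of spacepy): it requests microsecond resolution explicitly, because converting a `datetime64[us]` array to object dtype yields `datetime.datetime` instances, whereas nanosecond resolution yields plain integers. It assumes naive strings without a timezone offset, and datetime64 has no notion of leap seconds, so that case would still need separate handling for Ticktock.

```python
import datetime

import numpy as np


def iso_to_datetimes(iso_strings):
    """Convert ISO date-time strings to an object array of datetime.datetime."""
    arr = np.asarray(iso_strings, dtype=str)
    # Microsecond resolution matches datetime.datetime, so the object
    # conversion returns datetime instances rather than integers.
    return arr.astype("datetime64[us]").astype(object)


if __name__ == "__main__":
    out = iso_to_datetimes(["2012-01-01T00:00:00", "2012-01-01T00:00:00.500000"])
    print(type(out[0]), out[1].microsecond)
```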