WIP: Optimize dateformat Formatters
This is in no way finished, and the code styling and formatting are surely going to get some suggestions and consideration[^1], but here we go.
`django.utils.dateformat.Formatter.format` has been almost completely replaced. The previous version currently remains in the class as `format_old`, so that format strings I might not have thought of can be compared between the two, for regressions etc.
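Because `format_old` stays in the class for now, checking for output regressions reduces to running both methods over the same format strings and reporting any differences. A minimal sketch; `find_mismatches` is a hypothetical helper (not part of Django) that works with any two callables taking a format string, such as `df.format` and `df.format_old` from the IPython session below.

```python
from typing import Callable, Iterable


def find_mismatches(
    new: Callable[[str], str],
    old: Callable[[str], str],
    formats: Iterable[str],
) -> list[tuple[str, object, object]]:
    """Return (format, new_result, old_result) for every format string where they differ.

    Exceptions are captured and compared by type, so a format string that
    raises in one implementation but not the other is also reported.
    """
    mismatches = []
    for fmt in formats:
        results = []
        for func in (new, old):
            try:
                results.append(func(fmt))
            except Exception as exc:
                results.append(type(exc))
        if results[0] != results[1]:
            mismatches.append((fmt, results[0], results[1]))
    return mismatches
```

Usage would be something like `find_mismatches(df.format, df.format_old, tests)`, where an empty list means no differences were found for those inputs.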
I've captured all the format strings exercised by the test suite and plucked out a few that look both complex and simple, to try to gather a decent range of uses. I have also run all the format strings in the test suite through a bunch of IPython `%timeit` runs in a loop. That isn't exactly a thorough or stable benchmark, but the results all looked promising enough. It takes long enough and produces enough output that, whilst I can do it again and attach the output, it would be, y'know, effort ;)
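Since `%timeit` inside an IPython loop isn't a stable benchmark, one way to make the comparison scriptable and repeatable is the standard-library `timeit` module, keeping the best of several repeats (the `timeit` docs suggest the minimum is the more meaningful figure, since noise only ever makes runs slower). A minimal sketch, assuming any two callables that take a format string; `compare_timings` is a hypothetical helper name.

```python
import timeit
from typing import Callable, Iterable


def compare_timings(
    new: Callable[[str], str],
    old: Callable[[str], str],
    formats: Iterable[str],
    number: int = 10_000,
    repeat: int = 5,
) -> dict[str, tuple[float, float]]:
    """Return {format: (best_new_us, best_old_us)}, in microseconds per call."""
    results = {}
    for fmt in formats:
        best = []
        for func in (new, old):
            # Each entry is the total seconds for `number` calls.
            times = timeit.repeat(lambda: func(fmt), number=number, repeat=repeat)
            # Best batch, converted to microseconds per single call.
            best.append(min(times) / number * 1e6)
        results[fmt] = (best[0], best[1])
    return results
```

For example, `compare_timings(df.format, df.format_old, tests)` could be run from a plain script (with Django settings configured) and its output diffed between Python versions.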
Example IPython session:
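```python
tests = [
    "Y. \\g\\a\\d\\a j. F",
    "\\N\\gà\\y d \\t\\há\\n\\g n \\nă\\m Y",
    "d\u200f/m\u200f/Y",
    # Note: no comma between these two literals, so Python concatenates them
    # into the single format string "j\\-\\a \\d\\e F YN j, Y, P" seen in the
    # output below. Add a comma to time them separately.
    "j\\-\\a \\d\\e F Y" "N j, Y, P",
    "e",
    "j \\d\\e F \\d\\e Y \\a \\l\\e\\s G:i",
    "r",
    "U",
    "c",
    "Z",
    "d/m/Y D/M/y",
]

from datetime import time, date, datetime, timezone
from django.utils.dateformat import DateFormat

# Assumes Django settings are configured (e.g. run inside `./manage.py shell`).
dt = datetime.now().replace(tzinfo=timezone.utc)
df = DateFormat(dt)
for t in tests:
    print(t)
    # new implementation
    %timeit df.format(t)
    # previous implementation, kept for comparison
    %timeit df.format_old(t)
```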
which gives me the following times on Python 3.10.1 (for each format string, the first line is the new `format` and the second is `format_old`):
```
Y. \g\a\d\a j. F
7.17 µs ± 53.3 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
16.1 µs ± 93.7 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
\N\gà\y d \t\há\n\g n \nă\m Y
8.36 µs ± 146 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
25.8 µs ± 609 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
d/m/Y
6.69 µs ± 77.5 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
11.9 µs ± 161 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
j\-\a \d\e F YN j, Y, P
14 µs ± 139 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
32.7 µs ± 1.21 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
e
2.32 µs ± 117 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
3.22 µs ± 104 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
j \d\e F \d\e Y \a \l\e\s G:i
10.8 µs ± 267 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
30.2 µs ± 805 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
r
11.1 µs ± 130 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
12 µs ± 142 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
U
3.04 µs ± 60.2 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
3.67 µs ± 65.7 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
c
4.26 µs ± 46.5 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
4.96 µs ± 71.4 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
Z
5.02 µs ± 60.2 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
5.54 µs ± 219 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
d/m/Y D/M/y
9.96 µs ± 123 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
22.4 µs ± 739 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
```
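To summarise the numbers above, the speedup for each format string is just the `format_old` mean divided by the new `format` mean. A small worked calculation using only the means reported above (standard deviations ignored, so treat the ratios as rough):

```python
# Mean times in µs copied from the %timeit output above: (new format, format_old)
timings = {
    "Y. \\g\\a\\d\\a j. F": (7.17, 16.1),
    "\\N\\gà\\y d \\t\\há\\n\\g n \\nă\\m Y": (8.36, 25.8),
    "d\u200f/m\u200f/Y": (6.69, 11.9),
    "j\\-\\a \\d\\e F YN j, Y, P": (14.0, 32.7),
    "e": (2.32, 3.22),
    "j \\d\\e F \\d\\e Y \\a \\l\\e\\s G:i": (10.8, 30.2),
    "r": (11.1, 12.0),
    "U": (3.04, 3.67),
    "c": (4.26, 4.96),
    "Z": (5.02, 5.54),
    "d/m/Y D/M/y": (9.96, 22.4),
}

# Speedup = old time / new time, so 2.0 means the new version is twice as fast.
speedups = {fmt: old / new for fmt, (new, old) in timings.items()}

for fmt, ratio in sorted(speedups.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{ratio:5.2f}x  {fmt}")
```

By these means every format string got faster, from about 1.08x for `r` up to about 3.09x for the Vietnamese-style string with many escaped characters. The single-character strings (`e`, `r`, `U`, `c`, `Z`) gain the least, roughly 8% to 39%.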
I had to add special-casing for single-character format strings (`Z`, `U`, etc.) so that they don't perform worse, and there may be issues not covered by the tests or by my thinking-through of things[^2]. Tests should pass, though.
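The idea behind a single-character special case can be pictured with a hypothetical, heavily reduced formatter (`ToyFormatter`, supporting only `d`, `m` and `Y`). It is not the PR's actual code; it only shows how a lone specifier can skip the regex split and join that the general path needs.

```python
import re
from datetime import date


class ToyFormatter:
    """Hypothetical three-specifier formatter; not Django's Formatter."""

    FORMAT_CHARS = frozenset("dmY")
    # Unescaped specifier, captured so that re.split() keeps it.
    re_formatchars = re.compile(r"(?<!\\)([dmY])")
    # Backslash-escaped character in literal text.
    re_escaped = re.compile(r"\\(.)")

    def __init__(self, data):
        self.data = data

    def d(self):
        return f"{self.data.day:02d}"

    def m(self):
        return f"{self.data.month:02d}"

    def Y(self):
        return f"{self.data.year:04d}"

    def format(self, formatstr):
        # Fast path: a lone specifier such as "Y" needs no split/join at all.
        if len(formatstr) == 1 and formatstr in self.FORMAT_CHARS:
            return getattr(self, formatstr)()
        pieces = []
        for i, piece in enumerate(self.re_formatchars.split(formatstr)):
            if i % 2:
                # Odd indices are the captured specifiers.
                pieces.append(getattr(self, piece)())
            elif piece:
                # Even indices are literal text; drop the escaping backslashes.
                pieces.append(self.re_escaped.sub(r"\1", piece))
        return "".join(pieces)
```

For example, `ToyFormatter(date(2022, 1, 5)).format("Y")` returns `"2022"` via the fast path, while `"d/m/Y"` goes through the general path and returns `"05/01/2022"`.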
So, have at it. Find regressions, or cases it doesn't handle, or versions of Python where it does worse.
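One cheap way to hunt for cases the hand-picked list misses is to generate random format strings mixing specifiers, escaped specifiers and literal (including non-ASCII) characters. A minimal sketch; `random_format_strings` is a hypothetical helper, and `FORMAT_CHARS` is assumed to mirror the characters matched by `re_formatchars` (worth checking against the source).

```python
import random

# Assumed to mirror the specifiers matched by re_formatchars.
FORMAT_CHARS = "aAbcdDeEfFgGhHiIjlLmMnNoOPrsStTUuwWyYzZ"
# Literal characters, including some non-ASCII ones seen in the tests above.
LITERALS = [" ", "/", "-", ".", ",", ":", "à", "ă", "\u200f"]


def random_format_strings(count, max_len=12, seed=0):
    """Return `count` reproducible random format strings of 1..max_len pieces."""
    rng = random.Random(seed)
    out = []
    for _ in range(count):
        pieces = []
        for _ in range(rng.randint(1, max_len)):
            kind = rng.random()
            if kind < 0.5:
                pieces.append(rng.choice(FORMAT_CHARS))  # a specifier
            elif kind < 0.75:
                pieces.append("\\" + rng.choice(FORMAT_CHARS))  # escaped, emitted literally
            else:
                pieces.append(rng.choice(LITERALS))  # plain literal text
        out.append("".join(pieces))
    return out
```

Each string can then be fed to both `df.format` and `df.format_old` and the results compared, catching exceptions as well, since time specifiers raise `TypeError` for plain `date` objects. Strings made of a single piece also exercise the single-character special case.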
[^1]: For example, I could hoist `time_formats` and `date_formats` to module/global level (slower?), but then use those constants to construct `re_formatchars`.
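Hoisting the sets to module level and deriving the splitting regex from them could look roughly like this. The two sets below are illustrative subsets only, not Django's full lists. Whether module-level (global) lookups are measurably slower than the current approach is something to measure rather than assume.

```python
import re

# Module-level constants (illustrative subsets only, not Django's full lists).
DATE_FORMATS = frozenset("dDFjmMyY")
TIME_FORMATS = frozenset("aAgGhHis")

# Build the splitting regex from the same constants, so the sets and the
# pattern cannot drift apart. The capturing group makes re.split() keep
# each specifier at the odd indices of its result.
re_formatchars = re.compile(
    r"(?<!\\)([%s])" % re.escape("".join(sorted(DATE_FORMATS | TIME_FORMATS)))
)
```

With this, `re_formatchars.split("d/m/Y")` gives `['', 'd', '/', 'm', '/', 'Y', '']`, and an escaped specifier such as `\d` is left unsplit.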
[^2]: Is there a way to avoid checking `type(...) is date` and `in time_formats` on every loop iteration? I couldn't think of one off the top of my head that didn't introduce another loop, which I imagine would be slower.
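One possible answer is to do the check once, up front, with a single regex search for unescaped time specifiers. That is still a second pass over the string, but it runs inside the `re` engine rather than as per-piece Python checks, so it may or may not be cheaper; that would need measuring. A sketch under those assumptions: `check_time_specifiers` is a hypothetical helper, `TIME_FORMATS` is an illustrative subset, and the error message is not Django's.

```python
import re
from datetime import date

# Illustrative subset of time-only specifiers; Django's real set may differ.
TIME_FORMATS = frozenset("aAgGhHis")
# Any unescaped time specifier anywhere in the format string.
re_timechars = re.compile(r"(?<!\\)[%s]" % "".join(sorted(TIME_FORMATS)))


def check_time_specifiers(data, formatstr):
    """Raise TypeError once, before formatting, instead of per piece in the loop.

    Only plain ``date`` objects are rejected: ``datetime`` subclasses ``date``,
    but ``type(...) is date`` is False for it, so it passes through.
    """
    if type(data) is date:
        match = re_timechars.search(formatstr)
        if match is not None:
            raise TypeError(f"time specifier {match.group()!r} used with a date object")
```

If the check passes, the formatting loop itself would no longer need any per-piece type test.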