
WIP: Optimize dateformat Formatters

Open kezabelle opened this issue 2 years ago • 0 comments

This is in no way finished, and the code styling and formatting are surely going to get some suggestions and consideration[^1], but here we go.

django.utils.dateformat.Formatter.format has been replaced almost completely. The previous version is kept on the class as format_old for now, so the two can be compared on format strings I might not have thought of, to catch regressions etc.
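For anyone hunting regressions, the old-vs-new comparison could be scripted roughly like this. It is only a sketch: find_mismatches and the two stand-in callables are hypothetical names made up for illustration so that it runs without Django. On this branch you would pass df.format and df.format_old instead.

```python
from typing import Callable, Iterable


def find_mismatches(
    new: Callable[[str], str],
    old: Callable[[str], str],
    format_strings: Iterable[str],
) -> list[tuple[str, str, str]]:
    """Return (format_string, new_output, old_output) for every disagreement."""
    mismatches = []
    for fmt in format_strings:
        new_out, old_out = new(fmt), old(fmt)
        if new_out != old_out:
            mismatches.append((fmt, new_out, old_out))
    return mismatches


# Stand-ins (assumptions, not Django code) so the sketch is self-contained.
def old_impl(fmt: str) -> str:
    return fmt.upper()


def new_impl(fmt: str) -> str:
    # Deliberately wrong for "c" to show what a reported regression looks like.
    return "oops" if fmt == "c" else fmt.upper()


print(find_mismatches(new_impl, old_impl, ["d/m/Y", "c", "U"]))
```

An empty list means the two implementations agreed on every format string that was tried, which says nothing about the strings that were not.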

I've captured all the format strings exercised by the test suite, and plucked out a few, some complex and some simple, to try to cover a decent range of uses. I have also run all the format strings in the test suite through a bunch of ipython %timeit runs in a loop, which isn't exactly a thorough and stable benchmark, but they all looked promising enough. It takes long enough and produces enough output that, whilst I can do it again and attach the output, it would be, y'know, effort ;)

Example IPython session:

tests = [
    "Y. \\g\\a\\d\\a j. F",
    "\\N\\gà\\y d \\t\\há\\n\\g n \\nă\\m Y",
    "d\u200f/m\u200f/Y",
    # NB: no comma between these two literals, so they concatenate into one format string
    "j\\-\\a \\d\\e F Y" "N j, Y, P",
    "e",
    "j \\d\\e F \\d\\e Y \\a \\l\\e\\s G:i",
    "r",
    "U",
    "c",
    "Z",
    "d/m/Y D/M/y",
]
from datetime import datetime, timezone
from django.utils.dateformat import DateFormat
# Any timezone-aware datetime will do for timing purposes
dt = datetime.now().replace(tzinfo=timezone.utc)
df = DateFormat(dt)
for t in tests:
    print(t)
    # new implementation first, then the previous one kept as format_old
    %timeit df.format(t)
    %timeit df.format_old(t)

which gives me, on Python 3.10.1, the following times (for each format string, the first line is the new format and the second is format_old):

Y. \g\a\d\a j. F
7.17 µs ± 53.3 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
16.1 µs ± 93.7 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

\N\gà\y d \t\há\n\g n \nă\m Y
8.36 µs ± 146 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
25.8 µs ± 609 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

d‏/m‏/Y
6.69 µs ± 77.5 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
11.9 µs ± 161 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

j\-\a \d\e F YN j, Y, P
14 µs ± 139 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
32.7 µs ± 1.21 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

e
2.32 µs ± 117 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
3.22 µs ± 104 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

j \d\e F \d\e Y \a \l\e\s G:i
10.8 µs ± 267 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
30.2 µs ± 805 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

r
11.1 µs ± 130 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
12 µs ± 142 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

U
3.04 µs ± 60.2 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
3.67 µs ± 65.7 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

c
4.26 µs ± 46.5 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
4.96 µs ± 71.4 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

Z
5.02 µs ± 60.2 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
5.54 µs ± 219 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

d/m/Y D/M/y
9.96 µs ± 123 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
22.4 µs ± 739 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
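To make the numbers above easier to compare, here is a small sketch that turns them into old/new ratios. The means are copied from the output above; everything else (the timings list, the speedups dict) is just illustrative naming. On these figures the new version is faster in every case, from about 1.1x for r up to about 3.1x for the longest escaped string, with the single-character strings gaining least. These come from one machine and one run, so treat the ratios as indicative only.

```python
# (format string, new format() mean in µs, format_old() mean in µs),
# copied from the %timeit output above.
timings = [
    (r"Y. \g\a\d\a j. F", 7.17, 16.1),
    (r"\N\gà\y d \t\há\n\g n \nă\m Y", 8.36, 25.8),
    ("d\u200f/m\u200f/Y", 6.69, 11.9),
    (r"j\-\a \d\e F YN j, Y, P", 14.0, 32.7),
    ("e", 2.32, 3.22),
    (r"j \d\e F \d\e Y \a \l\e\s G:i", 10.8, 30.2),
    ("r", 11.1, 12.0),
    ("U", 3.04, 3.67),
    ("c", 4.26, 4.96),
    ("Z", 5.02, 5.54),
    ("d/m/Y D/M/y", 9.96, 22.4),
]

# Ratio above 1 means the new implementation was faster for that string.
speedups = {fmt: old / new for fmt, new, old in timings}

# Print the biggest wins first.
for fmt, ratio in sorted(speedups.items(), key=lambda item: item[1], reverse=True):
    print(f"{ratio:4.2f}x  {fmt}")
```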

I had to add in special-casing for single-character format strings (Z, U etc) to get them to not perform worse, and there may be issues not covered by the tests or my thinking-through of things[^2]. Tests should pass though.
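To illustrate what that kind of special-casing can look like, here is a toy formatter with a fast path for single-character format strings. It is not the code in this PR: toy_format, SPECIFIERS and the four supported specifiers are assumptions chosen to keep the sketch short, and it works on a plain datetime.date rather than a DateFormat.

```python
import re
from datetime import date

# Toy subset of specifiers (assumption); the real formatter supports many more.
SPECIFIERS = {
    "Y": lambda d: str(d.year),
    "m": lambda d: f"{d.month:02d}",
    "d": lambda d: f"{d.day:02d}",
    "j": lambda d: str(d.day),
}
# Split on unescaped specifier characters; the capture group keeps them in the result.
RE_FORMATCHARS = re.compile(r"(?<!\\)([Ymdj])")
# Used to drop the backslash from escaped literal characters.
RE_ESCAPED = re.compile(r"\\(.)")


def toy_format(value: date, format_string: str) -> str:
    # Fast path: a lone specifier needs no regex split, loop, or join.
    if len(format_string) == 1 and format_string in SPECIFIERS:
        return SPECIFIERS[format_string](value)
    pieces = []
    for i, piece in enumerate(RE_FORMATCHARS.split(format_string)):
        if i % 2:  # odd indexes are the captured specifier characters
            pieces.append(SPECIFIERS[piece](value))
        elif piece:  # even indexes are literal text, possibly with escapes
            pieces.append(RE_ESCAPED.sub(r"\1", piece))
    return "".join(pieces)


d = date(2022, 2, 6)
print(toy_format(d, "Y"))          # fast path
print(toy_format(d, "d/m/Y"))      # general path
print(toy_format(d, r"j \d\e m"))  # general path with escaped literals
```

The fast path only pays off if the length check is cheaper than the split it avoids, which is worth measuring per Python version rather than assuming.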

So, have at it. Find regressions, or cases it doesn't handle, or versions of Python where it does worse.

[^1]: for example, I could hoist time_formats and date_formats to module/global level (slower?) but use those constants for constructing the re_formatchars

[^2]: is there a way to perhaps not have to check type(...) is date and in time_formats in every loop? I couldn't think of a way off the top of my head that didn't introduce another loop, which'd be slower I imagine.
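One possible shape for the two footnote ideas, as a sketch only: module-level constants that also build the regex, and the plain-date check done once per call instead of once per specifier. DATE_FORMATS, TIME_FORMATS and check_specifiers are hypothetical names with deliberately tiny specifier sets, and the error message is made up. The set intersection is still a pass over the specifiers (just not a Python-level loop), so whether it actually beats the per-iteration check is unmeasured.

```python
import re
from datetime import date, datetime

# Hypothetical module-level constants (illustrative subsets, not the full lists).
DATE_FORMATS = frozenset("djmY")
TIME_FORMATS = frozenset("GHis")

# Build the splitting regex from the same constants so the two cannot drift apart.
RE_FORMATCHARS = re.compile(
    r"(?<!\\)([" + "".join(sorted(DATE_FORMATS | TIME_FORMATS)) + "])"
)


def check_specifiers(value, format_string: str) -> list[str]:
    """Return the specifier characters, rejecting time specifiers for plain dates."""
    # Odd indexes of the split result are the captured specifier characters.
    specifiers = RE_FORMATCHARS.split(format_string)[1::2]
    # The type check runs once here instead of once per specifier.
    if type(value) is date:
        bad = TIME_FORMATS.intersection(specifiers)
        if bad:
            raise TypeError(f"date objects have no time fields: {sorted(bad)}")
    return specifiers


print(check_specifiers(date(2022, 2, 6), "d/m/Y"))
print(check_specifiers(datetime(2022, 2, 6, 19, 2), "H:i"))
```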

kezabelle • Feb 06 '22 19:02