django
django copied to clipboard
Fixed #????? -- Optimized django.utils.dateformat.Formatter.
@kezabelle I just noticed that you'd posted #15403. It prompted me that there was something that I started a while back.
The main issues with this that I found where:
- Splitting the string using the regex followed by multiple calls to
re.sub()
. - This is because extra care must be taken for escaped values, but the approach is naive.
- (See 46346f8ea08020d503b25472a26b949a5ce980a6 for a similar case of escaped values in a regex that I implemented.)
- There is some funky handling around
datetime.date
objects not having time-specific formats. I've not fully addressed this yet.
Perhaps you'd like to see how this performs?
I knew I could count on you to keep me on my toes :) Though I didn't expect you to just have a separate PR locked and loaded and ready to go!
In a what I'd consider an absolute rarity, discounting the extra handling of escaping you've got going on (which I've not checked #15403 handles, but will do so this week), the PR I've pushed for consideration has you beat, at least on the same back of the napkin tests I demonstrated in the PR description; here's what I'm seeing for all 3, in the following order:
- mine
- yours (copy-pasted)
- original
Y. \g\a\d\a j. F
5.77 µs ± 49.7 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
10 µs ± 85.9 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
12.8 µs ± 62.8 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
\N\gà\y d \t\há\n\g n \nă\m Y
6.7 µs ± 68 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
13.5 µs ± 105 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
20.9 µs ± 385 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
d/m/Y
5.47 µs ± 48 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
8.48 µs ± 104 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
9.52 µs ± 88.9 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
j\-\a \d\e F YN j, Y, P
11.3 µs ± 120 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
19.3 µs ± 105 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
26.6 µs ± 339 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
e
1.94 µs ± 13.4 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
5.23 µs ± 87.1 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
2.67 µs ± 37.5 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
j \d\e F \d\e Y \a \l\e\s G:i
8.77 µs ± 54.1 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
16.7 µs ± 117 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
25.3 µs ± 965 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
r
9.17 µs ± 57.6 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
12.7 µs ± 171 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
9.95 µs ± 128 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
U
2.48 µs ± 77.1 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
5.63 µs ± 52.5 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
3.07 µs ± 55.2 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
c
3.5 µs ± 81.2 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
6.76 µs ± 132 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
4.1 µs ± 35 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
Z
4.25 µs ± 51.8 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
7.64 µs ± 128 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
4.91 µs ± 50 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
d/m/Y D/M/y
9.31 µs ± 153 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
14.1 µs ± 84.4 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
21.2 µs ± 369 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
In many cases, yours is faster than the current implementation, but #15403 still looks to "win". But yours unfortunately seems to regress on single-character format strings (which, tbf, mine did too [by a bit less, but still] until I special-cased them)
I've not looked into your changes substantially, so it's maybe possible I can bring across the escaping in the other PR ... maybe not. IDK yet.
Naturally I won't say mine is objectively better (or even reproducible beyond my machine!) but I'd definitely appreciate your eyes on it if you ever get the time or interest; the fact you had this already in mind is demonstrable of why I'd be grateful for your input.
Hi @kezabelle 👋🏻
So I finally had a bit more of a play with this and had a faster solution with regular expressions for all cases but the single specifier case, as you'd already pointed out. So I just dispensed with the regular expressions entirely.
There are a number of improvements:
- Removed unnecessary separation between handling of date and time formats
- Also deprecated no longer required classes and functions. (Still needs some tests to be written.)
- Avoided calling format functions from each other in most cases
- Enforced that all format functions return a string (or lazy string)
- Used f-strings for their performance improvement over %-interpolated strings
- Cached various
isinstance()
calls, etc. up front in the__init__()
- Simplified handling of timezone checks
- Various other minor cleanups
This also fixes a few little bugs:
- Escaped characters are handled properly in more cases, particularly many levels of escaping.
- Date specifiers are no longer allowed with
datetime.time
-
This is a companion to the existing check for time specifiers with
datetime.date
. -
I still need to flesh out the tests more to cover
datetime.time
- this was an existing lack of coverage
-
This is a companion to the existing check for time specifiers with
- The
r
format is now allowed to work withdatetime.date
, removing an inconsistency
I have some benchmarks for which I received the following nice output:
❯ pyperf compare_to benchmarks/bench_dateformat-37c5b8c07b.json benchmarks/bench_dateformat-patch.json
Formatter.__init__(): Mean +- std dev: [bench_dateformat-37c5b8c07b] 446 ns +- 8 ns -> [bench_dateformat-patch] 1.31 us +- 0.02 us: 2.93x slower
Formatter.format(): 'Y. \\g\\a\\d\\a j. F': Mean +- std dev: [bench_dateformat-37c5b8c07b] 11.7 us +- 0.2 us -> [bench_dateformat-patch] 5.94 us +- 0.11 us: 1.98x faster
Formatter.format(): '\\N\\gà\\y d \\t\\há\\n\\g n \\nă\\m Y': Mean +- std dev: [bench_dateformat-37c5b8c07b] 12.6 us +- 0.2 us -> [bench_dateformat-patch] 3.30 us +- 0.08 us: 3.82x faster
Formatter.format(): 'd\\u200f/m\\u200f/Y': Mean +- std dev: [bench_dateformat-37c5b8c07b] 11.9 us +- 0.1 us -> [bench_dateformat-patch] 3.54 us +- 0.05 us: 3.36x faster
Formatter.format(): 'j\\-\\a \\d\\e F YN j, Y, P': Mean +- std dev: [bench_dateformat-37c5b8c07b] 27.1 us +- 0.4 us -> [bench_dateformat-patch] 15.8 us +- 0.2 us: 1.71x faster
Formatter.format(): 'e': Mean +- std dev: [bench_dateformat-37c5b8c07b] 1.75 us +- 0.03 us -> [bench_dateformat-patch] 420 ns +- 8 ns: 4.17x faster
Formatter.format(): 'j \\d\\e F \\d\\e Y \\a \\l\\e\\s G:i': Mean +- std dev: [bench_dateformat-37c5b8c07b] 18.4 us +- 0.2 us -> [bench_dateformat-patch] 7.69 us +- 0.12 us: 2.39x faster
Formatter.format(): 'r': Mean +- std dev: [bench_dateformat-37c5b8c07b] 5.21 us +- 0.08 us -> [bench_dateformat-patch] 3.59 us +- 0.05 us: 1.45x faster
Formatter.format(): 'U': Mean +- std dev: [bench_dateformat-37c5b8c07b] 1.96 us +- 0.04 us -> [bench_dateformat-patch] 592 ns +- 10 ns: 3.32x faster
Formatter.format(): 'c': Mean +- std dev: [bench_dateformat-37c5b8c07b] 2.58 us +- 0.04 us -> [bench_dateformat-patch] 1.23 us +- 0.04 us: 2.10x faster
Formatter.format(): 'Z': Mean +- std dev: [bench_dateformat-37c5b8c07b] 2.94 us +- 0.06 us -> [bench_dateformat-patch] 488 ns +- 9 ns: 6.02x faster
Formatter.format(): 'd/m/Y D/M/y': Mean +- std dev: [bench_dateformat-37c5b8c07b] 20.7 us +- 0.3 us -> [bench_dateformat-patch] 10.8 us +- 0.2 us: 1.91x faster
Formatter.__init__() & Formatter.format(): 'Y. \\g\\a\\d\\a j. F': Mean +- std dev: [bench_dateformat-37c5b8c07b] 12.9 us +- 0.4 us -> [bench_dateformat-patch] 7.85 us +- 0.12 us: 1.65x faster
Formatter.__init__() & Formatter.format(): '\\N\\gà\\y d \\t\\há\\n\\g n \\nă\\m Y': Mean +- std dev: [bench_dateformat-37c5b8c07b] 13.9 us +- 0.3 us -> [bench_dateformat-patch] 5.11 us +- 0.07 us: 2.71x faster
Formatter.__init__() & Formatter.format(): 'd\\u200f/m\\u200f/Y': Mean +- std dev: [bench_dateformat-37c5b8c07b] 13.0 us +- 0.3 us -> [bench_dateformat-patch] 5.28 us +- 0.11 us: 2.47x faster
Formatter.__init__() & Formatter.format(): 'j\\-\\a \\d\\e F YN j, Y, P': Mean +- std dev: [bench_dateformat-37c5b8c07b] 28.5 us +- 0.4 us -> [bench_dateformat-patch] 18.0 us +- 0.3 us: 1.58x faster
Formatter.__init__() & Formatter.format(): 'e': Mean +- std dev: [bench_dateformat-37c5b8c07b] 2.28 us +- 0.04 us -> [bench_dateformat-patch] 1.82 us +- 0.03 us: 1.26x faster
Formatter.__init__() & Formatter.format(): 'j \\d\\e F \\d\\e Y \\a \\l\\e\\s G:i': Mean +- std dev: [bench_dateformat-37c5b8c07b] 19.6 us +- 0.4 us -> [bench_dateformat-patch] 9.58 us +- 0.21 us: 2.05x faster
Formatter.__init__() & Formatter.format(): 'r': Mean +- std dev: [bench_dateformat-37c5b8c07b] 5.84 us +- 0.09 us -> [bench_dateformat-patch] 5.32 us +- 0.11 us: 1.10x faster
Formatter.__init__() & Formatter.format(): 'U': Mean +- std dev: [bench_dateformat-37c5b8c07b] 2.51 us +- 0.04 us -> [bench_dateformat-patch] 2.11 us +- 0.06 us: 1.19x faster
Formatter.__init__() & Formatter.format(): 'c': Mean +- std dev: [bench_dateformat-37c5b8c07b] 3.10 us +- 0.06 us -> [bench_dateformat-patch] 2.67 us +- 0.05 us: 1.16x faster
Formatter.__init__() & Formatter.format(): 'Z': Mean +- std dev: [bench_dateformat-37c5b8c07b] 3.42 us +- 0.05 us -> [bench_dateformat-patch] 1.97 us +- 0.03 us: 1.74x faster
Formatter.__init__() & Formatter.format(): 'd/m/Y D/M/y': Mean +- std dev: [bench_dateformat-37c5b8c07b] 21.1 us +- 0.3 us -> [bench_dateformat-patch] 12.6 us +- 0.8 us: 1.68x faster
Geometric mean: 1.92x faster
Would you be able to have a look at this and give an initial review? All broken down in nice individual commits as always 😉
@felixxm I think this is approaching readiness. There are a bunch of extra tests to write for missing coverage, i.e. datetime.time
, but I'll look to do that once I split this up into separate tickets and pull requests. What'd be helpful is knowing what sort of breakdown you'd like here. I was going to propose the following:
- [x] #16423 (No Ticket)
- Removed redundant calls to
django.utils.formats.get_format()
. - Removed default argument from
django.utils.dateformat.time_format()
usage.
- Removed redundant calls to
- [x] #16424 (No Ticket)
- Modified imports in
django.utils.dateformat
for performance. - Simplified
django.utils.dateformat.DateFormat.O()
. - Simplified
django.utils.dateformat.DateFormat.r()
. - Simplified
django.utils.dateformat.DateFormat.t()
. - Simplified
django.utils.dateformat.TimeFormat.e()
. - Simplified
django.utils.dateformat
ambiguous/imaginary datetime handling.
- Modified imports in
- [ ] #16872 (No Ticket)
- Added multibyte test for
django.utils.dateformat
. - Added extra assertions for escape handling in
django.utils.dateformat
. -
...and add better/missing coverage of
datetime.{date,time}
here? -
...and add better/missing coverage of
django.utils.dateformat.time_format()
here?
- Added multibyte test for
- [ ] Optimisation of
django.utils.dateformat
(New Ticket)- Refs #????? -- Used f-strings in
django.utils.dateformat
for performance. - Refs #????? -- Made some methods
django.utils.dateformat
return strings. - Refs #????? -- Added specifier constants to
django.utils.dateformat.Formatter
. - Refs #????? -- Migrated specifier methods onto
django.utils.datetime.Formatter
. - Refs #????? -- Migrated
__init__()
ontodjango.utils.datetime.Formatter
. - Refs #????? -- Cached
isinstance()
checks ondjango.utils.datetime.Formatter
. - Refs #????? -- Optimized django.utils.dateformat.Formatter.
- fixup! Refs #????? -- Optimized django.utils.dateformat.Formatter.
- Refs #????? -- Used f-strings in
- [ ] Deprecation of obsolete pieces of
django.utils.dateformat
(New Ticket)- Refs #????? -- Deprecated obsolete API in
django.utils.dateformat
.
- Refs #????? -- Deprecated obsolete API in
- [ ] Fixes to formatting of
datetime.time
objects, etc. (New Ticket)- Refs #????? -- Fixed formatting of
datetime.time
objects. - ...and possibly some more work here if stuff is missing.
- Refs #????? -- Fixed formatting of
What do you think? 🙂
What do you think? slightly_smiling_face
Sounds good. We probably don't need separate commits in some of these steps, but I can reorganize them later. I'd start with extra tests.
@ngnpope Do you have time to keep working on this?
Was going to revisit it this weekend.
@ngnpope Do you have time to keep working on this? (gently reminder)
Closing due to inactivity.