django icon indicating copy to clipboard operation
django copied to clipboard

Fixed #????? -- Optimized django.utils.dateformat.Formatter.

Open ngnpope opened this issue 2 years ago • 1 comments

@kezabelle I just noticed that you'd posted #15403. It prompted me that there was something that I started a while back.

The main issues with this that I found where:

  • Splitting the string using the regex followed by multiple calls to re.sub().
  • This is because extra care must be taken for escaped values, but the approach is naive.
    • (See 46346f8ea08020d503b25472a26b949a5ce980a6 for a similar case of escaped values in a regex that I implemented.)
  • There is some funky handling around datetime.date objects not having time-specific formats. I've not fully addressed this yet.

Perhaps you'd like to see how this performs?

ngnpope avatar Feb 06 '22 21:02 ngnpope

I knew I could count on you to keep me on my toes :) Though I didn't expect you to just have a separate PR locked and loaded and ready to go!

In a what I'd consider an absolute rarity, discounting the extra handling of escaping you've got going on (which I've not checked #15403 handles, but will do so this week), the PR I've pushed for consideration has you beat, at least on the same back of the napkin tests I demonstrated in the PR description; here's what I'm seeing for all 3, in the following order:

  • mine
  • yours (copy-pasted)
  • original
Y. \g\a\d\a j. F
5.77 µs ± 49.7 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
10 µs ± 85.9 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
12.8 µs ± 62.8 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

\N\gà\y d \t\há\n\g n \nă\m Y
6.7 µs ± 68 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
13.5 µs ± 105 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
20.9 µs ± 385 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

d‏/m‏/Y
5.47 µs ± 48 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
8.48 µs ± 104 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
9.52 µs ± 88.9 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

j\-\a \d\e F YN j, Y, P
11.3 µs ± 120 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
19.3 µs ± 105 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
26.6 µs ± 339 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

e
1.94 µs ± 13.4 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
5.23 µs ± 87.1 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
2.67 µs ± 37.5 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

j \d\e F \d\e Y \a \l\e\s G:i
8.77 µs ± 54.1 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
16.7 µs ± 117 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
25.3 µs ± 965 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

r
9.17 µs ± 57.6 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
12.7 µs ± 171 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
9.95 µs ± 128 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

U
2.48 µs ± 77.1 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
5.63 µs ± 52.5 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
3.07 µs ± 55.2 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

c
3.5 µs ± 81.2 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
6.76 µs ± 132 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
4.1 µs ± 35 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

Z
4.25 µs ± 51.8 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
7.64 µs ± 128 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
4.91 µs ± 50 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

d/m/Y D/M/y
9.31 µs ± 153 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
14.1 µs ± 84.4 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
21.2 µs ± 369 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In many cases, yours is faster than the current implementation, but #15403 still looks to "win". But yours unfortunately seems to regress on single-character format strings (which, tbf, mine did too [by a bit less, but still] until I special-cased them)

I've not looked into your changes substantially, so it's maybe possible I can bring across the escaping in the other PR ... maybe not. IDK yet.

Naturally I won't say mine is objectively better (or even reproducible beyond my machine!) but I'd definitely appreciate your eyes on it if you ever get the time or interest; the fact you had this already in mind is demonstrable of why I'd be grateful for your input.

kezabelle avatar Feb 06 '22 22:02 kezabelle

Hi @kezabelle 👋🏻

So I finally had a bit more of a play with this and had a faster solution with regular expressions for all cases but the single specifier case, as you'd already pointed out. So I just dispensed with the regular expressions entirely.

There are a number of improvements:

  • Removed unnecessary separation between handling of date and time formats
    • Also deprecated no longer required classes and functions. (Still needs some tests to be written.)
  • Avoided calling format functions from each other in most cases
  • Enforced that all format functions return a string (or lazy string)
  • Used f-strings for their performance improvement over %-interpolated strings
  • Cached various isinstance() calls, etc. up front in the __init__()
  • Simplified handling of timezone checks
  • Various other minor cleanups

This also fixes a few little bugs:

  • Escaped characters are handled properly in more cases, particularly many levels of escaping.
  • Date specifiers are no longer allowed with datetime.time
    • This is a companion to the existing check for time specifiers with datetime.date.
    • I still need to flesh out the tests more to cover datetime.time - this was an existing lack of coverage
  • The r format is now allowed to work with datetime.date, removing an inconsistency

I have some benchmarks for which I received the following nice output:

❯ pyperf compare_to benchmarks/bench_dateformat-37c5b8c07b.json benchmarks/bench_dateformat-patch.json 
Formatter.__init__(): Mean +- std dev: [bench_dateformat-37c5b8c07b] 446 ns +- 8 ns -> [bench_dateformat-patch] 1.31 us +- 0.02 us: 2.93x slower
Formatter.format(): 'Y. \\g\\a\\d\\a j. F': Mean +- std dev: [bench_dateformat-37c5b8c07b] 11.7 us +- 0.2 us -> [bench_dateformat-patch] 5.94 us +- 0.11 us: 1.98x faster
Formatter.format(): '\\N\\gà\\y d \\t\\há\\n\\g n \\nă\\m Y': Mean +- std dev: [bench_dateformat-37c5b8c07b] 12.6 us +- 0.2 us -> [bench_dateformat-patch] 3.30 us +- 0.08 us: 3.82x faster
Formatter.format(): 'd\\u200f/m\\u200f/Y': Mean +- std dev: [bench_dateformat-37c5b8c07b] 11.9 us +- 0.1 us -> [bench_dateformat-patch] 3.54 us +- 0.05 us: 3.36x faster
Formatter.format(): 'j\\-\\a \\d\\e F YN j, Y, P': Mean +- std dev: [bench_dateformat-37c5b8c07b] 27.1 us +- 0.4 us -> [bench_dateformat-patch] 15.8 us +- 0.2 us: 1.71x faster
Formatter.format(): 'e': Mean +- std dev: [bench_dateformat-37c5b8c07b] 1.75 us +- 0.03 us -> [bench_dateformat-patch] 420 ns +- 8 ns: 4.17x faster
Formatter.format(): 'j \\d\\e F \\d\\e Y \\a \\l\\e\\s G:i': Mean +- std dev: [bench_dateformat-37c5b8c07b] 18.4 us +- 0.2 us -> [bench_dateformat-patch] 7.69 us +- 0.12 us: 2.39x faster
Formatter.format(): 'r': Mean +- std dev: [bench_dateformat-37c5b8c07b] 5.21 us +- 0.08 us -> [bench_dateformat-patch] 3.59 us +- 0.05 us: 1.45x faster
Formatter.format(): 'U': Mean +- std dev: [bench_dateformat-37c5b8c07b] 1.96 us +- 0.04 us -> [bench_dateformat-patch] 592 ns +- 10 ns: 3.32x faster
Formatter.format(): 'c': Mean +- std dev: [bench_dateformat-37c5b8c07b] 2.58 us +- 0.04 us -> [bench_dateformat-patch] 1.23 us +- 0.04 us: 2.10x faster
Formatter.format(): 'Z': Mean +- std dev: [bench_dateformat-37c5b8c07b] 2.94 us +- 0.06 us -> [bench_dateformat-patch] 488 ns +- 9 ns: 6.02x faster
Formatter.format(): 'd/m/Y D/M/y': Mean +- std dev: [bench_dateformat-37c5b8c07b] 20.7 us +- 0.3 us -> [bench_dateformat-patch] 10.8 us +- 0.2 us: 1.91x faster
Formatter.__init__() & Formatter.format(): 'Y. \\g\\a\\d\\a j. F': Mean +- std dev: [bench_dateformat-37c5b8c07b] 12.9 us +- 0.4 us -> [bench_dateformat-patch] 7.85 us +- 0.12 us: 1.65x faster
Formatter.__init__() & Formatter.format(): '\\N\\gà\\y d \\t\\há\\n\\g n \\nă\\m Y': Mean +- std dev: [bench_dateformat-37c5b8c07b] 13.9 us +- 0.3 us -> [bench_dateformat-patch] 5.11 us +- 0.07 us: 2.71x faster
Formatter.__init__() & Formatter.format(): 'd\\u200f/m\\u200f/Y': Mean +- std dev: [bench_dateformat-37c5b8c07b] 13.0 us +- 0.3 us -> [bench_dateformat-patch] 5.28 us +- 0.11 us: 2.47x faster
Formatter.__init__() & Formatter.format(): 'j\\-\\a \\d\\e F YN j, Y, P': Mean +- std dev: [bench_dateformat-37c5b8c07b] 28.5 us +- 0.4 us -> [bench_dateformat-patch] 18.0 us +- 0.3 us: 1.58x faster
Formatter.__init__() & Formatter.format(): 'e': Mean +- std dev: [bench_dateformat-37c5b8c07b] 2.28 us +- 0.04 us -> [bench_dateformat-patch] 1.82 us +- 0.03 us: 1.26x faster
Formatter.__init__() & Formatter.format(): 'j \\d\\e F \\d\\e Y \\a \\l\\e\\s G:i': Mean +- std dev: [bench_dateformat-37c5b8c07b] 19.6 us +- 0.4 us -> [bench_dateformat-patch] 9.58 us +- 0.21 us: 2.05x faster
Formatter.__init__() & Formatter.format(): 'r': Mean +- std dev: [bench_dateformat-37c5b8c07b] 5.84 us +- 0.09 us -> [bench_dateformat-patch] 5.32 us +- 0.11 us: 1.10x faster
Formatter.__init__() & Formatter.format(): 'U': Mean +- std dev: [bench_dateformat-37c5b8c07b] 2.51 us +- 0.04 us -> [bench_dateformat-patch] 2.11 us +- 0.06 us: 1.19x faster
Formatter.__init__() & Formatter.format(): 'c': Mean +- std dev: [bench_dateformat-37c5b8c07b] 3.10 us +- 0.06 us -> [bench_dateformat-patch] 2.67 us +- 0.05 us: 1.16x faster
Formatter.__init__() & Formatter.format(): 'Z': Mean +- std dev: [bench_dateformat-37c5b8c07b] 3.42 us +- 0.05 us -> [bench_dateformat-patch] 1.97 us +- 0.03 us: 1.74x faster
Formatter.__init__() & Formatter.format(): 'd/m/Y D/M/y': Mean +- std dev: [bench_dateformat-37c5b8c07b] 21.1 us +- 0.3 us -> [bench_dateformat-patch] 12.6 us +- 0.8 us: 1.68x faster

Geometric mean: 1.92x faster

Would you be able to have a look at this and give an initial review? All broken down in nice individual commits as always 😉

ngnpope avatar Oct 17 '22 21:10 ngnpope

@felixxm I think this is approaching readiness. There are a bunch of extra tests to write for missing coverage, i.e. datetime.time, but I'll look to do that once I split this up into separate tickets and pull requests. What'd be helpful is knowing what sort of breakdown you'd like here. I was going to propose the following:

  • [x] #16423 (No Ticket)
    • Removed redundant calls to django.utils.formats.get_format().
    • Removed default argument from django.utils.dateformat.time_format() usage.
  • [x] #16424 (No Ticket)
    • Modified imports in django.utils.dateformat for performance.
    • Simplified django.utils.dateformat.DateFormat.O().
    • Simplified django.utils.dateformat.DateFormat.r().
    • Simplified django.utils.dateformat.DateFormat.t().
    • Simplified django.utils.dateformat.TimeFormat.e().
    • Simplified django.utils.dateformat ambiguous/imaginary datetime handling.
  • [ ] #16872 (No Ticket)
    • Added multibyte test for django.utils.dateformat.
    • Added extra assertions for escape handling in django.utils.dateformat.
    • ...and add better/missing coverage of datetime.{date,time} here?
    • ...and add better/missing coverage of django.utils.dateformat.time_format() here?
  • [ ] Optimisation of django.utils.dateformat (New Ticket)
    • Refs #????? -- Used f-strings in django.utils.dateformat for performance.
    • Refs #????? -- Made some methods django.utils.dateformat return strings.
    • Refs #????? -- Added specifier constants to django.utils.dateformat.Formatter.
    • Refs #????? -- Migrated specifier methods onto django.utils.datetime.Formatter.
    • Refs #????? -- Migrated __init__() onto django.utils.datetime.Formatter.
    • Refs #????? -- Cached isinstance() checks on django.utils.datetime.Formatter.
    • Refs #????? -- Optimized django.utils.dateformat.Formatter.
    • fixup! Refs #????? -- Optimized django.utils.dateformat.Formatter.
  • [ ] Deprecation of obsolete pieces of django.utils.dateformat (New Ticket)
    • Refs #????? -- Deprecated obsolete API in django.utils.dateformat.
  • [ ] Fixes to formatting of datetime.time objects, etc. (New Ticket)
    • Refs #????? -- Fixed formatting of datetime.time objects.
    • ...and possibly some more work here if stuff is missing.

What do you think? 🙂

ngnpope avatar Oct 28 '22 18:10 ngnpope

What do you think? slightly_smiling_face

Sounds good. We probably don't need separate commits in some of these steps, but I can reorganize them later. I'd start with extra tests.

felixxm avatar Oct 31 '22 05:10 felixxm

@ngnpope Do you have time to keep working on this?

felixxm avatar Feb 09 '23 10:02 felixxm

Was going to revisit it this weekend.

ngnpope avatar Feb 09 '23 14:02 ngnpope

@ngnpope Do you have time to keep working on this? (gently reminder)

felixxm avatar May 19 '23 08:05 felixxm

Closing due to inactivity.

felixxm avatar Mar 21 '24 09:03 felixxm