TimeZones.jl
TimeZones.jl copied to clipboard
Minimize allocations when unpacking `TimeZone`s from cache
- close #422
Even though we're careful about caching the creation of TimeZones and made FixedTimeZone an isbitstype/allocatedinline, we still allocate everytime TimeZone is called, even if just returning a FixedTimeZone from the cache!
I think this happens because the values in the cache are of type TimeZone, i.e. it's not known if we're returning a FixedTimeZone or a VariableTimeZone.
This PR proposes to instead maintain separate caches (per thread) for FixedTimeZones and VariableTimeZones, rather than a single one (per thread) for all TimeZones
Effectively this is some code duplication for a performance improvement.
VariableTimeZone still allocates once, presumably since it's not isbitstype (#271).
Micro benchmarks
FixedTimeZone
This PR
julia> str = "UTC";
julia> TimeZone(str); # compile and cache
julia> @benchmark TimeZone($str)
BenchmarkTools.Trial: 10000 samples with 987 evaluations.
Range (min … max): 46.733 ns … 156.619 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 53.994 ns ┊ GC (median): 0.00%
Time (mean ± σ): 54.265 ns ± 2.880 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
▂ █▂ █▃
▂▁▂▂▁▂▂▂▂▂▂▂▂▂▂▂▂█▅▄▄▄▅██▅▆▇████▅▄▃▃▃▃▄▄▃▃▄▄▃▃▃▃▃▃▂▂▂▂▂▂▂▂▂▂ ▃
46.7 ns Histogram: frequency by time 61.8 ns <
Memory estimate: 0 bytes, allocs estimate: 0.
master (v1.9.1)
julia> @benchmark TimeZone($str)
BenchmarkTools.Trial: 10000 samples with 943 evaluations.
Range (min … max): 100.168 ns … 1.490 μs ┊ GC (min … max): 0.00% … 92.27%
Time (median): 104.277 ns ┊ GC (median): 0.00%
Time (mean ± σ): 107.123 ns ± 48.091 ns ┊ GC (mean ± σ): 1.61% ± 3.33%
▁█▆▂
▂▃▃▄▃▄▄▅▅▅▇████▅▄▃▃▃▄▅▅▆▇▇▇▅▅▄▃▃▃▂▂▂▂▂▂▁▁▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▃
100 ns Histogram: frequency by time 117 ns <
Memory estimate: 64 bytes, allocs estimate: 2.
VariableTimeZone
This PR
julia> str = "America/Winnipeg";
julia> TimeZone(str); # compile and cache
julia> @benchmark TimeZone($str)
BenchmarkTools.Trial: 10000 samples with 946 evaluations.
Range (min … max): 98.441 ns … 1.144 μs ┊ GC (min … max): 0.00% … 90.99%
Time (median): 101.039 ns ┊ GC (median): 0.00%
Time (mean ± σ): 102.185 ns ± 32.794 ns ┊ GC (mean ± σ): 1.01% ± 2.87%
▅▄█▆▁
▁▂▃▄▄▆█▇▆▄▄▄▄▇█████▄▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
98.4 ns Histogram: frequency by time 108 ns <
Memory estimate: 48 bytes, allocs estimate: 1.
master (v1.9.1)
julia> @benchmark TimeZone($str)
BenchmarkTools.Trial: 10000 samples with 909 evaluations.
Range (min … max): 119.179 ns … 1.722 μs ┊ GC (min … max): 0.00% … 92.34%
Time (median): 120.187 ns ┊ GC (median): 0.00%
Time (mean ± σ): 123.116 ns ± 47.667 ns ┊ GC (mean ± σ): 1.33% ± 3.17%
▁▅██▇▆▄▃▂▂▁▁▁▁▂▃▃▂▃▃▂▁▁▁▁▁▁▁ ▂
████████████████████████████▇▇███▇▇██▇▇▇▇▇▇▇▇▇█▆▇▅▇▆▆▆▆▆▅▆▅▄ █
119 ns Histogram: log(frequency) by time 134 ns <
Memory estimate: 64 bytes, allocs estimate: 2.
~/r/TimeZones.jl> TZDATA_VERSION=2016j julia --project=benchmark/ -e 'using PkgBenchmark, TimeZones; export_markdown(stdout, judge(TimeZones, "origin/HEAD", verbose=false))'
PkgBenchmark: Running benchmarks...
[ Info: Installing 2016j tzdata region data
[ Info: Converting tz source files into TimeZone data
PkgBenchmark: creating benchmark tuning file /Users/nickr/repos/TimeZones.jl/benchmark/tune.json...
Tuning 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| Time: 0:00:36
Benchmarking 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| Time: 0:00:43
PkgBenchmark: Running benchmarks...
[ Info: Installing 2016j tzdata region data
[ Info: Converting tz source files into TimeZone data
PkgBenchmark: using benchmark tuning data in /Users/nickr/repos/TimeZones.jl/benchmark/tune.json
Benchmarking 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| Time: 0:00:48
Benchmark Report for ~/repos/TimeZones.jl
Job Properties
- Time of benchmarks:
- Target: 14 Dec 2022 - 18:05
- Baseline: 14 Dec 2022 - 18:06
- Package commits:
- Target: 85eb32
- Baseline: f8f39b
- Julia commits:
- Target: 36034a
- Baseline: 36034a
- Julia command flags:
- Target: None
- Baseline: None
- Environment variables:
- Target: None
- Baseline: None
Results
A ratio greater than 1.0 denotes a possible regression (marked with :x:), while a ratio less
than 1.0 denotes a possible improvement (marked with :white_check_mark:). Only significant results - results
that indicate possible regressions or improvements - are shown below (thus, an empty table means that all
benchmark results remained invariant between builds).
| ID | time ratio | memory ratio |
|---|---|---|
["ZonedDateTime", "local", "standard"] |
0.84 (5%) :white_check_mark: | 1.00 (1%) |
["ZonedDateTime", "range", "VariableTimeZone/DatePeriod"] |
0.91 (5%) :white_check_mark: | 1.00 (1%) |
["arithmetic", "DatePeriod"] |
0.93 (5%) :white_check_mark: | 1.00 (1%) |
["arithmetic", "TimePeriod"] |
1.09 (5%) :x: | 1.00 (1%) |
["arithmetic", "broadcast", "FixedTimeZone/TimePeriod"] |
0.94 (5%) :white_check_mark: | 1.00 (1%) |
["arithmetic", "broadcast", "VariableTimeZone/DatePeriod"] |
0.93 (5%) :white_check_mark: | 1.00 (1%) |
["interpret", "local", "ambiguous"] |
0.87 (5%) :white_check_mark: | 1.00 (1%) |
["interpret", "local", "standard"] |
0.89 (5%) :white_check_mark: | 1.00 (1%) |
["parse", "ISOZonedDateTimeFormat"] |
0.82 (5%) :white_check_mark: | 0.87 (1%) :white_check_mark: |
["parse", "issue#25"] |
0.92 (5%) :white_check_mark: | 0.88 (1%) :white_check_mark: |
["transition_range", "local", "ambiguous"] |
0.83 (5%) :white_check_mark: | 1.00 (1%) |
["transition_range", "local", "non-existent"] |
0.79 (5%) :white_check_mark: | 1.00 (1%) |
["transition_range", "local", "standard"] |
0.81 (5%) :white_check_mark: | 1.00 (1%) |
["transition_range", "utc"] |
1.43 (5%) :x: | 1.00 (1%) |
["tryparsenext_fixedtz", "+06"] |
0.94 (5%) :white_check_mark: | 1.00 (1%) |
["tryparsenext_fixedtz", "+0600"] |
0.94 (5%) :white_check_mark: | 1.00 (1%) |
["tryparsenext_fixedtz", "+06:00"] |
0.95 (5%) :white_check_mark: | 1.00 (1%) |
["tryparsenext_fixedtz", "-06"] |
0.93 (5%) :white_check_mark: | 1.00 (1%) |
["tryparsenext_fixedtz", "-0600"] |
0.95 (5%) :white_check_mark: | 1.00 (1%) |
["tryparsenext_fixedtz", "UTC"] |
0.73 (5%) :white_check_mark: | 1.00 (1%) |
["tryparsenext_fixedtz", "Z"] |
0.56 (5%) :white_check_mark: | 1.00 (1%) |
["tz_data", "parse_components"] |
0.89 (5%) :white_check_mark: | 0.84 (1%) :white_check_mark: |
Benchmark Group List
Here's a list of all the benchmark groups executed by this job:
["ZonedDateTime"]["ZonedDateTime", "local"]["ZonedDateTime", "range"]["arithmetic"]["arithmetic", "broadcast"]["interpret", "local"]["interpret"]["parse"]["transition_range", "local"]["transition_range"]["tryparsenext_fixedtz"]["tryparsenext_tz"]["tz_data"]
Julia versioninfo
Target
Julia Version 1.8.2
Commit 36034abf260 (2022-09-29 15:21 UTC)
Platform Info:
OS: macOS (x86_64-apple-darwin21.4.0)
uname: Darwin 21.4.0 Darwin Kernel Version 21.4.0: Mon Feb 21 20:35:58 PST 2022; root:xnu-8020.101.4~2/RELEASE_ARM64_T6000 x86_64 i386
CPU: Apple M1 Max:
speed user nice sys idle irq
#1-10 2400 MHz 1580681 s 0 s 1469385 s 30832641 s 0 s
Memory: 64.0 GB (2506.71875 MB free)
Uptime: 446789.0 sec
Load Avg: 2.73779296875 2.6376953125 2.58837890625
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-13.0.1 (ORCJIT, westmere)
Threads: 1 on 10 virtual cores
Baseline
Julia Version 1.8.2
Commit 36034abf260 (2022-09-29 15:21 UTC)
Platform Info:
OS: macOS (x86_64-apple-darwin21.4.0)
uname: Darwin 21.4.0 Darwin Kernel Version 21.4.0: Mon Feb 21 20:35:58 PST 2022; root:xnu-8020.101.4~2/RELEASE_ARM64_T6000 x86_64 i386
CPU: Apple M1 Max:
speed user nice sys idle irq
#1-10 2400 MHz 1581658 s 0 s 1469753 s 30837878 s 0 s
Memory: 64.0 GB (2527.17578125 MB free)
Uptime: 446854.0 sec
Load Avg: 2.205078125 2.5078125 2.54345703125
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-13.0.1 (ORCJIT, westmere)
Threads: 1 on 10 virtual cores
Re-ran the benchmarks and the ones that are ❌ above are :white_check_mark: on re-run (and others are now not significant), so i think they're very noisy? Especially as most of these don't look like they even hit the codepaths changed in this PR?
Anyway, from this doesn't seem like we can say the two caches cause any performance issue
I'm not sure i understand why the VariableTimeZones still alloc when you look them up. I agree with you it is unexpected.. I'm not sure the isbits thing should matter, since we're pointing to the existing vector, not allocating a new one, and Julia is supposed to be able to stack-allocate an immutable struct that contains mutable fields, since like 1.4 or something. So i'm not sure i understand yet; it might be good to look through a profile together, once the rest of the issues in the PR are addressed.
Okay, i've spent a couple days playing around with different options but i can't yet see how to remove the remaining VariableTimeZones allocation, so i propose leaving that as a follow-up. And likewise i don't think a switch to MultiThreadedCaches.jl is straight forward, so would prefer to get this perf improvement in and leave that refactor to a follow-up.
@NHDaly and @omus please could you take another look at this PR? Thanks!
I'll try to take a look soon. I'll call out that I also want to get #382 merged soon as well.
Codecov Report
Merging #423 (3218a4e) into master (804864c) will decrease coverage by
0.47%. The diff coverage is98.41%.
:exclamation: Current head 3218a4e differs from pull request most recent head b158178. Consider uploading reports for the commit b158178 to get more accurate results
:mega: This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more
@@ Coverage Diff @@
## master #423 +/- ##
==========================================
- Coverage 95.58% 95.11% -0.47%
==========================================
Files 38 36 -2
Lines 1766 1803 +37
==========================================
+ Hits 1688 1715 +27
- Misses 78 88 +10
| Impacted Files | Coverage Δ | |
|---|---|---|
| src/TimeZones.jl | 100.00% <ø> (ø) |
|
| src/types/timezone.jl | 98.64% <98.36%> (+1.50%) |
:arrow_up: |
| src/tzdata/compile.jl | 94.78% <100.00%> (-0.97%) |
:arrow_down: |
I've been a bit swamped this week. Overall, I'm in favour of this change and will definitely merge this before #382
Just following up to make sure this doesn't drop off your radar. Let me know if i can do anything to help, @omus :)
It turned out #382 became urgent to merge for Julia 1.9 so the original plan had changed. If @nickrobinson251 you have time to rebase this against the latest version of the code that would be great. If not, I'll try to adapt this code myself sometime.
Is this just a (simple) rebase away? It seems good as is, down to 0 allocations is great, already an improvement. Getting rid of the 1 remaining alloc for VariableTimeZone would be ideal, but shouldn't stop from merging as is? I didn't look closely into why it's there, maybe fixed already on 1.10?
I think this is just a (not-very-simple) rebase away. And I would love, love, love someone to take that on, as I don't have capacity at the moment -- please feel encouraged to do so!
Superseded by #451