Regression on number of allocations in GC micro-benchmark
MWE taken from https://github.com/JuliaCI/GCBenchmarks.
using Base.Threads: @threads
using Random: shuffle
function sample_vote!(_rb, chop_counts)
    pts = rand(length(chop_counts))
    N = length(_rb)
    _srt = 4245
    partialsortperm!(_rb, pts, 1:_srt; lt = <, rev = true)
    while sum(@views chop_counts[_rb[1:_srt]]) ≤ 5660
        _srt = min(2 * _srt, N)
        partialsortperm!(_rb, pts, 1:_srt; lt = <, rev = true)
    end
end
function parallel_scores(chop_counts)
    @threads for i in 1:8
        _rb = collect(1:length(chop_counts))
        # the bigger this number, the more % GC time
        for _ ∈ 1:1000
            sample_vote!(_rb, chop_counts)
        end
    end
end
# kind of arbitrary, but approximates my data
chop_counts = shuffle(trunc.(Int, 6500 ./ (50:100_000)))
@time parallel_scores(chop_counts)
- 1.9:
../julia-1.9/julia -t8 --project=. benches/multithreaded/big_arrays/issue-52937.jl
4.782645 seconds (1.17 M allocations: 29.762 GiB, 15.93% gc time, 39.65% compilation time)
- master:
../julia-master/julia -t8 --gcthreads=1 --project=. benches/multithreaded/big_arrays/issue-52937.jl
6.554844 seconds (4.42 M allocations: 29.851 GiB, 40.73% gc time, 49.17% compilation time: 47% of which was recompilation)
- versioninfo:
Julia Version 1.12.0-DEV.209
Commit 22716eb21d (2024-03-14 18:46 UTC)
Platform Info:
  OS: macOS (arm64-apple-darwin23.1.0)
  CPU: 12 × Apple M2 Max
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, apple-m2)
Threads: 1 default, 0 interactive, 1 GC (on 8 virtual cores)
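Both timings above include a large share of compilation time (39.65% and 49.17%), so one way to compare allocation counts on their own might be to run the workload once to compile it and then measure a second run. The following is only a minimal sketch under that assumption: `measure_steady_state` is a hypothetical helper name, not part of GCBenchmarks, and `Base.gc_alloc_count` is an internal, unexported function that may change between Julia versions.

```julia
# Hypothetical helper: run `f` once to trigger compilation, then measure a second run.
function measure_steady_state(f)
    f()                  # warm-up call; compilation happens here
    GC.gc()              # start the measured run from a freshly collected heap
    stats = @timed f()   # NamedTuple with :value, :time, :bytes, :gctime, :gcstats
    return (
        seconds = stats.time,
        bytes   = stats.bytes,
        gc_pct  = 100 * stats.gctime / stats.time,
        # Internal API (assumption: still present in the Julia version under test).
        allocs  = Base.gc_alloc_count(stats.gcstats),
    )
end

# Small stand-in workload so the sketch runs quickly; swap in
# `() -> parallel_scores(chop_counts)` to measure the MWE itself.
result = measure_steady_state(() -> sum(rand(1_000)))
println(result)
```

Running the same helper on each Julia build would give allocation counts and GC percentages that are not mixed with compilation or recompilation effects.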
Commit 1c25d93ca8ab3f5b0cad62 is the cause of at least the increase from 2M to 4M allocations (though it doesn't seem to have much of a perf impact):
~/julia$ manyjulias $(git rev-parse HEAD) -t8 testit.jl
Extracted 'julia-1_12_0-DEV_0:1c25d93ca8ab3f5b0cad627d76705fb7025429a3'
9.980486 seconds (3.81 M allocations: 29.817 GiB, 31.02% gc time, 51.06% compilation time: 74% of which was recompilation)
~/julia$ manyjulias $(git rev-parse HEAD)~1 -t8 testit.jl
Extracted 'julia-1_12_0-DEV_0:c0a93f8c3ef20fe9f892e1a728409c60599657cc'
9.533646 seconds (1.99 M allocations: 29.789 GiB, 30.31% gc time, 48.24% compilation time)
It's perhaps a bit surprising that the build with the slower compilation shows recompilation while the previous one does not.
cc @Keno
While a perf regression is unfortunate, I don't think it really warrants being on the milestone.
The regression still exists on 1.11, but it looks like it's resolved on master, so it would only appear in the 1.11.x releases (unless it re-regresses).