Regression on number of allocations in GC micro-benchmark

Open d-netto opened this issue 1 year ago • 2 comments

MWE taken from https://github.com/JuliaCI/GCBenchmarks.

using Base.Threads: @threads
using Random: shuffle

function sample_vote!(_rb, chop_counts)
    pts = rand(length(chop_counts))
    N = length(_rb)
    _srt = 4245
    partialsortperm!(_rb, pts, 1:_srt; lt = <, rev = true)
    while sum(@views chop_counts[_rb[1:_srt]]) ≤ 5660
        _srt = min(2 * _srt, N)
        partialsortperm!(_rb, pts, 1:_srt; lt = <, rev = true)
    end
end

function parallel_scores(chop_counts)
    @threads for i in 1:8
        _rb = collect(1:length(chop_counts))
        # the bigger this number, the more % GC time
        for _ ∈ 1:1000
            sample_vote!(_rb, chop_counts)
        end
    end
end

# kind of arbitrary, but approximates my data
chop_counts = shuffle(trunc.(Int, 6500 ./ (50:100_000)))
@time parallel_scores(chop_counts)
  • 1.9: ../julia-1.9/julia -t8 --project=. benches/multithreaded/big_arrays/issue-52937.jl
4.782645 seconds (1.17 M allocations: 29.762 GiB, 15.93% gc time, 39.65% compilation time)
  • master: ../julia-master/julia -t8 --gcthreads=1 --project=. benches/multithreaded/big_arrays/issue-52937.jl
6.554844 seconds (4.42 M allocations: 29.851 GiB, 40.73% gc time, 49.17% compilation time: 47% of which was recompilation)
  • versioninfo:
Julia Version 1.12.0-DEV.209
Commit 22716eb21d (2024-03-14 18:46 UTC)
Platform Info:
  OS: macOS (arm64-apple-darwin23.1.0)
  CPU: 12 × Apple M2 Max
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, apple-m2)
Threads: 1 default, 0 interactive, 1 GC (on 8 virtual cores)
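
Both timings above include a large share of compilation (39.65% and 49.17%), so some of the reported allocations may come from the compiler rather than from the workload. One way to separate the two is a warm-up call followed by a measured call. A minimal sketch, where `toy_workload` is a hypothetical stand-in and not the benchmark itself:

```julia
using Base.Threads: @threads

# Hypothetical stand-in for `parallel_scores(chop_counts)`; it only needs to
# allocate inside a threaded loop.
function toy_workload(n)
    @threads for i in 1:4
        v = rand(n)   # allocates a fresh array on every iteration
        sort!(v)
    end
    return nothing
end

toy_workload(10)                              # warm-up: pay compilation once

stats   = @timed toy_workload(100_000)        # NamedTuple: time, bytes, gctime, ...
nallocs = @allocations toy_workload(100_000)  # allocation count (Julia 1.9+)

println("time        = ", stats.time, " s")
println("bytes       = ", stats.bytes)
println("gc time     = ", stats.gctime, " s")
println("allocations = ", nallocs)
```

Replacing `toy_workload(...)` with `parallel_scores(chop_counts)` should give steady-state numbers that are easier to compare across Julia versions, since the compilation component is excluded from the second call.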

d-netto, Mar 16 '24 01:03

Commit 1c25d93ca8ab3f5b0cad62 is the cause of at least the 2M -> 4M increase in allocations (but it doesn't seem to have that much of a perf impact).

~/julia$ manyjulias $(git rev-parse HEAD) -t8 testit.jl 
Extracted 'julia-1_12_0-DEV_0:1c25d93ca8ab3f5b0cad627d76705fb7025429a3'
  9.980486 seconds (3.81 M allocations: 29.817 GiB, 31.02% gc time, 51.06% compilation time: 74% of which was recompilation)

~/julia$ manyjulias $(git rev-parse HEAD)~1 -t8 testit.jl 
Extracted 'julia-1_12_0-DEV_0:c0a93f8c3ef20fe9f892e1a728409c60599657cc'
  9.533646 seconds (1.99 M allocations: 29.789 GiB, 30.31% gc time, 48.24% compilation time)

It's perhaps a bit surprising that the one with the slower compilation has recompilation while the previous one does not.
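
To put the two runs above side by side, the reported figures can be compared directly. This is only arithmetic on the numbers printed by `@time` above, which are already rounded, so small differences are within rounding noise:

```julia
# Figures copied from the two `@time` lines above.
before = (allocs = 1.99e6, gib = 29.789)  # commit c0a93f8c
after  = (allocs = 3.81e6, gib = 29.817)  # commit 1c25d93c

alloc_ratio = after.allocs / before.allocs
bytes_ratio = after.gib / before.gib

# Average size of the additional allocations, in bytes (1 GiB = 2^30 bytes).
extra_bytes  = (after.gib - before.gib) * 2^30
extra_allocs = after.allocs - before.allocs
avg_extra    = extra_bytes / extra_allocs

println("allocation count ratio:         ", round(alloc_ratio; digits = 2))
println("allocated bytes ratio:          ", round(bytes_ratio; digits = 4))
println("avg bytes per extra allocation: ", round(avg_extra; digits = 1))
```

The allocation count nearly doubles while the total bytes barely change, which suggests the extra allocations are small. That would be consistent with the limited perf impact noted above.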

cc @Keno

KristofferC, Apr 22 '24 13:04

While a perf regression is unfortunate, I don't think it really warrants being on the milestone.

KristofferC, May 08 '24 13:05

The regression still exists on 1.11, but it looks like it's resolved on master, so it would only appear in the 1.11.x releases (unless it re-regresses).

adienes, Jul 13 '24 13:07