julia [rfc] parallel marking

This PR extends the refactoring from https://github.com/JuliaLang/julia/pull/45608 by parallelizing the GC mark-loop.

Some benchmark results are here: https://hackmd.io/@Idnmfpb3SxK98-OsBtRD5A/HJn86Bl25

TODO

[x] Move work-stealing queue implementation to separate file.
[x] Run more benchmarks (and test scalability for large number of threads).
[x] Fix GC debugging infrastructure & annotations.

CC: @vchuravy, @chflood, @kpamnany

Jun 10 '22 19:06 d-netto

can we get 32 core benchmarks as well? It would be very unfortunate if this was a regression for cpus with many cores. (it would be fine to limit GC to use 4 if thats all it can use, but negative scaling is very unfortunate)

Jun 10 '22 20:06 oscardssmith

(EDIT: see scaling plots below)

Jun 17 '22 19:06 d-netto

Were these timings made on a version of Julia that includes the changes https://github.com/JuliaLang/julia/pull/45714 that make the GC time report correctly?

Jun 17 '22 19:06 oscardssmith

No. Will merge and update that.

Jun 17 '22 19:06 d-netto

The squash went pretty bad?

Jul 14 '22 15:07 giordano

yeah it did, hopefully fixed in the latest commit

Jul 14 '22 15:07 d-netto

Latest commits implement @vchuravy and @chflood's idea of batching big arrays (bugs are mine). Seems to be working fine on my machine with the many_refs benchmark recently added to GCBenchmarks. Might need more stress-testing though.

Jul 20 '22 16:07 d-netto

@nanosoldier runtests(ALL, vs = ":master", buildflags=["LLVM_ASSERTIONS=1", "FORCE_ASSERTIONS=1"], vs_buildflags=["LLVM_ASSERTIONS=1", "FORCE_ASSERTIONS=1"])

Aug 02 '22 13:08 vchuravy

@nanosoldier runbenchmarks(!"scalar", vs=":master")

Aug 02 '22 13:08 vchuravy

Something went wrong when running your job:

NanosoldierError: failed to run benchmarks against primary commit: failed process: Process(`sudo -n /nanosoldier/cset/bin/cset shield -e -- sudo -n -u nanosoldier-worker -- /nanosoldier/workdir/jl_Ru6OqQ/benchscript.sh`, ProcessSignaled(6)) [0]

Logs and partial data can be found here

Aug 02 '22 14:08 nanosoldier

@nanosoldier runbenchmarks(!"scalar", vs=":master")

Aug 03 '22 20:08 vchuravy

@nanosoldier runbenchmarks(!"scalar", vs=":master")

Aug 05 '22 19:08 d-netto

Looks like there are a couple more key packages that need fixing from glancing at some parts of that report, including FFTW, HTTP, and Parsers. The HTTP and Parsers code looks like it was probably already very dangerous (looks like it probably may be introducing data races into the code). The FFTW code is just trivially deciding if thread support is enabled, and that call can be removed: https://github.com/JuliaMath/FFTW.jl/blob/17bc81a0fcf9875d777ea4bee2fca70fc23c8a0c/src/providers.jl#L75

Aug 06 '22 05:08 vtjnash

Were there any regressions? Can you post the before/after?

Aug 25 '22 20:08 oscardssmith

@nanosoldier runtests(ALL, vs = ":master", buildflags=["LLVM_ASSERTIONS=1", "FORCE_ASSERTIONS=1"], vs_buildflags=["LLVM_ASSERTIONS=1", "FORCE_ASSERTIONS=1"])

Sep 09 '22 12:09 vchuravy

What is the status of this now that https://github.com/JuliaLang/julia/pull/21590 was merged?

Feb 02 '23 20:02 vtjnash

I think it's not ready to merge/review yet. It's still causing regressions on some of the GCBenchmarks and I'm diagnosing that (will post some scaling plots in a sec).

I think @kpamnany also saw some segfaults on RAI tests.

Feb 02 '23 20:02 d-netto

Speedup/slowdowns on GC times relative to the master commit from which it branched vs nthreads (1, 2, 4, 8, 16):

positive-gc-scaling neutral-gc-scaling negative-gc-scaling

Feb 02 '23 20:02 d-netto

Superseded by https://github.com/JuliaLang/julia/pull/48600.

Feb 19 '23 02:02 d-netto

julia julia copied to clipboard

[rfc] parallel marking

TODO

julia
julia copied to clipboard