julia icon indicating copy to clipboard operation
julia copied to clipboard

[rfc] parallel marking

Open d-netto opened this issue 2 years ago • 15 comments

This PR extends the refactoring from https://github.com/JuliaLang/julia/pull/45608 by parallelizing the GC mark-loop.

Some benchmark results are here: https://hackmd.io/@Idnmfpb3SxK98-OsBtRD5A/HJn86Bl25

TODO

  • [x] Move work-stealing queue implementation to separate file.
  • [x] Run more benchmarks (and test scalability for large number of threads).
  • [x] Fix GC debugging infrastructure & annotations.

CC: @vchuravy, @chflood, @kpamnany

d-netto avatar Jun 10 '22 19:06 d-netto

can we get 32 core benchmarks as well? It would be very unfortunate if this was a regression for cpus with many cores. (it would be fine to limit GC to use 4 if thats all it can use, but negative scaling is very unfortunate)

oscardssmith avatar Jun 10 '22 20:06 oscardssmith

(EDIT: see scaling plots below)

d-netto avatar Jun 17 '22 19:06 d-netto

Were these timings made on a version of Julia that includes the changes https://github.com/JuliaLang/julia/pull/45714 that make the GC time report correctly?

oscardssmith avatar Jun 17 '22 19:06 oscardssmith

No. Will merge and update that.

d-netto avatar Jun 17 '22 19:06 d-netto

The squash went pretty bad?

giordano avatar Jul 14 '22 15:07 giordano

yeah it did, hopefully fixed in the latest commit

d-netto avatar Jul 14 '22 15:07 d-netto

Latest commits implement @vchuravy and @chflood's idea of batching big arrays (bugs are mine). Seems to be working fine on my machine with the many_refs benchmark recently added to GCBenchmarks. Might need more stress-testing though.

d-netto avatar Jul 20 '22 16:07 d-netto

@nanosoldier runtests(ALL, vs = ":master", buildflags=["LLVM_ASSERTIONS=1", "FORCE_ASSERTIONS=1"], vs_buildflags=["LLVM_ASSERTIONS=1", "FORCE_ASSERTIONS=1"])

vchuravy avatar Aug 02 '22 13:08 vchuravy

@nanosoldier runbenchmarks(!"scalar", vs=":master")

vchuravy avatar Aug 02 '22 13:08 vchuravy

Something went wrong when running your job:

NanosoldierError: failed to run benchmarks against primary commit: failed process: Process(`sudo -n /nanosoldier/cset/bin/cset shield -e -- sudo -n -u nanosoldier-worker -- /nanosoldier/workdir/jl_Ru6OqQ/benchscript.sh`, ProcessSignaled(6)) [0]

Logs and partial data can be found here

nanosoldier avatar Aug 02 '22 14:08 nanosoldier

@nanosoldier runbenchmarks(!"scalar", vs=":master")

vchuravy avatar Aug 03 '22 20:08 vchuravy

@nanosoldier runbenchmarks(!"scalar", vs=":master")

d-netto avatar Aug 05 '22 19:08 d-netto

Looks like there are a couple more key packages that need fixing from glancing at some parts of that report, including FFTW, HTTP, and Parsers. The HTTP and Parsers code looks like it was probably already very dangerous (looks like it probably may be introducing data races into the code). The FFTW code is just trivially deciding if thread support is enabled, and that call can be removed: https://github.com/JuliaMath/FFTW.jl/blob/17bc81a0fcf9875d777ea4bee2fca70fc23c8a0c/src/providers.jl#L75

vtjnash avatar Aug 06 '22 05:08 vtjnash

Were there any regressions? Can you post the before/after?

oscardssmith avatar Aug 25 '22 20:08 oscardssmith

@nanosoldier runtests(ALL, vs = ":master", buildflags=["LLVM_ASSERTIONS=1", "FORCE_ASSERTIONS=1"], vs_buildflags=["LLVM_ASSERTIONS=1", "FORCE_ASSERTIONS=1"])

vchuravy avatar Sep 09 '22 12:09 vchuravy

What is the status of this now that https://github.com/JuliaLang/julia/pull/21590 was merged?

vtjnash avatar Feb 02 '23 20:02 vtjnash

I think it's not ready to merge/review yet. It's still causing regressions on some of the GCBenchmarks and I'm diagnosing that (will post some scaling plots in a sec).

I think @kpamnany also saw some segfaults on RAI tests.

d-netto avatar Feb 02 '23 20:02 d-netto

Speedup/slowdowns on GC times relative to the master commit from which it branched vs nthreads (1, 2, 4, 8, 16):

positive-gc-scaling neutral-gc-scaling negative-gc-scaling

d-netto avatar Feb 02 '23 20:02 d-netto

Superseded by https://github.com/JuliaLang/julia/pull/48600.

d-netto avatar Feb 19 '23 02:02 d-netto