Polyester.jl
Weak and strong scaling tests
Hello.
Could you please comment on whether weak and strong scaling tests exist within CheapThreads.jl? Would it be useful to implement such a thing? Of course, this type of test depends strongly on the problem being simulated and on its implementation details. I think, however, that it could be a nice way for potential users to discover the quality and usefulness of CheapThreads.
Best,
PS: Sorry if this is not the right place for such questions.
PPS: For more info on weak and strong scaling, see this link for instance.
No one's written any tests, but you could try an example like the one from that link, using @batch to parallelize.
Note that the results will be heavily problem dependent. E.g., if the operation is primarily memory bound, then scaling will be bad.
Depending on the CPU, as few as 1 core can utilize all the memory bandwidth, meaning memory accesses could sometimes be modeled as completely serial.
julia> memory_bandwidth(verbose=true, multithreading=false)
╔══╡ Single-threaded:
╠══╡ (4 threads)
╟─ COPY: 144299.4 MB/s
╟─ SCALE: 144522.2 MB/s
╟─ ADD: 128922.2 MB/s
╟─ TRIAD: 128925.4 MB/s
╟─────────────────────
║ Median: 136612.4 MB/s
╚═════════════════════
(median = 136612.4, minimum = 128922.2, maximum = 144522.2)
julia> memory_bandwidth(verbose=true, multithreading=true)
╔══╡ Multi-threaded:
╠══╡ (4 threads)
╟─ COPY: 144299.4 MB/s
╟─ SCALE: 144522.2 MB/s
╟─ ADD: 127744.3 MB/s
╟─ TRIAD: 128530.3 MB/s
╟─────────────────────
║ Median: 136414.9 MB/s
╚═════════════════════
(median = 136414.9, minimum = 127744.3, maximum = 144522.2)
julia> versioninfo()
Julia Version 1.7.0-DEV.1088
Commit 6cebd28e66* (2021-05-11 14:04 UTC)
Platform Info:
OS: macOS (arm64-apple-darwin20.3.0)
CPU: Apple M1
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-11.0.1 (ORCJIT, cyclone)
Environment:
JULIA_NUM_THREADS = 4
The M1 Mac shows no improvement from multithreading in this benchmark: the median bandwidth is essentially the same with and without it. But many other programs aren't constrained by memory bandwidth, and these will benefit from more cores.
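As a rough way to reason about this, one could treat the memory-bound part of a program as effectively serial and plug it into Amdahl's law. This is only a simplified model that I'm adding for illustration, not something measured above; the function names `amdahl_speedup` and `amdahl_efficiency` and the example fractions are made up.

```julia
# Amdahl's law: predicted speedup on n cores when a fraction p of the
# single-core runtime parallelizes perfectly and the remainder is serial
# (here: assumed to be limited by memory bandwidth).
amdahl_speedup(p, n) = 1 / ((1 - p) + p / n)

# Parallel efficiency: speedup divided by the number of cores.
amdahl_efficiency(p, n) = amdahl_speedup(p, n) / n

# Illustrative fractions only; p = 0.0 corresponds to a fully
# bandwidth-bound program, like the benchmark above.
for p in (0.0, 0.5, 0.95)
    println("p = ", p, ": predicted speedup on 4 cores = ",
            round(amdahl_speedup(p, 4); digits = 2))
end
```

With `p = 0.0` the model predicts no speedup at all, which matches what the bandwidth benchmark shows on this machine.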
So, I'd suggest picking a problem of interest and trying CheapThreads.@batch and/or Threads.@threads, and observing how they scale.
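Something along these lines could serve as a starting point. It is a minimal sketch, not an existing test suite: it uses only Base (`Threads.@spawn`) so that the number of tasks can be varied inside one session, and `kernel`, `chunks`, `run_parallel`, `best_time`, `strong_scaling` and `weak_scaling` are all hypothetical names with a made-up compute-bound workload.

```julia
using Base.Threads: @spawn, nthreads

# Hypothetical compute-bound kernel: sum of sin(i)^2 over an integer range.
function kernel(r)
    s = 0.0
    for i in r
        s += sin(i)^2
    end
    return s
end

# Split 1:n into `nchunks` contiguous ranges of nearly equal length.
function chunks(n, nchunks)
    bounds = [div(k * n, nchunks) for k in 0:nchunks]
    return [(bounds[k] + 1):bounds[k + 1] for k in 1:nchunks]
end

# Run the kernel over 1:n using `ntasks` spawned tasks and return the total.
function run_parallel(n, ntasks)
    tasks = [@spawn kernel(r) for r in chunks(n, ntasks)]
    return sum(fetch, tasks)
end

# Best-of-`reps` wall time in seconds; one untimed call first for compilation.
function best_time(f; reps = 3)
    f()
    return minimum(@elapsed(f()) for _ in 1:reps)
end

# Strong scaling: fixed total size n, growing number of tasks.
function strong_scaling(n; maxtasks = nthreads())
    t1 = best_time(() -> run_parallel(n, 1))
    return [(ntasks = p, speedup = t1 / best_time(() -> run_parallel(n, p)))
            for p in 1:maxtasks]
end

# Weak scaling: total size grows with the task count, so the work per
# task stays fixed; efficiency near 1 means good scaling.
function weak_scaling(n_per_task; maxtasks = nthreads())
    t1 = best_time(() -> run_parallel(n_per_task, 1))
    return [(ntasks = p,
             efficiency = t1 / best_time(() -> run_parallel(p * n_per_task, p)))
            for p in 1:maxtasks]
end
```

Start Julia with several threads (e.g. `JULIA_NUM_THREADS = 4` as above), then call `strong_scaling(10^8)` or `weak_scaling(10^7)`. To compare `CheapThreads.@batch` against `Threads.@threads`, one would replace `run_parallel` with a loop using the respective macro; since the thread count is fixed at startup, that likely means rerunning with different thread settings.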