ThreadsX.jl
Performance of map
@tkf, what explains the following performance difference? This is a fresh Julia session on Win11 with 8 threads:
julia> Threads.nthreads()
8
julia> x = rand(10^8) .- 0.5;
julia> @time map(abs, x);
0.251657 seconds (99.43 k allocations: 768.457 MiB, 18.93% gc time, 14.49% compilation time)
julia> @time map(abs, x);
0.239117 seconds (3 allocations: 762.940 MiB, 22.72% gc time)
julia> using ThreadsX
julia> @time ThreadsX.map(abs, x);
1.842571 seconds (2.53 M allocations: 4.024 GiB, 10.90% gc time, 33.76% compilation time)
julia> @time ThreadsX.map(abs, x);
1.176356 seconds (1.51 k allocations: 3.888 GiB, 14.73% gc time)
The results are similar with 1, 2, or 4 threads.
Thank you!
I'm hoping to mitigate this somewhat with https://github.com/JuliaFolds/Transducers.jl/pull/553. However, Transducers.jl (and, by extension, ThreadsX.jl) is at its worst when dealing with very fast functions like abs.
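For intuition about why a function as cheap as abs is the worst case, here is a minimal Base-only sketch. It is not how Transducers.jl works internally. The helper chunked_map! is hypothetical: it splits the indices into one contiguous chunk per thread and writes into a preallocated output. Each element costs only one abs and one store, so any per-task or allocation overhead has to be amortized over large chunks before threading pays off.

```julia
using Base.Threads: @threads, nthreads

# Hypothetical helper for illustration only (not part of ThreadsX.jl or
# Transducers.jl): computes dest[i] = f(src[i]) using one contiguous chunk of
# indices per thread, so the scheduling cost is paid once per chunk rather
# than once per element.
function chunked_map!(f, dest::Vector, src::Vector)
    length(dest) == length(src) ||
        throw(DimensionMismatch("dest and src must have the same length"))
    n = length(src)
    nchunks = max(1, min(nthreads(), n))
    # Split 1:n into `nchunks` contiguous, nearly equal ranges.
    chunks = [(div((k - 1) * n, nchunks) + 1):div(k * n, nchunks) for k in 1:nchunks]
    @threads for k in 1:nchunks
        for i in chunks[k]
            @inbounds dest[i] = f(src[i])
        end
    end
    return dest
end

x = rand(10^6) .- 0.5
y = chunked_map!(abs, similar(x), x)  # output buffer allocated once, up front
@assert y == map(abs, x)
```

The design choice here is coarse granularity: with only `nthreads()` tasks, the fixed cost of spawning them is negligible next to 10^6 cheap operations.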
The easiest way to get better performance would be to use ThreadsX.map!(abs, similar(x), x), which writes into a preallocated output.
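A minimal sketch of the suggested fix, assuming ThreadsX.jl is installed. The array is reduced to 10^7 elements to keep it quick, and no particular timings are claimed. It preallocates the output with similar(x), fills it in place with ThreadsX.map!, and times the serial map! into the same buffer for comparison.

```julia
using ThreadsX

x = rand(10^7) .- 0.5   # smaller than the 10^8 above, to keep the sketch quick
y = similar(x)          # allocate the output once, up front

# Fill the preallocated buffer in parallel; the first call includes compilation.
ThreadsX.map!(abs, y, x)
@time ThreadsX.map!(abs, y, x)

# Serial baseline writing into the same buffer.
@time map!(abs, y, x)

@assert y == abs.(x)
```

Because abs does so little work per element, a loop like this may be limited largely by memory bandwidth, so the threaded version may not beat the serial one by much even after the extra allocations are gone.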