ScopedValue is allocating when accessed
I have looked into using scoped values to hold temporary arrays and avoid allocations in parallel tasks. However, it seems that scoped values allocate every time they are accessed, whereas with task-local storage (TLS) this can be avoided. That is unfortunate, since GC pressure from parallel tasks can be a performance problem.
using Base.Threads
using Base.ScopedValues
using BenchmarkTools
@noinline function tlsfun()
tlsvec = get!(() -> [0], task_local_storage(), :myvec)::Vector{Int}
tlsvec[1] += 1
return nothing
end
const dynvec = ScopedValue([0])
@noinline function dynfun()
dvec = dynvec[]
dvec[1] += 1
return nothing
end
function tlsrun()
@sync for _ in 1:nthreads()
@spawn for _ in 1:100000; tlsfun(); end
end
end
function dynrun()
@sync for _ in 1:nthreads()
@with dynvec=>[0] @spawn for _ in 1:100000; dynfun(); end
end
end
@btime tlsrun()
@btime dynrun()
versioninfo()
output:
2.326 ms (202 allocations: 21.03 KiB)
8.238 ms (2400274 allocations: 36.64 MiB)
Julia Version 1.12.0-DEV.121
Commit bc2212cc0e* (2024-03-04 01:20 UTC)
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 24 × AMD Ryzen Threadripper PRO 5945WX 12-Cores
WORD_SIZE: 64
LLVM: libLLVM-16.0.6 (ORCJIT, znver3)
Threads: 24 default, 0 interactive, 12 GC (on 24 virtual cores)
Environment:
JULIA_EDITOR = emacs -nw
This is probably caused by 5b2fcb68800 and is not multi-threading related:
julia> const dynvec = ScopedValue([0])
julia> @noinline function dynfun()
dvec = dynvec[]
dvec[1] += 1
return nothing
end
julia> foo() = @with dynvec=>[0] for _ in 1:1_000_000; dynfun(); end
julia> @allocated foo()
16000208
The problem is this @noinline, which forces a wrapping Tuple{Vector{Int}} to be allocated as a temporary, even though it is immediately unwrapped at every call site (the 16000208 bytes above work out to roughly 16 bytes, i.e. one such wrapper, per access):
https://github.com/JuliaLang/julia/blob/58291db09d18f59223edbdc15592ffcf0eb3dcfa/base/dict.jl#L1004
So is the problem there that the API returns the wrapper (leaf.val,) instead of Some{V}(leaf.val)?
Would Some{V}(leaf.val) avoid the need to allocate the temporary here?
I guess not. Apparently we do not have calling-convention support for Union{Struct, Ghost}, even though we very easily could (we have many variations on it already) and probably should (it is the shape returned by the iteration protocol).
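To make the calling-convention point concrete, here is a minimal, self-contained sketch (the names wrapped, somewrapped, direct and probe are made up for illustration; this is not the actual dict.jl code). On builds affected by this issue, a @noinline callee whose return type is Union{Nothing, Tuple{Vector{Int}}}, and by the same reasoning Union{Nothing, Some{Vector{Int}}}, should heap-allocate the wrapper on every call, while returning the vector itself should not:

# Hypothetical stand-ins for the dict.jl lookup, for illustration only.
# wrapped mimics returning (leaf.val,): its return type is
# Union{Nothing, Tuple{Vector{Int}}}, a union of a ghost and a struct
# carrying a GC-tracked pointer, so the wrapper is heap-allocated
# inside every non-inlined call.
@noinline wrapped(v::Vector{Int}, found::Bool) = found ? (v,) : nothing

# Some{Vector{Int}} is still a struct wrapping a pointer, so it should
# hit the same calling-convention limitation.
@noinline somewrapped(v::Vector{Int}, found::Bool) = found ? Some(v) : nothing

# Returning the vector itself gives Union{Nothing, Vector{Int}}, which is
# just a (possibly absent) GC pointer and needs no wrapper allocation.
@noinline direct(v::Vector{Int}, found::Bool) = found ? v : nothing

# The `found` argument is computed from the loop index so that constant
# propagation does not fold the branch away.
function sum_wrapped(v, n)
    s = 0
    for i in 1:n
        r = wrapped(v, i > 0)
        r === nothing || (s += r[1][1])   # unwrap the temporary tuple
    end
    return s
end

function sum_some(v, n)
    s = 0
    for i in 1:n
        r = somewrapped(v, i > 0)
        r === nothing || (s += something(r)[1])
    end
    return s
end

function sum_direct(v, n)
    s = 0
    for i in 1:n
        r = direct(v, i > 0)
        r === nothing || (s += r[1])
    end
    return s
end

const probe = [1]
sum_wrapped(probe, 1); sum_some(probe, 1); sum_direct(probe, 1)   # compile first
@show @allocated sum_wrapped(probe, 1_000_000)   # roughly 16 bytes per call on affected builds
@show @allocated sum_some(probe, 1_000_000)      # same limitation expected
@show @allocated sum_direct(probe, 1_000_000)    # near zero

The exact numbers depend on the Julia version and on what the optimizer manages to prove; the point is only the relative difference between the wrapped and unwrapped return types.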
Any chance of progress on this?
PR #55045 is merged now. The example by topolarity still allocates on 2264f502756.
Yes, this is only fixable by reverting 5b2fcb68800875e570d7bb8c78ed00d360b6cfd5, on top of #55045.