DistributedArrays.jl

DistributedArrays is not thread-safe

Open · devmotion opened this issue 3 months ago · 0 comments

Because DistributedArrays relies on global mutable state such as DistributedArrays.registry, it is currently not thread-safe. For example, calling distribute from several threads at once fails:

julia> versioninfo()
Julia Version 1.11.6
Commit 9615af0f269 (2025-07-09 12:58 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: macOS (arm64-apple-darwin24.0.0)
  CPU: 10 × Apple M2 Pro
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, apple-m2)
Threads: 6 default, 0 interactive, 3 GC (on 6 virtual cores)
Environment:
  JULIA_PKG_USE_CLI_GIT = true
  JULIA_PKG_PRESERVE_TIERED_INSTALLED = true

julia> using DistributedArrays

julia> v = [rand(10) for _ in 1:1000];

julia> @sync for vi in v
           Threads.@spawn distribute(vi)
       end;
ERROR: TaskFailedException

    nested task error: TaskFailedException

        nested task error: AssertionError: Multiple concurrent writes to Dict detected!
        Stacktrace:
          [1] _setindex!
            @ ./dict.jl:337 [inlined]
          [2] setindex!(h::Dict{Any, Nothing}, v0::Nothing, key::Tuple{Int64, Int64})
            @ Base ./dict.jl:363
          [3] push!
            @ ./set.jl:137 [inlined]
          [4] DArray{…}(id::Tuple{…}, dims::Tuple{…}, pids::Vector{…}, indices::Vector{…}, cuts::Vector{…}, lp::Vector{…})
            @ DistributedArrays ~/.julia/packages/DistributedArrays/SxLCk/src/darray.jl:47
          [5] construct_localparts(init::DistributedArrays.var"#73#75"{…}, id::Tuple{…}, dims::Tuple{…}, pids::Vector{…}, idxs::Vector{…}, cuts::Vector{…}; T::Nothing, A::Nothing)
            @ DistributedArrays ~/.julia/packages/DistributedArrays/SxLCk/src/darray.jl:126
          [6] construct_localparts(init::Function, id::Tuple{…}, dims::Tuple{…}, pids::Vector{…}, idxs::Vector{…}, cuts::Vector{…})
            @ DistributedArrays ~/.julia/packages/DistributedArrays/SxLCk/src/darray.jl:117
          [7] #invokelatest#2
            @ ./essentials.jl:1055 [inlined]
          [8] invokelatest
            @ ./essentials.jl:1052 [inlined]
          [9] #153
            @ ~/.julia/juliaup/julia-1.11.6+0.aarch64.apple.darwin14/share/julia/stdlib/v1.11/Distributed/src/remotecall.jl:425 [inlined]
         [10] run_work_thunk(thunk::Distributed.var"#153#154"{…}, print_error::Bool)
            @ Distributed ~/.julia/juliaup/julia-1.11.6+0.aarch64.apple.darwin14/share/julia/stdlib/v1.11/Distributed/src/process_messages.jl:70
         [11] #remotecall_fetch#158
            @ ~/.julia/juliaup/julia-1.11.6+0.aarch64.apple.darwin14/share/julia/stdlib/v1.11/Distributed/src/remotecall.jl:450 [inlined]
         [12] remotecall_fetch
            @ ~/.julia/juliaup/julia-1.11.6+0.aarch64.apple.darwin14/share/julia/stdlib/v1.11/Distributed/src/remotecall.jl:449 [inlined]
         [13] remotecall_fetch
            @ ~/.julia/juliaup/julia-1.11.6+0.aarch64.apple.darwin14/share/julia/stdlib/v1.11/Distributed/src/remotecall.jl:492 [inlined]
         [14] (::DistributedArrays.var"#1#3"{…})()
            @ DistributedArrays ~/.julia/packages/DistributedArrays/SxLCk/src/darray.jl:88
        Stacktrace:
         [1] #remotecall_fetch#158
           @ ~/.julia/juliaup/julia-1.11.6+0.aarch64.apple.darwin14/share/julia/stdlib/v1.11/Distributed/src/remotecall.jl:451 [inlined]
         [2] remotecall_fetch
           @ ~/.julia/juliaup/julia-1.11.6+0.aarch64.apple.darwin14/share/julia/stdlib/v1.11/Distributed/src/remotecall.jl:449 [inlined]
         [3] remotecall_fetch
           @ ~/.julia/juliaup/julia-1.11.6+0.aarch64.apple.darwin14/share/julia/stdlib/v1.11/Distributed/src/remotecall.jl:492 [inlined]
         [4] (::DistributedArrays.var"#1#3"{…})()
           @ DistributedArrays ~/.julia/packages/DistributedArrays/SxLCk/src/darray.jl:88
    Stacktrace:
     [1] sync_end(c::Channel{Any})
       @ Base ./task.jl:466
     [2] macro expansion
       @ ./task.jl:499 [inlined]
     [3] DArray(id::Tuple{…}, init::Function, dims::Tuple{…}, pids::Vector{…}, idxs::Vector{…}, cuts::Vector{…})
       @ DistributedArrays ~/.julia/packages/DistributedArrays/SxLCk/src/darray.jl:83
     [4] DArray(init::Function, dims::Tuple{Int64}, procs::Vector{Int64}, dist::Vector{Int64})
       @ DistributedArrays ~/.julia/packages/DistributedArrays/SxLCk/src/darray.jl:177
     [5] distribute(A::Vector{Float64}; procs::Vector{Int64}, dist::Vector{Int64})
       @ DistributedArrays ~/.julia/packages/DistributedArrays/SxLCk/src/darray.jl:550
     [6] distribute(A::Vector{Float64})
       @ DistributedArrays ~/.julia/packages/DistributedArrays/SxLCk/src/darray.jl:540
     [7] (::var"#7#8"{Vector{Float64}})()
       @ Main ./REPL[6]:2

...and 3 more exceptions.

Stacktrace:
 [1] sync_end(c::Channel{Any})
   @ Base ./task.jl:466
 [2] macro expansion
   @ task.jl:499 [inlined]
 [3] top-level scope
   @ REPL[6]:1
Some type information was truncated. Use `show(err)` to see complete types.
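The assertion is raised by Base's Dict. The trace shows push! on a Set (backed by a Dict{Any, Nothing}) inside the DArray constructor, reached from several tasks at once with no synchronization. The sketch below reproduces that access pattern without DistributedArrays. REGISTRY, REGISTRY_LOCK and register! are hypothetical stand-ins, and the ReentrantLock guard is a standard alternative shown for comparison, not the fix proposed in this issue.

```julia
# Hypothetical stand-in for a package-global collection like the one in the trace.
const REGISTRY = Set{Tuple{Int,Int}}()
const REGISTRY_LOCK = ReentrantLock()

# Without the lock, concurrent push! calls race on the Set's underlying Dict;
# on Julia 1.11 that can trip "Multiple concurrent writes to Dict detected!".
function register!(id::Tuple{Int,Int})
    lock(REGISTRY_LOCK) do
        push!(REGISTRY, id)
    end
    return id
end

# Same shape as the reproducer: many tasks mutating shared state at once.
@sync for i in 1:1000
    Threads.@spawn register!((1, i))
end

println(length(REGISTRY))  # 1000
```

The lock serializes every write, which is simple and keeps a single shared table, at the cost of contention when many tasks register at once.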

AFAICT this could be fixed by using task-local state (e.g. OncePerTask, available in Julia >= 1.12).
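The task-local approach can be sketched as follows, assuming a registry that only needs to be visible within one task. RegistryT, TASK_REGISTRY and task_registry are hypothetical names, not DistributedArrays internals. On Julia >= 1.12 the sketch uses Base.OncePerTask; on older versions it falls back to task_local_storage(), which gives each task its own storage dictionary.

```julia
# Hypothetical registry type; not the actual DistributedArrays.registry.
const RegistryT = Dict{Tuple{Int,Int},Any}

@static if isdefined(Base, :OncePerTask)
    # Julia >= 1.12: OncePerTask runs the initializer lazily, once per task.
    const TASK_REGISTRY = Base.OncePerTask{RegistryT}(() -> RegistryT())
    task_registry() = TASK_REGISTRY()
else
    # Older Julia: the same idea via the current task's own storage dictionary.
    task_registry() = get!(() -> RegistryT(), task_local_storage(), :hypothetical_registry)::RegistryT
end

# Each task only ever mutates its own Dict, so no lock is needed.
tasks = map(1:100) do i
    Threads.@spawn begin
        reg = task_registry()
        for j in 1:10
            reg[(i, j)] = nothing
        end
        length(task_registry())  # same Dict as `reg` within this task
    end
end
counts = fetch.(tasks)

println(all(==(10), counts))       # true: every task saw only its own 10 entries
println(isempty(task_registry()))  # true: the calling task's registry is separate
```

Note the trade-off: entries stored this way are invisible to every other task, so the pattern only fits state that never has to be looked up from a different task than the one that wrote it.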

devmotion, Sep 05 '25 07:09