RecurrenceAnalysis.jl icon indicating copy to clipboard operation
RecurrenceAnalysis.jl copied to clipboard

Likely erroneous use of `Threads.nthreads` and `Threads.threadid`

Open KristofferC opened this issue 10 months ago • 3 comments

Hello,

From the following PkgEval log: https://s3.amazonaws.com/julialang-reports/nanosoldier/pkgeval/by_hash/5da257d_vs_d63aded/RecurrenceAnalysis.primary.log it seems likely that this package is using Threads.nthreads and Threads.threadid incorrectly as described in https://julialang.org/blog/2023/07/PSA-dont-use-threadid/ and https://juliafolds2.github.io/OhMyThreads.jl/stable/literate/tls/tls/#The-naive-(and-incorrect)-approach.

In the upcoming Julia 1.12, julia runs with an interactive thread by default which means that the number of worker threads nthreads() defaults to 1 but the thread id that things will run on in a threaded context will typically be 2 (the interactive thread having the id 1):

julia> Threads.nthreads()
1

julia> Threads.@threads for i = 1:3
           @show Threads.threadid()
       end
Threads.threadid() = 2
Threads.threadid() = 2
Threads.threadid() = 2

Trying to index a vector of length Threads.nthreads() with Threads.threadid() will thus error.

Note that even though this code only started directly erroring in 1.12 the code was likely already incorrect on earlier Julia versions unless it made sure that the spawned tasks did not migrate between threads (which can happen at any yield point).

Some remedies to this are:

  • Use the new OncePerThread (or OncePerTask) functionality added in 1.12.
  • Use task local values instead of thread local values as described in https://juliafolds2.github.io/OhMyThreads.jl/stable/literate/tls/tls/#TLV
  • Use a Channel as described in https://juliafolds2.github.io/OhMyThreads.jl/stable/literate/tls/tls/#The-safe-way:-Channel.
  • Use Threads.maxthreadid() instead of Threads.nthreads(). This can waste some memory since maxthreadid() can give a higher number than the number of worker threads. It also doesn't help against the task migration issue described earlier (but the previous code did not do that either).

KristofferC avatar Feb 24 '25 14:02 KristofferC

After looking into the blog post of OhMyThreads, it seems to me that the best approach is using Channels. That doesn't depend on backwards-incompatible features of 1.12, and task local values are not adequate for the type of tasks that are parallelized in this package (the values that are computed in the parallelized loop are needed after the loops end).

@asinghvi17 : since you designed the parallelized code in the first place, do you want to try a fix on this? Otherwise I might do an attempt, based on the blog's examples.

heliosdrm avatar Feb 25 '25 14:02 heliosdrm

Thanks for the ping - can try a fix this week or weekend. Yeah I probably wrote that before the advice was out :D

asinghvi17 avatar Feb 25 '25 14:02 asinghvi17

I have to fix this same issue in 5 more repositories of JuliaDynamics... So if someone can work on a fix here I would appreciate it, I would save me a lot of work by copying your effort!

Datseris avatar Apr 01 '25 10:04 Datseris

If @asinghvi17 cannot dedicate time to this, I could start an attempt this weekend.

heliosdrm avatar Apr 07 '25 09:04 heliosdrm

I'm working on a brief patch now using a task based approach

asinghvi17 avatar Apr 10 '25 17:04 asinghvi17