JuLIP.jl
Multithreading bug
I observe this in ACEatoms on the co/compat branch with the following setup:
(ACEatoms) pkg> status
Project ACEatoms v0.0.12
Status `~/gits/ACEatoms.jl/Project.toml`
[3e8ccfd2] ACE v0.12.37
[14bae519] ACEbase v0.2.4 `~/gits/ACEbase.jl`
[945c410c] JuLIP v0.13.7
⌅ [d9ec5142] NamedTupleTools v0.13.7
[189a3867] Reexport v1.2.2
[276daf66] SpecialFunctions v2.1.4
[90137ffa] StaticArrays v1.4.4
[e88e6eb3] Zygote v0.6.38
[37e2e46d] LinearAlgebra
[9a3f8284] Random
With multi-threading on, some tests fail. Possibly this is a multi-threading bug in cached arrays and not in JuLIP, though.
Initial tests don't show any problems in the ACE code. Also, virials evaluate ok, which suggests it could after all be in JuLIP. Very odd...
maybe this is now fixed in #158 - but leaving open until we can test this thoroughly
@tjjarvinen -- do you agree that this was likely fixed by switching to a static scheduler?
Yes, this is most likely correct.
The bug was a data race caused by task migration, to put it in "proper" computer science terms.
The static scheduler blocks task migration, so no data race happened.
The Julia documentation sums this up:
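To make the failure mode concrete, here is a minimal, hypothetical sketch (not the actual JuLIP/ACEatoms code) of the pattern that breaks under task migration: scratch buffers indexed by `threadid()`.

```julia
using Base.Threads

nt = nthreads()
buffers = [zeros(3) for _ in 1:nt]   # one scratch buffer per thread

# UNSAFE under :dynamic scheduling: threadid() may change between looking
# up `buf` and writing to it, because the task can migrate to another
# thread at any yield point. Two tasks can then end up sharing one buffer.
@threads :dynamic for i in 1:100
    buf = buffers[threadid()]
    buf .= i
end

# SAFE with :static: each task is pinned to one thread, so threadid()
# is constant within an iteration and each buffer has a single owner.
@threads :static for i in 1:100
    buf = buffers[threadid()]
    buf .= i
end
```

The race is silent here (buffers are merely overwritten), but in real code a migrated task clobbering another task's scratch space produces exactly the kind of intermittent test failures described above.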
help?> Threads.@threads
Threads.@threads [schedule] for ... end
A macro to execute a for loop in parallel. The iteration space is
distributed to coarse-grained tasks. This policy can be specified by the
schedule argument. The execution of the loop waits for the evaluation of all
iterations.
See also: @spawn and pmap in Distributed.
Extended help
≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡
Semantics
===========
Unless stronger guarantees are specified by the scheduling option, the loop
executed by @threads macro have the following semantics.
The @threads macro executes the loop body in an unspecified order and
potentially concurrently. It does not specify the exact assignments of the
tasks and the worker threads. The assignments can be different for each
execution. The loop body code (including any code transitively called from
it) must not make any assumptions about the distribution of iterations to
tasks or the worker thread in which they are executed. The loop body for
each iteration must be able to make forward progress independent of other
iterations and be free from data races. As such, invalid synchronizations
across iterations may deadlock while unsynchronized memory accesses may
result in undefined behavior.
For example, the above conditions imply that:
• The lock taken in an iteration must be released within the same
iteration.
• Communicating between iterations using blocking primitives like
Channels is incorrect.
• Write only to locations not shared across iterations (unless a
lock or atomic operation is used).
• The value of threadid() may change even within a single iteration.
Schedulers
============
Without the scheduler argument, the exact scheduling is unspecified and
varies across Julia releases. Currently, :dynamic is used when the scheduler
is not specified.
│ Julia 1.5
│
│ The schedule argument is available as of Julia 1.5.
:dynamic (default)
––––––––––––––––––––
:dynamic scheduler executes iterations dynamically to available worker
threads. Current implementation assumes that the workload for each iteration
is uniform. However, this assumption may be removed in the future.
This scheduling option is merely a hint to the underlying execution
mechanism. However, a few properties can be expected. The number of Tasks
used by :dynamic scheduler is bounded by a small constant multiple of the
number of available worker threads (nthreads()). Each task processes
contiguous regions of the iteration space. Thus, @threads :dynamic for x in
xs; f(x); end is typically more efficient than @sync for x in xs; @spawn
f(x); end if length(xs) is significantly larger than the number of the
worker threads and the run-time of f(x) is relatively smaller than the cost
of spawning and synchronizing a task (typically less than 10 microseconds).
│ Julia 1.8
│
│ The :dynamic option for the schedule argument is available and the
│ default as of Julia 1.8.
:static
–––––––––
:static scheduler creates one task per thread and divides the iterations
equally among them, assigning each task specifically to each thread. In
particular, the value of threadid() is guaranteed to be constant within one
iteration. Specifying :static is an error if used from inside another
@threads loop or from a thread other than 1.
│ Note
│
│ :static scheduling exists for supporting transition of code
│ written before Julia 1.3. In newly written library functions,
│ :static scheduling is discouraged because the functions using this
│ option cannot be called from arbitrary worker threads.
Example
=========
To illustrate of the different scheduling strategies, consider the following
function busywait containing a non-yielding timed loop that runs for a given
number of seconds.
julia> function busywait(seconds)
           tstart = time_ns()
           while (time_ns() - tstart) / 1e9 < seconds
           end
       end

julia> @time begin
           Threads.@spawn busywait(5)
           Threads.@threads :static for i in 1:Threads.nthreads()
               busywait(1)
           end
       end
6.003001 seconds (16.33 k allocations: 899.255 KiB, 0.25% compilation time)

julia> @time begin
           Threads.@spawn busywait(5)
           Threads.@threads :dynamic for i in 1:Threads.nthreads()
               busywait(1)
           end
       end
2.012056 seconds (16.05 k allocations: 883.919 KiB, 0.66% compilation time)
The :dynamic example takes 2 seconds since one of the non-occupied threads
is able to run two of the 1-second iterations to complete the for loop.
Here are a couple of notes from it:
- The value of threadid() may change even within a single iteration.
- The number of Tasks used by :dynamic scheduler is bounded by a small constant multiple of the number of available worker threads (nthreads()).
- :static scheduler creates one task per thread and divides the iterations equally among them, assigning each task specifically to each thread. In particular, the value of threadid() is guaranteed to be constant within one iteration.
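The first note can be demonstrated directly. This sketch counts how often a task observes a different `threadid()` after a yield point; under `:static` the count is guaranteed to be zero, while under `:dynamic` it may not be (it can still be zero on a single thread or if no migration happens to occur):

```julia
using Base.Threads

# Count iterations in which threadid() changed across a yield point.
migrated = Atomic{Int}(0)
@threads :dynamic for i in 1:1000
    t1 = threadid()
    yield()                      # the task may be migrated to another thread here
    t2 = threadid()
    t1 != t2 && atomic_add!(migrated, 1)
end
# migrated[] can be > 0 under :dynamic; under :static it is always 0.
```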
So, if you use the dynamic scheduler, you will need more temporary arrays than the number of threads (one per task, not one per thread). This can be an issue for memory handling.
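One way around the extra-buffer problem, sketched here under the assumption that only a task-local accumulator is needed (this is not JuLIP's actual implementation), is to index scratch storage by chunk rather than by `threadid()`, which is correct under any scheduler:

```julia
using Base.Threads

# Parallel sum with one output slot per chunk/task, never touching threadid().
function chunked_sum(xs::Vector{Float64})
    chunksize = cld(length(xs), nthreads())
    chunks = collect(Iterators.partition(eachindex(xs), chunksize))
    partials = zeros(length(chunks))      # one slot per chunk
    @threads for c in eachindex(chunks)
        acc = 0.0                          # task-local accumulator
        for i in chunks[c]
            acc += xs[i]
        end
        partials[c] = acc                  # each task writes only its own slot
    end
    return sum(partials)
end
```

Because each task writes only to its own slot, no locks are needed and the result does not depend on scheduling or task migration.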
Based on this, sticking with static is the best option for now!
thank you