JuLIP.jl

Multithreading bug

Open cortner opened this issue 3 years ago • 1 comments

I observe this in ACEatoms on the co/compat branch with the following setup:

(ACEatoms) pkg> status
Project ACEatoms v0.0.12
Status `~/gits/ACEatoms.jl/Project.toml`
  [3e8ccfd2] ACE v0.12.37
  [14bae519] ACEbase v0.2.4 `~/gits/ACEbase.jl`
  [945c410c] JuLIP v0.13.7
⌅ [d9ec5142] NamedTupleTools v0.13.7
  [189a3867] Reexport v1.2.2
  [276daf66] SpecialFunctions v2.1.4
  [90137ffa] StaticArrays v1.4.4
  [e88e6eb3] Zygote v0.6.38
  [37e2e46d] LinearAlgebra
  [9a3f8284] Random

With multi-threading on, some tests fail. Possibly this is a multi-threading bug in cached arrays and not in JuLIP, though.

cortner, Apr 26 '22 23:04

initial tests don't show any problems in the ACE code. Also - virials evaluate ok, which suggests it could after all be in JuLIP. Very odd...

cortner, Apr 27 '22 04:04

maybe this is now fixed in #158 - but leaving open until I can test this thoroughly

cortner, Dec 14 '22 09:12

@tjjarvinen -- do you agree that this was likely fixed by switching to a static scheduler?

cortner, Feb 15 '23 20:02

Yes, this is most likely correct.

In "proper" computer science terms, the bug was a data race caused by task migration.

The :static scheduler prevents task migration, so the data race no longer occurs.
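As a rough illustration of the pattern at stake (a minimal sketch, not JuLIP's actual code): scratch storage is indexed by threadid(), which is only safe when each task stays pinned to one thread. The names `sum_squares_static`, `partial` and `nslots` are hypothetical. The slot count uses `Threads.maxthreadid()` where it exists, on the assumption that thread ids can exceed `nthreads()` on newer Julia versions.

```julia
using Base.Threads

# Hypothetical example: per-thread accumulator slots indexed by threadid().
function sum_squares_static(xs::Vector{Float64})
    # One slot per possible thread id (assumption: maxthreadid() is the
    # right upper bound when it is available, otherwise nthreads()).
    nslots = isdefined(Threads, :maxthreadid) ? Threads.maxthreadid() : nthreads()
    partial = zeros(Float64, nslots)
    @threads :static for i in eachindex(xs)
        # With :static there is one task per thread and threadid() is
        # constant within an iteration, so no two tasks share a slot.
        partial[threadid()] += xs[i]^2
    end
    return sum(partial)
end

total = sum_squares_static(collect(1.0:100.0))
```

With a scheduler that allows migration, the same read-modify-write on `partial[threadid()]` could be interleaved between two tasks using the same slot, which is the kind of data race described above.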

The Julia documentation sums this up:

help?> ?Threads.@threads
  Threads.@threads [schedule] for ... end

  A macro to execute a for loop in parallel. The iteration space is
  distributed to coarse-grained tasks. This policy can be specified by the
  schedule argument. The execution of the loop waits for the evaluation of all
  iterations.

  See also: @spawn and pmap in Distributed.

  Extended help
  ≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡

  Semantics
  ===========

  Unless stronger guarantees are specified by the scheduling option, the loop
  executed by @threads macro have the following semantics.

  The @threads macro executes the loop body in an unspecified order and
  potentially concurrently. It does not specify the exact assignments of the
  tasks and the worker threads. The assignments can be different for each
  execution. The loop body code (including any code transitively called from
  it) must not make any assumptions about the distribution of iterations to
  tasks or the worker thread in which they are executed. The loop body for
  each iteration must be able to make forward progress independent of other
  iterations and be free from data races. As such, invalid synchronizations
  across iterations may deadlock while unsynchronized memory accesses may
  result in undefined behavior.

  For example, the above conditions imply that:

    •  The lock taken in an iteration must be released within the same
       iteration.

    •  Communicating between iterations using blocking primitives like
       Channels is incorrect.

    •  Write only to locations not shared across iterations (unless a
       lock or atomic operation is used).

    •  The value of threadid() may change even within a single iteration.

  Schedulers
  ============

  Without the scheduler argument, the exact scheduling is unspecified and
  varies across Julia releases. Currently, :dynamic is used when the scheduler
  is not specified.

  │ Julia 1.5
  │
  │  The schedule argument is available as of Julia 1.5.

  :dynamic (default)
  ––––––––––––––––––––

  :dynamic scheduler executes iterations dynamically to available worker
  threads. Current implementation assumes that the workload for each iteration
  is uniform. However, this assumption may be removed in the future.

  This scheduling option is merely a hint to the underlying execution
  mechanism. However, a few properties can be expected. The number of Tasks
  used by :dynamic scheduler is bounded by a small constant multiple of the
  number of available worker threads (nthreads()). Each task processes
  contiguous regions of the iteration space. Thus, @threads :dynamic for x in
  xs; f(x); end is typically more efficient than @sync for x in xs; @spawn
  f(x); end if length(xs) is significantly larger than the number of the
  worker threads and the run-time of f(x) is relatively smaller than the cost
  of spawning and synchronizing a task (typically less than 10 microseconds).

  │ Julia 1.8
  │
  │  The :dynamic option for the schedule argument is available and the
  │  default as of Julia 1.8.

  :static
  –––––––––

  :static scheduler creates one task per thread and divides the iterations
  equally among them, assigning each task specifically to each thread. In
  particular, the value of threadid() is guaranteed to be constant within one
  iteration. Specifying :static is an error if used from inside another
  @threads loop or from a thread other than 1.

  │ Note
  │
  │  :static scheduling exists for supporting transition of code
  │  written before Julia 1.3. In newly written library functions,
  │  :static scheduling is discouraged because the functions using this
  │  option cannot be called from arbitrary worker threads.

  Example
  =========

  To illustrate the different scheduling strategies, consider the following
  function busywait containing a non-yielding timed loop that runs for a given
  number of seconds.

  julia> function busywait(seconds)
              tstart = time_ns()
              while (time_ns() - tstart) / 1e9 < seconds
              end
          end
  
  julia> @time begin
              Threads.@spawn busywait(5)
              Threads.@threads :static for i in 1:Threads.nthreads()
                  busywait(1)
              end
          end
  6.003001 seconds (16.33 k allocations: 899.255 KiB, 0.25% compilation time)
  
  julia> @time begin
              Threads.@spawn busywait(5)
              Threads.@threads :dynamic for i in 1:Threads.nthreads()
                  busywait(1)
              end
          end
  2.012056 seconds (16.05 k allocations: 883.919 KiB, 0.66% compilation time)

  The :dynamic example takes 2 seconds since one of the non-occupied threads
  is able to run two of the 1-second iterations to complete the for loop.

Here are a couple of notes from it:

  • The value of threadid() may change even within a single iteration.
  • The number of Tasks used by :dynamic scheduler is bounded by a small constant multiple of the number of available worker threads (nthreads()).
  • :static scheduler creates one task per thread and divides the iterations equally among them, assigning each task specifically to each thread. In particular, the value of threadid() is guaranteed to be constant within one iteration.
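The threadid() guarantee in the last note can be probed with a small sketch like the following. The function name `threadid_stable_static` is hypothetical; the check relies only on the documented :static behaviour quoted above.

```julia
using Base.Threads

# Hypothetical probe: does threadid() stay the same across a yield point
# inside one iteration?
function threadid_stable_static(n::Int)
    stable = fill(false, n)
    @threads :static for i in 1:n
        before = threadid()
        yield()   # a point where a non-pinned task could be rescheduled
        stable[i] = (before == threadid())   # each iteration writes its own slot
    end
    return all(stable)
end

ok = threadid_stable_static(64)
```

Replacing :static with :dynamic removes the guarantee, so the same probe may or may not report a change in any given run.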

So, if you use the :dynamic scheduler, the number of tasks can exceed the number of threads, and you will need more temporary arrays than threads (one per task). This can be an issue for memory handling.

Based on this, sticking with :static is the best option for now!
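For comparison, here is a minimal sketch of what "one temporary array per task" could look like, assuming the work is split into contiguous chunks by hand. The names `work!`, `total_tasklocal`, `ntasks` and `buflen` are hypothetical and do not come from JuLIP or ACE.

```julia
using Base.Threads

# Hypothetical kernel that needs a scratch array.
function work!(tmp::Vector{Float64}, x::Float64)
    fill!(tmp, x)
    return sum(tmp)
end

function total_tasklocal(xs::Vector{Float64};
                         ntasks::Int = 2 * nthreads(), buflen::Int = 4)
    # Split the index range into at most `ntasks` contiguous chunks.
    chunklen = max(1, cld(length(xs), ntasks))
    chunks = Iterators.partition(eachindex(xs), chunklen)
    tasks = map(chunks) do idxs
        @spawn begin
            tmp = zeros(Float64, buflen)   # buffer owned by this task only
            acc = 0.0
            for i in idxs
                acc += work!(tmp, xs[i])
            end
            acc
        end
    end
    # One buffer was allocated per task, not per thread.
    return sum(fetch, tasks; init = 0.0)
end

result = total_tasklocal(collect(1.0:10.0))
```

This does not depend on threadid() at all, but it allocates up to `ntasks` buffers, which is the memory cost mentioned above.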

tjjarvinen, Feb 16 '23 12:02

thank you

cortner, Feb 16 '23 16:02