Thread-safety of Ginkgo
In the discussion on #993, but also in previous issues like #740 and #565, as well as future plans like #805, there is an underlying question we have not yet discussed: what thread-safety guarantees the different parts of Ginkgo can give.
I can imagine different levels of thread-safety guarantees we could give:
- We don't give any guarantees, i.e. each thread needs to use its own executor and independent objects. This means we would only need to make some guarantees on things like cross-executor copies, but otherwise should have no issues. While easiest, I think this might be a bit overly restrictive?
- We promise that `Executor` itself is thread-safe (which it should already be currently), but require different threads not to use the same object at the same time, e.g. two threads can't call `apply` on a solver at the same time. This would be relevant when we want to decide how to deal with `DenseCache` types that avoid reallocations by storing a mutable Dense object internally.
- Everything inside Ginkgo can be used in a thread-safe fashion, except for concurrent modification of data. This would mean that we need to make sure that a `DenseCache` provides a different thread-local object for each thread in something like a `mutable std::unordered_map<std::thread::id, std::unique_ptr<Dense<...>>>` (see the sketch after this list). We would then need to make sure write accesses to the cache are mutually exclusive, but could otherwise provide pretty strong guarantees.
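To make option 3 more concrete, here is a minimal sketch of what such a thread-local cache could look like. The class name `ThreadLocalDenseCache` and its `get` interface are purely illustrative and not the existing `DenseCache` API:

```cpp
#include <memory>
#include <mutex>
#include <thread>
#include <unordered_map>

#include <ginkgo/ginkgo.hpp>

// Illustrative sketch only: one scratch Dense object per host thread,
// with a mutex serializing accesses to the map itself.
template <typename ValueType>
class ThreadLocalDenseCache {
public:
    gko::matrix::Dense<ValueType>* get(
        std::shared_ptr<const gko::Executor> exec, gko::dim<2> size) const
    {
        std::lock_guard<std::mutex> lock(map_mutex_);
        auto& entry = cache_[std::this_thread::get_id()];
        if (!entry || !(entry->get_size() == size)) {
            // (Re)allocate only if this thread has no buffer of the right
            // size yet; afterwards the pointer is effectively thread-private.
            entry = gko::matrix::Dense<ValueType>::create(exec, size);
        }
        return entry.get();
    }

private:
    mutable std::mutex map_mutex_;
    mutable std::unordered_map<std::thread::id,
        std::unique_ptr<gko::matrix::Dense<ValueType>>> cache_;
};
```

One caveat with this shape: the map grows with every thread that ever touches the cache, and the lock is taken on every access, so a real implementation would probably want a reader/writer lock or a more clever structure.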
The whole area of temporary storage looks to me like the most important and most difficult part to handle, but can you imagine any other areas that could have thread-safety issues?
@tcojean How does StarPU manage different tasks? Does it spawn threads?
I feel like the internal complication arising from either option 2 or especially option 3 needs good justification. In general, it may be reasonable to expect each host thread to have its own independent executor and objects, or otherwise leave it to the application to synchronize at a very coarse level.
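For illustration, under that coarse-grained approach the application itself would guard any Ginkgo object shared between threads. A minimal sketch, where the wrapper function `guarded_apply` and the global mutex are hypothetical:

```cpp
#include <mutex>

#include <ginkgo/ginkgo.hpp>

// Hypothetical application-side wrapper: the library promises nothing about
// concurrent use of the same object, so the application serializes all
// apply calls on the shared solver itself.
std::mutex solver_mutex;

void guarded_apply(gko::LinOp* solver, const gko::LinOp* b, gko::LinOp* x)
{
    std::lock_guard<std::mutex> lock(solver_mutex);
    solver->apply(b, x);  // never entered by two threads at once
}
```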
There are long-lived (p)threads on every core. Tasks are just structures that have code and data tied to them. The tasks are moved to the cores/pthreads by the scheduler and its policy (work stealing or whatever else the user sets).
@tcojean I see. Is there a concept of ownership of data among threads that StarPU itself manages? Is that the case in general for tasking systems? If that is the case, perhaps there's no pressing need for us to manage the task-level coarse-grained parallelism. It's good you brought up StarPU, because I think this topic (of host thread safety) needs to be discussed in the context of any possible task runtime integration.
I'm not sure what you mean, but the data can move between cores/threads depending on the scheduler. Whether there will be concurrent access depends on the code itself as well:
- If we consider two tasks A and B, each using some Ginkgo functionality, with a data dependency between the two (maybe A writes some data and B reads that same data), then StarPU will of course add a dependency between them and they will not run concurrently (no thread-safety issue there from the Ginkgo side)
- If we still have two tasks A and B which both call Ginkgo functionality, but for some reason they can run concurrently (maybe the same solver but different b, x?), then there can be a thread-safety issue if the scheduler decides to execute them at the same time on different cores (see the sketch after this list)
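As an illustration of how StarPU derives these dependencies from data access modes, here is a minimal sketch using StarPU's C API (the kernel and codelet names are made up for the example). Two tasks touching the same handle in `STARPU_RW` mode are serialized; tasks on disjoint handles carry no dependency and may run concurrently:

```cpp
#include <cstdint>

#include <starpu.h>

// Illustrative kernel: scales the registered vector in place.
static void scale_kernel(void* buffers[], void* /*cl_arg*/)
{
    auto* v = reinterpret_cast<float*>(STARPU_VECTOR_GET_PTR(buffers[0]));
    auto n = STARPU_VECTOR_GET_NX(buffers[0]);
    for (decltype(n) i = 0; i < n; ++i) {
        v[i] *= 2.0f;
    }
}

static struct starpu_codelet scale_cl{};

int main()
{
    if (starpu_init(nullptr) != 0) return 1;
    scale_cl.cpu_funcs[0] = scale_kernel;
    scale_cl.nbuffers = 1;
    scale_cl.modes[0] = STARPU_RW;

    float data[128] = {};
    starpu_data_handle_t handle;
    starpu_vector_data_register(&handle, STARPU_MAIN_RAM,
                                reinterpret_cast<uintptr_t>(data), 128,
                                sizeof(float));

    // Both tasks access `handle` in RW mode, so StarPU's default
    // sequential-consistency rule serializes them (the first case above);
    // tasks on unrelated handles would have no such dependency and could
    // run concurrently on different workers (the second case).
    starpu_task_insert(&scale_cl, STARPU_RW, handle, 0);
    starpu_task_insert(&scale_cl, STARPU_RW, handle, 0);

    starpu_task_wait_for_all();
    starpu_data_unregister(handle);
    starpu_shutdown();
}
```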
In the second case you mentioned, perhaps cached data for the matrix, preconditioner etc. needs to be considered while setting task dependencies.
I guess that is the crux of 2 vs. 3: does `solver->apply(...)` have a read/write dependency on itself? With 3 we could avoid that; with 2 we need to consider it.
Just for clarification: I think we already fulfill 2 right now, we just don't codify or check it in any way (manually or in unit tests)
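If we wanted to codify it, a sketch of the kind of stress test I have in mind (not an existing Ginkgo test, just an assumption of what one could look like): several threads share one `Executor` and allocate/free through it concurrently, while no other object is shared between threads:

```cpp
#include <thread>
#include <vector>

#include <ginkgo/ginkgo.hpp>

int main()
{
    auto exec = gko::ReferenceExecutor::create();
    std::vector<std::thread> threads;
    for (int t = 0; t < 8; ++t) {
        // Each thread hammers the shared executor's allocator, but owns
        // every other object it touches, matching guarantee level 2.
        threads.emplace_back([exec] {
            for (int i = 0; i < 1000; ++i) {
                auto* buf = exec->alloc<double>(1024);
                exec->free(buf);
            }
        });
    }
    for (auto& th : threads) {
        th.join();
    }
}
```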