celerity-runtime Introduce less restrictive side effect orders

Side-effects introduced in #68 currently always serialize execution between host tasks affecting the same host object. Since the precise interactions within the host task are entirely user-defined, this can be overly restrictive. For example, incrementing an atomic counter from multiple host tasks does not need to introduce any scheduling constraints, but the user should still be able to rely on Celerity guaranteeing the correct lifetime of the accessed host object.

Choosing between different scheduling guarantees for side effects is reminiscent of access modes on buffer access. However, the read/write dichotomy itself is not a good fit for this new use case: First of all, whether two "writing" side effects can be scheduled concurrently or not depends on the level of synchronization employed by the object itself, which is outside of Celerity's control. Also, for buffers, the access modes request implicit data movement from the runtime, which does not apply to host objects either.

This PR proposes three distinct side effect orders:

side_effect_order::sequential: The task cannot be re-ordered or executed concurrently with any other task affecting the same host object (the current behavior, and also the default)
side_effect_order::exclusive: The task may be re-ordered, but not executed concurrently with any other task affecting the same host object. Example usage: Inserting into a (unsynchronized) host_object<std::unordered_set<T>>.
side_effect_order::relaxed: The task may be executed concurrently with and freely re-ordered against other tasks affecting the same host object. Example usage: Incrementing an atomic counter.

Note that between a pair of tasks affecting the same host object, the more restrictive side effect order decides the level of freedom with respect to re-ordering and concurrency: relaxed effects can be executed concurrently only with other relaxed effects, but re-ordered against relaxed and exclusive. A sequential effect will always serialize execution around the host object.

To implement re-ordering constraints, task and command graphs gain undirected edges (called conflicts) in addition to directed edges (called dependencies). These indicate that one node must be sequenced before the other, but the order is arbitrary. This is also backported to collective host tasks (which require an arbitrary, but fixed execution order on all nodes), giving the graph generator more freedom in scheduling.

task graph with conflict edges

These conflict edges are communicated to the executor of each node, which opportunistically schedules non-conflicting sets of commands concurrently. The assumption here is that executors will have a significant backlog most of the time, allowing them to make informed scheduling decisions at that level.