Lums/sc 19398/edge class

Open lums658 opened this issue 3 years ago • 1 comment

This PR adds an Edge class to the TileDB task graph library.

This PR makes several substantial changes to the library beyond the Edge class itself, which is actually quite small.

  • All supporting classes for the finite state machine have been moved to a new state_machine subdirectory. The code for the finite state machine remains in the file fsm.h.
  • The PortStateMachine class can now support two-stage and three-stage data transfer. Two-stage transfer is used when a Source is connected directly to a Sink; three-stage transfer is used when a Source is connected to a Sink via an Edge.
  • There are two enum classes representing the states for the two-stage and three-stage cases.
  • The PortStateMachine is parameterized by the type of the states (two_stage or three_stage) as well as the policy class that implements the actions for the state transitions. A policy class inherits from the PortStateMachine class using CRTP, making its implementation of the state transition actions directly available to the PortStateMachine class.
  • Policy classes are contained in the file policies.h. Policy classes are parameterized by a Mover and by a PortState (one of two_stage or three_stage). The Mover inherits from the policy class, again via CRTP. The Mover class provides the actual data movement actions for the policy class, while the policy class implements the synchronization between the threads running the Source and the threads running the Sink.
  • Several policy classes are implemented, but the primary ones of interest are AsyncPolicy and UnifiedAsyncPolicy. These policy classes implement the wait and notify functions using condition variables. The other policies exist primarily for testing other parts of the task graph library.
  • Data movement from Source to Sink is managed by an ItemMover class, defined in item_mover.h. The ItemMover inherits from a policy class using CRTP.
  • The ItemMover class also inherits from a base class, BaseMover, which has one specialization for two_stage and one for three_stage. The BaseMover maintains pointers to the data items from the Source, the Sink, and, in the case of three_stage data movement, the Edge. Data is moved along the pipeline by swapping the items being pointed to.
  • Sources and Sinks inherit from a DataMover class and use its API for sending data from a Source to a Sink.
  • The Source and Sink are class templates that take a Mover as a template template parameter, along with the type of data being moved. The Mover used to instantiate the Source or the Sink is expected to take a single parameter: the type being moved.
  • Unit tests are included to exercise all of these classes, along with dedicated tests for transferring DataBlocks.
  • An Edge inherits from both Source and Sink. The Edge constructor takes a Source and a Sink, connects its internal Sink to the passed Source, and connects its internal Source to the passed Sink. Edges are also parameterized by the Mover type and the type being passed.
  • Although these classes are intended to work together, there are no include dependencies among the header files where they are declared.
  • The most important of the included unit tests asynchronously send a large quantity of numbers from a Source to a Sink and verify that every number arrives correctly (and that every intermediate state of the state machine is correct). This kind of test is repeated for the finite state machine, for Source and Sink ports, and for pseudo graph nodes containing Sources and Sinks. Each test is run both for directly connected Sources and Sinks and for Sources and Sinks connected by an Edge.
  • The test in ports/test/unit_concurrency.cpp produces a (very) crude diagnostic output showing the overlap of operations for a two_stage data mover. A future PR will add this for the three_stage data mover as well.
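The layering described above (state machine, policy, and mover chained by CRTP, with an asynchronous policy built on a condition variable) can be illustrated with one self-contained two-stage sketch. Every name in it (`SketchStateMachine`, `SketchAsyncPolicy`, `SketchItemMover`, `slot_state`, `send_numbers`) is hypothetical, and the single empty/full state is a simplification, not the library's actual two_stage states. `send_numbers` loosely mimics the kind of test described above: it sends many numbers asynchronously and collects them on the other side.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <optional>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical names; this mirrors only the CRTP layering, not TileDB's code.
enum class slot_state { empty, full };

// Layer 1: the state machine owns the state and mutex, and defers waiting,
// notification, and data movement to the derived policy (CRTP).
template <class Policy>
class SketchStateMachine {
 public:
  // Source side: block until the slot is empty, then move the item in.
  void port_fill() {
    std::unique_lock<std::mutex> lock(mutex_);
    policy().wait(lock, [this] { return state_ == slot_state::empty; });
    policy().on_fill();
    state_ = slot_state::full;
    policy().notify();
  }
  // Sink side: block until the slot is full, then take the item out.
  void port_drain() {
    std::unique_lock<std::mutex> lock(mutex_);
    policy().wait(lock, [this] { return state_ == slot_state::full; });
    policy().on_drain();
    state_ = slot_state::empty;
    policy().notify();
  }

 private:
  Policy& policy() { return static_cast<Policy&>(*this); }
  std::mutex mutex_;
  slot_state state_ = slot_state::empty;
};

// Layer 2: an asynchronous policy whose wait/notify use a condition
// variable; it forwards the actual data movement to the Mover (CRTP again).
template <class Mover>
class SketchAsyncPolicy : public SketchStateMachine<SketchAsyncPolicy<Mover>> {
 public:
  template <class Pred>
  void wait(std::unique_lock<std::mutex>& lock, Pred ready) {
    cv_.wait(lock, ready);
  }
  void notify() { cv_.notify_all(); }
  void on_fill() { static_cast<Mover&>(*this).move_to_sink(); }
  void on_drain() { static_cast<Mover&>(*this).take_from_sink(); }

 private:
  std::condition_variable cv_;
};

// Layer 3: the mover holds the items and moves them by swapping.
template <class T>
class SketchItemMover : public SketchAsyncPolicy<SketchItemMover<T>> {
 public:
  std::optional<T> source_item;  // touched only by the source thread
  std::optional<T> received;     // touched only by the sink thread
  void move_to_sink() { std::swap(source_item, sink_item_); }
  void take_from_sink() { received = std::exchange(sink_item_, std::nullopt); }

 private:
  std::optional<T> sink_item_;  // shared; accessed only under the mutex
};

// Sends 0..n-1 from a source thread to a sink thread; returns what arrived.
inline std::vector<std::size_t> send_numbers(std::size_t n) {
  SketchItemMover<std::size_t> mover;
  std::vector<std::size_t> out;
  std::thread source([&] {
    for (std::size_t i = 0; i < n; ++i) {
      mover.source_item = i;
      mover.port_fill();
    }
  });
  std::thread sink([&] {
    for (std::size_t i = 0; i < n; ++i) {
      mover.port_drain();
      out.push_back(*mover.received);
    }
  });
  source.join();
  sink.join();
  return out;
}
```

Calling `send_numbers(10000)` returns 0 through 9999 in order. Because the lock and state live in the base class while the derived classes supply only `wait`/`notify` and the moves, a different policy (for instance a synchronous one used for testing) could be substituted without touching the state machine.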

Example:

The following example instantiates a Source and a Sink. In practice, many of these aliases will be predefined, so users will not need to define the whole stack of types themselves. Each alias in the stack is parameterized only by the type being passed.


  // Define a two_stage item mover with asynchronous policy, parameterized by the type being used
  template <class T>
  using AsyncMover2 = ItemMover<AsyncPolicy, two_stage, T>;

  // Define an asynchronous policy based on the two stage item mover
  template <class T>
  using AsyncPolicy2 = AsyncPolicy<AsyncMover2<T>, two_stage>;

  // Define a state machine based on the asynchronous policy
  template <class T>
  using AsyncStateMachine2 = PortFiniteStateMachine<AsyncPolicy2<T>, two_stage>;

  // Create Source and Sink objects using the two stage asynchronous mover
  Source<AsyncMover2, size_t> left;
  Sink<AsyncMover2, size_t> right;
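The template template pattern above can be tried in isolation with stub types. In the sketch below, `StubPolicy`, `StubItemMover`, `sketch_two_stage`, `SketchMover2`, `SketchSource`, `SketchSink`, and `sketch_connect` are all hypothetical. It relies only on what the description states (the Source and Sink receive a one-parameter Mover template and instantiate it with the item type) plus one illustrative assumption: that a direct connection shares a single mover object between the two ports.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <type_traits>
#include <utility>

// Hypothetical stand-ins; only the template shapes follow the PR text.
enum class sketch_two_stage { empty, full };

template <class Mover, class PortState>
struct StubPolicy {};  // a real policy would implement wait/notify

// The mover inherits from its policy via CRTP, like ItemMover above.
template <template <class, class> class Policy, class PortState, class T>
struct StubItemMover
    : Policy<StubItemMover<Policy, PortState, T>, PortState> {
  T item{};
};

// One-parameter alias, analogous to AsyncMover2.
template <class T>
using SketchMover2 = StubItemMover<StubPolicy, sketch_two_stage, T>;

// Source and Sink take the Mover as a template template parameter and
// instantiate it with the item type themselves.
template <template <class> class Mover, class T>
class SketchSource {
 public:
  using mover_type = Mover<T>;
  void attach(std::shared_ptr<mover_type> m) { mover_ = std::move(m); }
  const std::shared_ptr<mover_type>& mover() const { return mover_; }

 private:
  std::shared_ptr<mover_type> mover_;
};

template <template <class> class Mover, class T>
class SketchSink {
 public:
  using mover_type = Mover<T>;
  void attach(std::shared_ptr<mover_type> m) { mover_ = std::move(m); }
  const std::shared_ptr<mover_type>& mover() const { return mover_; }

 private:
  std::shared_ptr<mover_type> mover_;
};

// Hypothetical helper: a direct (two-stage) connection shares one mover.
template <template <class> class Mover, class T>
void sketch_connect(SketchSource<Mover, T>& from, SketchSink<Mover, T>& to) {
  auto shared = std::make_shared<Mover<T>>();
  from.attach(shared);
  to.attach(shared);
}
```

After `sketch_connect(left, right)` on a `SketchSource<SketchMover2, std::size_t>` and a `SketchSink<SketchMover2, std::size_t>`, both ports point at the same `StubItemMover<StubPolicy, sketch_two_stage, std::size_t>`. An alias template is a valid template template argument, which is why fixing the policy and stage in an alias leaves a Mover parameterized only by the item type.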

An Edge requires a three stage Mover and would be used as follows:

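  // Define a three_stage item mover, analogous to AsyncMover2 above
  template <class T>
  using AsyncMover3 = ItemMover<AsyncPolicy, three_stage, T>;

  Source<AsyncMover3, size_t> source;
  Sink<AsyncMover3, size_t> sink;
  Edge<AsyncMover3, size_t> edge(source, sink);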
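The three-stage case adds the Edge as a middle stage. The sketch below assumes, as the description of BaseMover suggests, that the mover holds pointers to items owned by the Source, the Edge, and the Sink, and advances data by swapping the pointed-to items. `ThreeStageSketch` and `advance` are hypothetical names, and all thread synchronization (the policy's job) is omitted.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <optional>
#include <utility>

// Hypothetical three-stage mover: pointers to the items owned by the
// Source, the Edge, and the Sink, in pipeline order.
template <class T>
class ThreeStageSketch {
 public:
  ThreeStageSketch(std::optional<T>& source, std::optional<T>& edge,
                   std::optional<T>& sink)
      : items_{&source, &edge, &sink} {}

  // Moves each item at most one step toward the Sink, but only into an
  // empty stage. Walking from the Sink end first frees space upstream.
  void advance() {
    for (std::size_t i = items_.size() - 1; i > 0; --i) {
      if (!*items_[i] && *items_[i - 1]) {
        std::swap(*items_[i], *items_[i - 1]);
      }
    }
  }

 private:
  std::array<std::optional<T>*, 3> items_;
};
```

With the Source holding 1 and the other stages empty, one `advance()` moves the item into the Edge. Placing a new item in the Source before the next call moves 1 into the Sink and the new item into the Edge, showing the extra buffering stage an Edge introduces.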

Note that upcoming PRs will make more effective use of class template argument deduction (CTAD) for Edge creation, so that the Mover and the type being moved will not have to be specified explicitly.
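As a sketch of the language mechanism only (not of the upcoming PRs), C++17 CTAD can recover both template arguments from an Edge constructor whose parameters are a Source and a Sink of the same Mover and item type. `ToyMover`, `ToySource`, `ToySink`, and `ToyEdge` are hypothetical, and `ToyEdge` omits the inheritance from Source and Sink.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical minimal stand-ins; only the constructor shape matters here.
template <class T>
struct ToyMover {};

template <template <class> class Mover, class T>
struct ToySource {};

template <template <class> class Mover, class T>
struct ToySink {};

// The constructor's parameter types mention both Mover and T, so the
// implicit deduction guide lets CTAD deduce ToyEdge<Mover, T> from them.
template <template <class> class Mover, class T>
class ToyEdge {
 public:
  ToyEdge(ToySource<Mover, T>& from, ToySink<Mover, T>& to)
      : from_(&from), to_(&to) {}

  // True if this edge was built from exactly these two ports.
  bool connects(const ToySource<Mover, T>& s,
                const ToySink<Mover, T>& k) const {
    return from_ == &s && to_ == &k;
  }

 private:
  ToySource<Mover, T>* from_;
  ToySink<Mover, T>* to_;
};
```

Given `ToySource<ToyMover, int> s;` and `ToySink<ToyMover, int> k;`, writing `ToyEdge edge(s, k);` deduces `ToyEdge<ToyMover, int>` with no explicit deduction guide.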


TYPE: FEATURE
DESC: Adds Edge class to the TileDB task graph library.

lums658 • Aug 09 '22 03:08