DALI icon indicating copy to clipboard operation
DALI copied to clipboard

Executor 2.0: Stream assignment

Open mzient opened this issue 6 months ago • 0 comments

Category:

New feature (non-breaking change which adds functionality)

Description:

Executor 2.0 doesn't have "stages" and streams are assigned according to stream assignment policy.

Stream assignment policies

Executor2 assigns streams to operator nodes based on the StreamPolicy configuration parameter. There are three stream policies:

  1. Single - there's just one stream used by all operators which need one.
  2. PerBackend - there's a distinct stream per backend (GPU, Mixed)
  3. PerOperator - operators which can execute independently are assigned distinct streams - but operators with sequential dependency use the same stream

Per-operator stream assignment algorithm

PerOperator stream assignment uses a minimum set of streams necessary to execute independent operators on different streams.
It works by traversing the graph with a priority queue. The element of the queue is a pair (node_idx, stream_idx), where node_idx is the index of a node in topological order and stream_idx is a tentative assignment of a stream index to the node. Each node can be placed in the queue many times, but it's processed only once. Because of the topological sort, the node is processed after all of its contributors have been processed - and it's seen with the lowest possible stream index. If a node is popped again, it's skipped. Once a node is processed, its id is added back to the set of free node ids and the consumers are pushed with the lowest free node ids - the first consumer will usually get the same stream id as the producer (or even a lower one).

Consider the following graph:
image

The letters correspond to an arbitrary topological order (any will do). The I/O direction is left to right
We start with the queue containing the root nodes and sequential stream indices:
(A, 0), (H, 1)
The set of free indices is (2, 3, 4, 5, 6, 7, 8, 9)
Then we have:

Step Queue Free indices Final assignment
push roots (A, 0), (H, 1) 2, 3, 4, 5, 6, 7, 8, 9
pop (A, 0) (H, 1) 0, 2, 3, 4, 5, 6, 7, 8, 9 A0
push consumers of A (B, 0), (C, 2), (E, 3), (H, 1) 4, 5, 6, 7, 8, 9
pop (B, 0) (C, 2), (E, 3), (H, 1) 0, 4, 5, 6, 7, 8, 9 A0, B0
push consumers of B (C, 2), (D, 0), (E, 3), (H, 1) 4, 5, 6, 7, 8, 9
pop (C, 2) (D, 0), (E, 3), (H, 1) 2, 4, 5, 6, 7, 8, 9 A0, B0, C2
push consumers of C (D, 0), (D, 2), (E, 3), (H, 1) 4, 5, 6, 7, 8, 9
pop (D, 0) (D, 2), (E, 3), (H, 1) 0, 4. 5, 6, 7, 8, 9 A0, B0, C2, D0
push consumers of D (D, 2), (E, 3), (F, 0), (H, 1) 4, 5, 6, 7, 8, 9
pop (D, 2) (already visited - skip) (E, 3), (F, 0), (H, 1) 2, 4, 5, 6, 7, 8, 9
pop (E, 3) (F, 0), (H, 1) 2, 3, 4, 5, 6, 7, 8, 9 A0, B0, C2, D0, E3
push consumers of E (F, 0), (F, 3), (H, 1) 2, 4, 5, 6, 7, 8, 9
pop (F, 0) (F, 3), (H, 1) 0, 2, 4, 5, 6, 7, 8, 9 A0, B0, C2, D0, E3, F0
push consumers of F (F, 3), (G, 0), (H, 1), (I, 2) 4, 5, 6, 7, 8, 9
pop (F, 3) (skip) (G, 0), (H, 1), (I, 2) 3, 4, 5, 6, 7, 8, 9
pop (G, 0) (H, 1), (I, 2) 0, 3, 4, 5, 6, 7, 8, 9 A0, B0, C2, D0, E3, F0, G0
push consumers of G (H, 1), (I, 0), (I, 2) 3, 4, 5, 6, 7, 8, 9
pop (H, 1) (I, 0), (I, 2), (I, 3) 1, 4, 5, 6, 7, 8, 9 A0, B0, C2, D0, E3, F0, G0, H1
push consumers of H (I, 0), (I, 1), (I, 2), (I, 3) 4, 5, 6, 7, 8, 9
pop (I, 0) (I, 1), (I, 2), (I, 3) 0, 4, 5, 6, 7, 8, 9 A0, B0, C2, D0, E3, F0, G0, H1, I0
pop (I, 1) (skip) (I, 2), (I, 3) 0, 1, 4, 5, 6, 7, 8, 9
pop (I, 2) (skip) (I, 3) 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
pop (I, 3) (skip) 0, 1, 2, 3, 4, 5, 6, 7, 8, 9

The final stream assignment is:
A0, B0, C2, D0, E3, F0, G0, H1, I0

CPU operators that (even indirectly) contribute to GPU operators are treated as GPU operators and their stream assignment is nullified as a post-processing step.

Additional information:

Affected modules and functionalities:

Key points relevant for the review:

Tests:

  • [ ] Existing tests apply
  • [X] New tests added
    • [ ] Python tests
    • [X] GTests
    • [ ] Benchmark
    • [ ] Other
  • [ ] N/A

Checklist

Documentation

  • [ ] Existing documentation applies
  • [X] Documentation updated
    • [ ] Docstring
    • [X] Doxygen
    • [ ] RST
    • [ ] Jupyter
    • [ ] Other
  • [ ] N/A

DALI team only

Requirements

  • [ ] Implements new requirements
  • [ ] Affects existing requirements
  • [X] N/A

REQ IDs: N/A

JIRA TASK: DALI-4030

mzient avatar Aug 08 '24 17:08 mzient