DALI
DALI copied to clipboard
Executor 2.0: Stream assignment
Category:
New feature (non-breaking change which adds functionality)
Description:
Executor 2.0 doesn't have "stages" and streams are assigned according to stream assignment policy.
Stream assignment policies
Executor2
assigns streams to operator nodes based on the StreamPolicy
configuration parameter. There are three stream policies:
-
Single
- there's just one stream used by all operators which need one. -
PerBackend
- there's a distinct stream per backend (GPU, Mixed) -
PerOperator
- operators which can execute independently are assigned distinct streams - but operators with sequential dependency use the same stream
Per-operator stream assignment algorithm
PerOperator
stream assignment uses a minimum set of streams necessary to execute independent operators on different streams.
It works by traversing the graph with a priority queue. The element of the queue is a pair (node_idx, stream_idx), where node_idx is the index of a node in topological order and stream_idx is a tentative assignment of a stream index to the node. Each node can be placed in the queue many times, but it's processed only once. Because of the topological sort, the node is processed after all of its contributors have been processed - and it's seen with the lowest possible stream index. If a node is popped again, it's skipped. Once a node is processed, its id is added back to the set of free node ids and the consumers are pushed with the lowest free node ids - the first consumer will usually get the same stream id as the producer (or even a lower one).
Consider the following graph:
The letters correspond to an arbitrary topological order (any will do). The I/O direction is left to right
We start with the queue containing the root nodes and sequential stream indices:
(A, 0), (H, 1)
The set of free indices is (2, 3, 4, 5, 6, 7, 8, 9)
Then we have:
Step | Queue | Free indices | Final assignment |
---|---|---|---|
push roots | (A, 0), (H, 1) | 2, 3, 4, 5, 6, 7, 8, 9 | |
pop (A, 0) | (H, 1) | 0, 2, 3, 4, 5, 6, 7, 8, 9 | A0 |
push consumers of A | (B, 0), (C, 2), (E, 3), (H, 1) | 4, 5, 6, 7, 8, 9 | |
pop (B, 0) | (C, 2), (E, 3), (H, 1) | 0, 4, 5, 6, 7, 8, 9 | A0, B0 |
push consumers of B | (C, 2), (D, 0), (E, 3), (H, 1) | 4, 5, 6, 7, 8, 9 | |
pop (C, 2) | (D, 0), (E, 3), (H, 1) | 2, 4, 5, 6, 7, 8, 9 | A0, B0, C2 |
push consumers of C | (D, 0), (D, 2), (E, 3), (H, 1) | 4, 5, 6, 7, 8, 9 | |
pop (D, 0) | (D, 2), (E, 3), (H, 1) | 0, 4. 5, 6, 7, 8, 9 | A0, B0, C2, D0 |
push consumers of D | (D, 2), (E, 3), (F, 0), (H, 1) | 4, 5, 6, 7, 8, 9 | |
pop (D, 2) (already visited - skip) | (E, 3), (F, 0), (H, 1) | 2, 4, 5, 6, 7, 8, 9 | |
pop (E, 3) | (F, 0), (H, 1) | 2, 3, 4, 5, 6, 7, 8, 9 | A0, B0, C2, D0, E3 |
push consumers of E | (F, 0), (F, 3), (H, 1) | 2, 4, 5, 6, 7, 8, 9 | |
pop (F, 0) | (F, 3), (H, 1) | 0, 2, 4, 5, 6, 7, 8, 9 | A0, B0, C2, D0, E3, F0 |
push consumers of F | (F, 3), (G, 0), (H, 1), (I, 2) | 4, 5, 6, 7, 8, 9 | |
pop (F, 3) (skip) | (G, 0), (H, 1), (I, 2) | 3, 4, 5, 6, 7, 8, 9 | |
pop (G, 0) | (H, 1), (I, 2) | 0, 3, 4, 5, 6, 7, 8, 9 | A0, B0, C2, D0, E3, F0, G0 |
push consumers of G | (H, 1), (I, 0), (I, 2) | 3, 4, 5, 6, 7, 8, 9 | |
pop (H, 1) | (I, 0), (I, 2), (I, 3) | 1, 4, 5, 6, 7, 8, 9 | A0, B0, C2, D0, E3, F0, G0, H1 |
push consumers of H | (I, 0), (I, 1), (I, 2), (I, 3) | 4, 5, 6, 7, 8, 9 | |
pop (I, 0) | (I, 1), (I, 2), (I, 3) | 0, 4, 5, 6, 7, 8, 9 | A0, B0, C2, D0, E3, F0, G0, H1, I0 |
pop (I, 1) (skip) | (I, 2), (I, 3) | 0, 1, 4, 5, 6, 7, 8, 9 | |
pop (I, 2) (skip) | (I, 3) | 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 | |
pop (I, 3) (skip) | 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 |
The final stream assignment is:
A0, B0, C2, D0, E3, F0, G0, H1, I0
CPU operators that (even indirectly) contribute to GPU operators are treated as GPU operators and their stream assignment is nullified as a post-processing step.
Additional information:
Affected modules and functionalities:
Key points relevant for the review:
Tests:
- [ ] Existing tests apply
- [X] New tests added
- [ ] Python tests
- [X] GTests
- [ ] Benchmark
- [ ] Other
- [ ] N/A
Checklist
Documentation
- [ ] Existing documentation applies
- [X] Documentation updated
- [ ] Docstring
- [X] Doxygen
- [ ] RST
- [ ] Jupyter
- [ ] Other
- [ ] N/A
DALI team only
Requirements
- [ ] Implements new requirements
- [ ] Affects existing requirements
- [X] N/A
REQ IDs: N/A
JIRA TASK: DALI-4030