onnxruntime [DML EP] 2-Pass Graph Partitioning

Description: Today, DML EP differs from other EPs because:

During partitioning, DML EP fuses its subgraphs on its own and does not use the IExecutionProvider::Compile() method, unlike other EPs. This lets DML EP steal the initializers during the IExecutionProvider::GetCapability() call, which avoids a memory regression.
Because DML EP fuses its subgraphs during partitioning, it can't leverage the ORT L2/L3 optimizers: these optimizers are applied after the partitioning call (the previous step).
DML EP also has its own specific optimizer.

DML EP needs to be re-architected to utilize IExecutionProvider::Compile() and ORT L2/L3 transformers. It will be done in 2 phases:

Utilize ORT L1-L3 transformers:
- This will be solved by calling graph partitioning twice for DML EP: once before applying the ORT L2/L3 transformers and once after. As part of this change, the DML-specific optimizers will be moved under the L3 transformers (note: this makes graph_transformer_utils.cc depend on the DML EP codebase).
- During the 1st call of partitioning, DML EP will iterate over each node and check whether it can be executed via DML. It will return each such node individually, wrapped in its own IndexedSubgraph. No multi-node subgraph is created at this stage.
- During the 2nd call of partitioning, DML EP will perform the actual partitioning (subgraph creation and DML fusion). It won't re-check whether each node can be run via DML.
- To decide whether it is in the 1st or the 2nd call of partitioning, DML EP checks whether any node has already been assigned to it. This condition is robust because ORT assigns nodes to EPs in the priority order provided by the user and does not override an existing EP assignment.
- Time complexity of session creation will remain roughly the same because the work done by the two partitioning calls does not overlap.
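
To make the two-call decision above concrete, here is a minimal sketch of the idea. It is not the code in this PR and does not use the real onnxruntime types: `Node`, `Capability`, `IsSupportedByDml`, `GetCapabilitySketch` and `AssignToDml` are hypothetical stand-ins, the graph is flattened to a plain list, and "subgraph creation" is reduced to grouping consecutive DML-assigned nodes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for ORT graph types (not the real API).
struct Node {
    std::string op_type;
    std::string assigned_ep;  // empty means "not assigned to any EP yet"
};

// A set of node indices claimed by the EP, loosely modeled on IndexedSubgraph.
struct Capability {
    std::vector<std::size_t> node_indices;
};

// EP label used by this sketch.
const std::string kDmlEp = "DmlExecutionProvider";

// Assumed support check: a toy rule for illustration only.
bool IsSupportedByDml(const Node& node) { return node.op_type != "Unsupported"; }

bool AnyNodeAssignedToDml(const std::vector<Node>& graph) {
    for (const Node& node : graph) {
        if (node.assigned_ep == kDmlEp) return true;
    }
    return false;
}

std::vector<Capability> GetCapabilitySketch(const std::vector<Node>& graph) {
    std::vector<Capability> result;
    if (!AnyNodeAssignedToDml(graph)) {
        // 1st call: claim each supported, unassigned node on its own; no fusion.
        for (std::size_t i = 0; i < graph.size(); ++i) {
            if (graph[i].assigned_ep.empty() && IsSupportedByDml(graph[i])) {
                Capability single;
                single.node_indices.push_back(i);
                result.push_back(single);
            }
        }
        return result;
    }
    // 2nd call: no support checks; only group nodes already assigned to DML.
    // Grouping consecutive nodes is a simplification of real subgraph creation.
    Capability current;
    for (std::size_t i = 0; i < graph.size(); ++i) {
        if (graph[i].assigned_ep == kDmlEp) {
            current.node_indices.push_back(i);
        } else if (!current.node_indices.empty()) {
            result.push_back(current);
            current = Capability{};
        }
    }
    if (!current.node_indices.empty()) result.push_back(current);
    return result;
}

// Simulates ORT recording the EP assignment after a partitioning call.
void AssignToDml(std::vector<Node>& graph, const std::vector<Capability>& caps) {
    for (const Capability& cap : caps) {
        for (std::size_t index : cap.node_indices) {
            if (graph[index].assigned_ep.empty()) graph[index].assigned_ep = kDmlEp;
        }
    }
}
```

In this sketch, a caller would run `GetCapabilitySketch`, apply `AssignToDml`, let the L2/L3 transformers run, and then call `GetCapabilitySketch` again. The second call sees DML-assigned nodes and therefore takes the grouping branch instead of repeating the per-node support checks.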

Utilize IExecutionProvider::Compile method: This requires a solution from the ORT team for the memory regression problem. [TODO]

This PR is the prototype for Phase 1. I have verified the basic logic by enabling the Gelu L2 transformer for DML EP.

Jul 15 '22 03:07 sumitsays

This pull request introduces 12 alerts and fixes 10 when merging d91066f4b9b88573d8edd533dc30279a3cfc550a into 3bb065a2b7e574ba3869f883c18d05293a58535f - view on LGTM.com

new alerts:

5 for Statement has no effect
3 for Unused import
2 for Commented-out code
1 for Multiplication result converted to larger type
1 for Unused static variable

fixed alerts:

7 for Unused import
1 for 'import *' may pollute namespace
1 for Duplicate key in dict literal
1 for Implicit string concatenation in a list

Jul 23 '22 06:07 lgtm-com[bot]

This was the prototype PR to re-architect DML EP to enable ORT L2/L3 transformers. We ended up choosing another approach, which is more robust: https://github.com/microsoft/onnxruntime/pull/13131

Sep 28 '22 03:09 sumitsays