RFC: Implement the pass that unifies the encodings for each global.
UnifyEncodingForGlobals Pass - Implementation Plan
Overview
This pass unifies multiple encoded versions of the same immutable global to reduce memory footprint in IREE's Stream dialect. When a single source global (e.g., a model weight) is encoded multiple times with different encodings for different uses, this pass selects a unified encoding and updates all references.
Initial Prototype: https://github.com/hanhanW/iree/blob/hanhan-prototype-globals-with-multi-encoding-snapshot-20251022/compiler/src/iree/compiler/Dialect/Stream/Transforms/UnifyEncodingForGlobals.cpp
UnifyEncodingForGlobals Pass - Design Rationale
This section explains the design rationale for the UnifyEncodingForGlobals pass and why it meets IREE's bar for retargetable compiler infrastructure.
1. Problem Statement
When the same source data (e.g., model weights) is used in multiple dispatch operations with different encoding requirements, IREE currently creates multiple encoded copies of the same data. For example, in LLaMA models:
// Same weight encoded twice with different iteration_sizes
%enc1 = stream.tensor.encode %weight -> tensor<4096x4096xf32, #encoding<iteration_sizes=[?, 4096, 4096]>>
%enc2 = stream.tensor.encode %weight -> tensor<4096x4096xf32, #encoding<iteration_sizes=[4, 4096, 4096]>>
This wastes memory: when the two encodings resolve to different layouts, the same data is materialized as two separate encoded globals at runtime.
Goal: Unify multiple encodings of the same immutable source into a single encoding, reducing memory footprint.
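The candidate search implied by this goal can be sketched outside of MLIR as a grouping problem. The snippet below is a minimal illustration, not the pass itself: `EncodeUse` and `findUnificationCandidates` are hypothetical names, and encodings are modeled as plain strings rather than real encoding attributes.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified record of one stream.tensor.encode use.
struct EncodeUse {
  std::string sourceGlobal;  // name of the immutable source global
  std::string encoding;      // printed form of the result encoding
};

// Returns the source globals that are encoded with more than one distinct
// encoding, i.e. the candidates for unification.
std::vector<std::string> findUnificationCandidates(
    const std::vector<EncodeUse> &uses) {
  std::map<std::string, std::set<std::string>> encodingsBySource;
  for (const EncodeUse &use : uses)
    encodingsBySource[use.sourceGlobal].insert(use.encoding);

  std::vector<std::string> candidates;
  for (const auto &entry : encodingsBySource)
    if (entry.second.size() > 1)
      candidates.push_back(entry.first);
  return candidates;
}
```

A source that is encoded several times with the same encoding is not a candidate here; deduplicating such globals is left to the later cleanup passes.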
2. Why Immutability Enables This Optimization
The foundation of this optimization rests on immutable globals:
- Immutability Guarantee: IREE's VerifyInitializationOrderPass ensures immutable globals are initialized exactly once, only in initializers or initializer-only functions.
- Deterministic Data: Since immutable globals never change after initialization, any computation derived purely from immutable globals produces deterministic results.
- Safe Unification: If two stream.tensor.encode ops consume the same immutable source, we can safely use a unified encoding for both. The underlying data is guaranteed identical, so unification avoids memory footprint bloat.
This is why the pass strictly requires both source AND encoded globals to be immutable.
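As a rough illustration of that requirement, the eligibility check could look like the sketch below. `GlobalInfo` and `canUnify` are hypothetical names, and the symbol table is modeled as a plain map instead of IREE's real global analysis.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical summary of a util.global; only mutability matters here.
struct GlobalInfo {
  bool isMutable;
};

// Returns true only when the source global and every encoded global are
// known and immutable, and there are at least two encoded copies to unify.
bool canUnify(const std::unordered_map<std::string, GlobalInfo> &globals,
              const std::string &sourceGlobal,
              const std::vector<std::string> &encodedGlobals) {
  auto source = globals.find(sourceGlobal);
  if (source == globals.end() || source->second.isMutable)
    return false;
  if (encodedGlobals.size() < 2)
    return false;
  for (const std::string &name : encodedGlobals) {
    auto encoded = globals.find(name);
    if (encoded == globals.end() || encoded->second.isMutable)
      return false;
  }
  return true;
}
```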
3. Hardware-Agnostic Encoding Selection via Resolver
3.1 The Resolver Architecture
IREE uses an encoding resolver infrastructure that is central to retargetability:
┌─────────────────┐ ┌──────────────────┐ ┌─────────────────┐
│ This Pass │────▶│ Layout Resolver │────▶│ Backend- │
│ (identifies │ │ Interface │ │ Specific │
│ candidates) │ │ │ │ Implementation │
└─────────────────┘ └──────────────────┘ └─────────────────┘
- Each backend registers its own resolver via AffinityAnalysisDialectInterface
- Resolver knows device capabilities (GPU tile sizes, cache lines, vector widths)
- Resolver returns optimal encoding for the target device
- This pass queries the resolver; it doesn't hard-code encoding decisions
3.2 Why This Matters for Retargetability
The pass contains zero hardware-specific logic:
| What the pass does | What the pass does NOT do |
|---|---|
| Identifies unification candidates | Choose specific tile sizes |
| Queries resolver for unified encoding | Assume GPU vs CPU layout |
| Updates IR with resolver's answer | Hard-code any encoding format |
This means the same pass works for:
- CPU with AVX-512 (512-bit vectors, e.g. 16 f32 lanes)
- GPU with different warp/wavefront sizes
- Custom accelerators with unique memory layouts
- Future backends not yet implemented
3.3 Identity Encoding as Conservative Fallback
When no resolver is available or resolvers disagree:
auto unifiedEncoding = IREE::Encoding::IdentityAttr::get(context);
Why identity is always safe:
- Identity means "no specialized layout transformation"
- Every backend must handle identity encoding
- Later passes (e.g. SpecializeEncodings) can re-encode if beneficial. This is not implemented yet and may belong to other passes.
- Correctness is preserved; only optimization opportunity is deferred
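The fallback rule can be sketched as a small decision function. This is an assumption-laden illustration rather than the prototype's code: resolver answers are modeled as strings, `selectUnifiedEncoding` and `kIdentityEncoding` are hypothetical names, and an empty list stands for "no resolver available".

```cpp
#include <string>
#include <vector>

// Stand-in for the identity encoding attribute (hypothetical spelling).
const std::string kIdentityEncoding = "#identity";

// Picks the unified encoding from the encodings proposed by the available
// resolvers. Falls back to identity when there is no proposal or when the
// proposals disagree.
std::string selectUnifiedEncoding(const std::vector<std::string> &proposals) {
  if (proposals.empty())
    return kIdentityEncoding;
  for (const std::string &proposal : proposals)
    if (proposal != proposals.front())
      return kIdentityEncoding;
  return proposals.front();
}
```

The function never invents an encoding of its own: it either forwards what the resolvers agree on or defers the decision by returning identity.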
4. Black-Box Principle and Subgraph Equivalence
4.1 The Black-Box Principle
All ops in the trace are treated as black boxes:
- Identity = (op_name, attributes, operand_subgraphs)
- Same identity → same output (referential transparency)
- No need to look inside function bodies, executables, or regions
This applies uniformly to:
- util.call @func(...) - function is black box, arguments are inputs
- stream.tensor.dispatch @exec(...) - executable is black box, operands are inputs
- tensor.extract_slice - op with attributes is black box
- scf.if/scf.for - captured values are implicit inputs
- Any other deterministic op
4.2 Subgraph Equivalence
Two stream.tensor.encode ops encode the same data if their source operand subgraphs are equivalent:
- Same leaf sources: Both trace to the same immutable global (by name) or constant (by value)
- Same intermediate ops: Same op types with same attributes along the path
- Same structure: The DAG of operations is isomorphic
Example of equivalent subgraphs (unifiable):
// Path 1
%g1 = util.global.load @weights // immutable
%s1 = tensor.extract_slice %g1[0,0][50,100][1,1]
%enc1 = stream.tensor.encode %s1 with encoding1
// Path 2 - equivalent subgraph, different encoding
%g2 = util.global.load @weights // same immutable global
%s2 = tensor.extract_slice %g2[0,0][50,100][1,1] // same slice params
%enc2 = stream.tensor.encode %s2 with encoding2
Both trace to the same immutable global through the same operations → valid for unification.
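One way to picture the identity tuple (op_name, attributes, operand_subgraphs) is a recursive canonical key. The sketch below is only a model under simplifying assumptions: `Node`, `canonicalKey`, and `sameSubgraph` are hypothetical names, attributes are pre-printed strings, and a real implementation would likely hash and memoize instead of building strings, since shared operands are re-expanded here.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical model of one op in the trace from an encode op to its leaves.
struct Node {
  std::string opName;      // e.g. "tensor.extract_slice"
  std::string attributes;  // printed attributes, or the global name for loads
  std::vector<std::shared_ptr<Node>> operands;
};

// Builds the identity of a subgraph: op name, attributes, then the keys of
// all operand subgraphs in operand order.
std::string canonicalKey(const Node &node) {
  std::string key = node.opName + "{" + node.attributes + "}(";
  for (std::size_t i = 0; i < node.operands.size(); ++i) {
    if (i != 0)
      key += ",";
    key += canonicalKey(*node.operands[i]);
  }
  return key + ")";
}

// Two encode sources are treated as equivalent when their keys match.
bool sameSubgraph(const Node &a, const Node &b) {
  return canonicalKey(a) == canonicalKey(b);
}
```

With this model, the two paths in the example above produce the same key, while changing the slice sizes in one of them produces a different key.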
4.3 Control Flow as Black Boxes
For ops with regions (scf.if, scf.for), captured values act as implicit arguments:
%weights = util.global.load @weights
%cond = ...
%result = scf.if %cond -> tensor<...> {
  scf.yield %weights // %weights captured from outside
} else {
  scf.yield %weights
}
Canonical form:
scf.if(
  cond: <cond_subgraph>,
  captured_inputs: [<weights_subgraph>],
  then_region: { yield input[0] },
  else_region: { yield input[0] }
)
Two scf.if results are equivalent if:
- Same condition subgraph
- Same captured input subgraphs
- Same region structure (how inputs are used)
This unifies the model: functions/dispatches are black boxes with explicit arguments; control flow ops are black boxes with captured arguments plus region structure.
Why this works: In initializers, all inputs trace to immutable sources. Deterministic ops with same inputs produce same outputs.
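The canonical form for region ops can be sketched as a renaming step: each captured value is replaced by a positional placeholder, so two ops compare equal when their region structure matches regardless of SSA names. This is a simplified model with hypothetical names (`CanonicalRegionOp`, `canonicalizeRegions`); each region is reduced to the list of values it yields.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical canonical form of a region op such as scf.if.
struct CanonicalRegionOp {
  std::vector<std::string> capturedInputs;        // in first-use order
  std::vector<std::vector<std::string>> regions;  // yields as "input[i]"
};

// Rewrites the values yielded by each region into positional placeholders.
CanonicalRegionOp canonicalizeRegions(
    const std::vector<std::vector<std::string>> &regionYields) {
  CanonicalRegionOp result;
  std::map<std::string, std::size_t> indexOf;
  for (const std::vector<std::string> &yields : regionYields) {
    std::vector<std::string> canonical;
    for (const std::string &name : yields) {
      // Assign the next free index the first time a captured value is seen.
      auto inserted =
          indexOf.insert(std::make_pair(name, result.capturedInputs.size()));
      if (inserted.second)
        result.capturedInputs.push_back(name);
      canonical.push_back("input[" +
                          std::to_string(inserted.first->second) + "]");
    }
    result.regions.push_back(canonical);
  }
  return result;
}
```

Comparing two ops then reduces to comparing `regions` directly and comparing the subgraphs behind `capturedInputs` pairwise, together with the condition subgraph.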
4.4 Conservative Bail-Out
The analysis bails when equivalence cannot be proven:
| Condition | Rationale |
|---|---|
| Mutable global in path | Value may differ between uses |
| Unknown/unhandled op | Cannot prove semantic equivalence |
| Non-equivalent subgraphs | Different computation paths |
Principle: Better to miss an optimization than to introduce incorrectness.
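The bail-out conditions map naturally onto a backward traversal that fails as soon as anything unproven is reached. The sketch below assumes a hypothetical `TraceNode` model and a hypothetical allow-list `kKnownOps`; which ops belong on such a list in the real pass is not settled by this RFC.

```cpp
#include <memory>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of an op on the path from an encode op to its sources.
struct TraceNode {
  std::string opName;
  std::string globalName;  // only meaningful for util.global.load
  bool globalIsMutable;    // only meaningful for util.global.load
  std::vector<std::shared_ptr<TraceNode>> operands;
};

// Hypothetical allow-list of ops the analysis knows how to look through.
const std::set<std::string> kKnownOps = {"tensor.extract_slice",
                                         "tensor.expand_shape"};

// Collects the immutable globals a value is derived from. Returns false
// (bail out) on a mutable global or on any op outside the allow-list.
bool collectImmutableLeaves(const TraceNode &node,
                            std::set<std::string> &leaves) {
  if (node.opName == "util.global.load") {
    if (node.globalIsMutable)
      return false;  // value may differ between uses
    leaves.insert(node.globalName);
    return true;
  }
  if (kKnownOps.count(node.opName) == 0)
    return false;  // cannot prove semantic equivalence
  for (const std::shared_ptr<TraceNode> &operand : node.operands)
    if (!collectImmutableLeaves(*operand, leaves))
      return false;
  return true;
}
```

When the function returns false, the contents of `leaves` should be discarded and the encode op left untouched.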
5. Pass Ordering and Integration
5.1 Pipeline Position
CombineInitializersPass ← Merges initializer blocks
↓
UnifyEncodingForGlobalsPass ← THIS PASS
↓
... (cleanup pipeline) ← Later passes can deduplicate identical globals
// Reference: compiler/src/iree/compiler/Dialect/Stream/Transforms/Passes.cpp
passManager.addPass(IREE::Util::createCombineInitializersPass());
if (clUnifyEncodingForGlobals) {
  passManager.addPass(IREE::Stream::createUnifyEncodingForGlobalsPass());
}
// ... followed by cleanup pipeline
Why this position:
- After CombineInitializers: All initializer code is in one place, which simplifies analysis
- Before cleanup: Later passes can deduplicate globals that now have identical encodings
6. Complete IR Update Chain
When unifying encodings, ALL related ops must be updated consistently:
// BEFORE
%size = stream.tensor.sizeof tensor<4096x4096xf32, #old_encoding>
%enc = stream.tensor.encode %source -> tensor<4096x4096xf32, #old_encoding> in !stream.resource<*>{%size}
// AFTER
%size = stream.tensor.sizeof tensor<4096x4096xf32, #unified_encoding>
%enc = stream.tensor.encode %source -> tensor<4096x4096xf32, #unified_encoding> in !stream.resource<*>{%size}
Both stream.tensor.sizeof and stream.tensor.encode must use the same encoding, otherwise the size calculation would be wrong.
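The consistency requirement can be modeled as updating the two ops as a pair. The sketch below uses hypothetical stand-ins (`SizeofOp`, `EncodeOp`, `unifyEncoding`, `isConsistent`) for the real ops and rewrites, with encodings represented as strings.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for stream.tensor.sizeof.
struct SizeofOp {
  std::string encoding;
};

// Hypothetical stand-in for stream.tensor.encode and its result size.
struct EncodeOp {
  std::string resultEncoding;
  SizeofOp *resultSize;  // the sizeof op that computes the result size
};

// Rewrites every encode op and its paired sizeof op to the unified encoding.
void unifyEncoding(std::vector<EncodeOp> &encodes,
                   const std::string &unified) {
  for (EncodeOp &encode : encodes) {
    encode.resultEncoding = unified;
    if (encode.resultSize != nullptr)
      encode.resultSize->encoding = unified;
  }
}

// An encode op is consistent when its size is computed for the same encoding.
bool isConsistent(const EncodeOp &encode) {
  return encode.resultSize != nullptr &&
         encode.resultSize->encoding == encode.resultEncoding;
}
```

Updating only `resultEncoding` would leave `isConsistent` false, which corresponds to the wrong size calculation described above.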
7. Summary: Why This Meets IREE's Bar
| Criterion | How Met |
|---|---|
| Correctness | Conservative bail-out on any ambiguity; identity encoding always safe |
| Retargetability | Zero hardware-specific logic; delegates to resolver infrastructure |
| Safety | Requires immutability for both source and encoded globals |
| Testability | Comprehensive positive AND negative test cases |
| Maintainability | Clean analysis/transform separation; follows existing patterns |
| Extensibility | Architecture ready for resolver integration and subgraph canonicalization |
The pass prioritizes correctness over optimization - it will never introduce incorrect behavior, even if it means missing some optimization opportunities in complex cases.
I was brainstorming with Claude and realized that DFX is not needed in this case. We only need a simple traversal.
Collapsed into issue description above
Many thanks to @benvanik for the work on rewriting CombineInitializersPass. It made many things clearer to me.
What does Mutable global in path mean here?
Does that only refer to the encoded globals and the source global?
I was wondering that in the context of subgraph equivalence: The RFC treats functions as black boxes, so two function calls of the same function with the same operands and attributes will be treated as equivalent subgraphs IIUC.
However, in the presence of any mutable global, which doesn't even have to be part of the chains between source and encoded globals, the same function with same operands and attributes could produce different outputs.
How would this be handled?
What does Mutable global in path mean here?
It refers to the source global. IMO, we should mark functions as immutable or mutable in the analysis. We bail out if we reach any mutable globals or functions.