
RFC: Implement the pass that unifies the encodings for each global.

hanhanW opened this issue 1 month ago • 4 comments

UnifyEncodingForGlobals Pass - Implementation Plan

Overview

This pass unifies multiple encoded versions of the same immutable global to reduce memory footprint in IREE's Stream dialect. When a single source global (e.g., a model weight) is encoded multiple times with different encodings for different uses, this pass selects a unified encoding and updates all references.

Initial Prototype: https://github.com/hanhanW/iree/blob/hanhan-prototype-globals-with-multi-encoding-snapshot-20251022/compiler/src/iree/compiler/Dialect/Stream/Transforms/UnifyEncodingForGlobals.cpp

UnifyEncodingForGlobals Pass - Design Rationale

This section explains the design rationale for the UnifyEncodingForGlobals pass and why it meets IREE's bar as retargetable compiler infrastructure.


1. Problem Statement

When the same source data (e.g., model weights) is used in multiple dispatch operations with different encoding requirements, IREE currently creates multiple encoded copies of the same data. For example, in LLaMA models:

// Same weight encoded twice with different iteration_sizes
%enc1 = stream.tensor.encode %weight -> tensor<4096x4096xf32, #encoding<iteration_sizes=[?, 4096, 4096]>>
%enc2 = stream.tensor.encode %weight -> tensor<4096x4096xf32, #encoding<iteration_sizes=[4, 4096, 4096]>>

This wastes memory: if the two encodings resolve to different layouts, execution ends up holding two encoded globals for the same underlying data.

Goal: Unify multiple encodings of the same immutable source into a single encoding, reducing memory footprint.


2. Why Immutability Enables This Optimization

The foundation of this optimization rests on immutable globals:

  1. Immutability Guarantee: IREE's VerifyInitializationOrderPass ensures immutable globals are initialized exactly once, only in initializers or initializer-only functions.

  2. Deterministic Data: Since immutable globals never change after initialization, any computation derived purely from immutable globals produces deterministic results.

  3. Safe Unification: If two stream.tensor.encode ops consume the same immutable source, we can safely use a unified encoding for both; the underlying data is guaranteed to be identical, which avoids memory footprint bloat.

This is why the pass strictly requires both source AND encoded globals to be immutable.
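
As a rough sketch of this gate (hypothetical helper and attribute names; the linked prototype is the authoritative implementation), the pass only considers an encode op when both the traced source global and the global holding the encoded result can be proven immutable:

// Sketch only. Assumes mutability is surfaced as a unit attribute on the
// global op (the name "is_mutable" is an assumption) and that traceToGlobal
// is a hypothetical helper that walks util.global.load chains back to the
// defining global.
#include "mlir/IR/Operation.h"
#include "mlir/IR/Value.h"

static mlir::Operation *traceToGlobal(mlir::Value value);  // hypothetical

static bool isImmutableGlobal(mlir::Operation *globalOp) {
  return globalOp && !globalOp->hasAttr("is_mutable");
}

static bool isUnificationCandidate(mlir::Operation *encodeOp,
                                   mlir::Operation *encodedGlobal) {
  // The data being encoded must come from an immutable global...
  mlir::Operation *sourceGlobal = traceToGlobal(encodeOp->getOperand(0));
  // ...and the global that stores the encoded result must be immutable too.
  return isImmutableGlobal(sourceGlobal) && isImmutableGlobal(encodedGlobal);
}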


3. Hardware-Agnostic Encoding Selection via Resolver

3.1 The Resolver Architecture

IREE uses an encoding resolver infrastructure that is central to retargetability:

┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
│  This Pass      │────▶│  Layout Resolver │────▶│  Backend-       │
│  (identifies    │     │  Interface       │     │  Specific       │
│   candidates)   │     │                  │     │  Implementation │
└─────────────────┘     └──────────────────┘     └─────────────────┘
  • Each backend registers its own resolver via AffinityAnalysisDialectInterface
  • Resolver knows device capabilities (GPU tile sizes, cache lines, vector widths)
  • Resolver returns optimal encoding for the target device
  • This pass queries the resolver; it doesn't hard-code encoding decisions (see the sketch below)
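
As a minimal sketch of the query step (the resolver interface and its registration are elided behind a hypothetical queryResolvedEncoding helper; see the prototype for the real interface names):

// Sketch only: queryResolvedEncoding stands in for the backend-registered
// resolver; the pass itself never computes a layout.
#include "mlir/IR/Attributes.h"
#include "mlir/IR/Operation.h"
#include "llvm/ADT/ArrayRef.h"
#include <optional>

// Hypothetical wrapper around the per-backend resolver interface.
static std::optional<mlir::Attribute>
queryResolvedEncoding(mlir::Operation *encodeOp);

static std::optional<mlir::Attribute>
pickUnifiedEncoding(llvm::ArrayRef<mlir::Operation *> encodeOps) {
  std::optional<mlir::Attribute> unified;
  for (mlir::Operation *op : encodeOps) {
    std::optional<mlir::Attribute> resolved = queryResolvedEncoding(op);
    if (!resolved)
      return std::nullopt;  // No resolver answer: fall back (see 3.3).
    if (unified && *unified != *resolved)
      return std::nullopt;  // Resolvers disagree: fall back (see 3.3).
    unified = resolved;
  }
  return unified;
}

The important property is that every piece of backend knowledge stays behind the resolver call.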

3.2 Why This Matters for Retargetability

The pass contains zero hardware-specific logic:

| What the pass does | What the pass does NOT do |
| --- | --- |
| Identifies unification candidates | Choose specific tile sizes |
| Queries resolver for unified encoding | Assume GPU vs CPU layout |
| Updates IR with resolver's answer | Hard-code any encoding format |

This means the same pass works for:

  • CPU with AVX-512 (8-wide vectors)
  • GPU with different warp/wavefront sizes
  • Custom accelerators with unique memory layouts
  • Future backends not yet implemented

3.3 Identity Encoding as Conservative Fallback

When no resolver is available or resolvers disagree:

auto unifiedEncoding = IREE::Encoding::IdentityAttr::get(context);

Why identity is always safe:

  • Identity means "no specialized layout transformation"
  • Every backend must handle identity encoding
  • Later passes (e.g., SpecializeEncodings) can re-encode if beneficial; this is not implemented yet and may belong in other passes
  • Correctness is preserved; only the optimization opportunity is deferred (see the sketch below)
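
Combined with the resolver query sketched in 3.1, the fallback reduces to a few lines (a sketch; pickUnifiedEncoding is the hypothetical helper from that sketch, and the IREE Encoding dialect and MLIRContext headers are assumed to be included):

// If the resolvers cannot agree on (or produce) a unified encoding, fall back
// to the identity encoding, which every backend must accept.
static mlir::Attribute
chooseUnifiedEncoding(llvm::ArrayRef<mlir::Operation *> encodeOps,
                      mlir::MLIRContext *context) {
  if (std::optional<mlir::Attribute> unified = pickUnifiedEncoding(encodeOps))
    return *unified;
  // Conservative: no specialized layout; correctness preserved, optimization
  // deferred.
  return IREE::Encoding::IdentityAttr::get(context);
}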

4. Black-Box Principle and Subgraph Equivalence

4.1 The Black-Box Principle

All ops in the trace are treated as black boxes:

  • Identity = (op_name, attributes, operand_subgraphs); see the key-computation sketch after these lists
  • Same identity → same output (referential transparency)
  • No need to look inside function bodies, executables, or regions

This applies uniformly to:

  • util.call @func(...) - function is black box, arguments are inputs
  • stream.tensor.dispatch @exec(...) - executable is black box, operands are inputs
  • tensor.extract_slice - op with attributes is black box
  • scf.if / scf.for - captured values are implicit inputs
  • Any other deterministic op
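
One way to realize this identity, sketched here with a hypothetical predicate rather than the prototype's actual code, is a recursive canonical key over (op name, attributes, operand keys); returning no key at all is how the analysis expresses "cannot prove equivalence" (see the bail-out conditions in 4.4):

// Sketch of a black-box subgraph key. Returning std::nullopt means "cannot
// prove equivalence", which makes the surrounding analysis bail out.
#include "mlir/IR/BuiltinAttributes.h"
#include "mlir/IR/Operation.h"
#include "llvm/ADT/Hashing.h"
#include <optional>

// Hypothetical predicate: true for util.global.load of an immutable global,
// constants, and other ops the analysis knows are deterministic.
static bool isKnownDeterministic(mlir::Operation *op);

static std::optional<llvm::hash_code> subgraphKey(mlir::Value value) {
  mlir::Operation *op = value.getDefiningOp();
  if (!op)
    return std::nullopt;  // Block arguments are not handled in this sketch.
  if (!isKnownDeterministic(op))
    return std::nullopt;  // Unknown or side-effecting op: bail out.
  // Identity = (op name, attributes, operand subgraphs).
  llvm::hash_code key = llvm::hash_combine(op->getName().getStringRef(),
                                           op->getAttrDictionary());
  for (mlir::Value operand : op->getOperands()) {
    std::optional<llvm::hash_code> operandKey = subgraphKey(operand);
    if (!operandKey)
      return std::nullopt;
    key = llvm::hash_combine(key, *operandKey);
  }
  return key;
}

Because hash keys can collide, a real implementation would confirm matches with a structural comparison; the sketch only illustrates the black-box recursion.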

4.2 Subgraph Equivalence

Two stream.tensor.encode ops encode the same data if their source operand subgraphs are equivalent:

  • Same leaf sources: Both trace to the same immutable global (by name) or constant (by value)
  • Same intermediate ops: Same op types with same attributes along the path
  • Same structure: The DAG of operations is isomorphic

Example of equivalent subgraphs (unifiable):

// Path 1
%g1 = util.global.load @weights  // immutable
%s1 = tensor.extract_slice %g1[0,0][50,100][1,1]
%enc1 = stream.tensor.encode %s1 with encoding1

// Path 2 - equivalent subgraph, different encoding
%g2 = util.global.load @weights  // same immutable global
%s2 = tensor.extract_slice %g2[0,0][50,100][1,1]  // same slice params
%enc2 = stream.tensor.encode %s2 with encoding2

Both trace to the same immutable global through the same operations → valid for unification.

4.3 Control Flow as Black Boxes

For ops with regions (scf.if, scf.for), captured values act as implicit arguments:

%weights = util.global.load @weights
%cond = ...
%result = scf.if %cond -> tensor<...> {
  scf.yield %weights  // %weights captured from outside
} else {
  scf.yield %weights
}

Canonical form:

scf.if(
  cond: <cond_subgraph>,
  captured_inputs: [<weights_subgraph>],
  then_region: { yield input[0] },
  else_region: { yield input[0] }
)

Two scf.if results are equivalent if:

  1. Same condition subgraph
  2. Same captured input subgraphs
  3. Same region structure (how inputs are used)

This unifies the model: functions/dispatches are black boxes with explicit arguments; control flow ops are black boxes with captured arguments plus region structure.

Why this works: in initializers, all inputs trace to immutable sources, and deterministic ops with the same inputs produce the same outputs.

4.4 Conservative Bail-Out

The analysis bails when equivalence cannot be proven:

| Condition | Rationale |
| --- | --- |
| Mutable global in path | Value may differ between uses |
| Unknown/unhandled op | Cannot prove semantic equivalence |
| Non-equivalent subgraphs | Different computation paths |

Principle: Better to miss an optimization than to introduce incorrectness.


5. Pass Ordering and Integration

5.1 Pipeline Position

CombineInitializersPass     ← Merges initializer blocks
        ↓
UnifyEncodingForGlobalsPass ← THIS PASS
        ↓
... (cleanup pipeline)    ← Later passes can deduplicate identical globals
// Reference: compiler/src/iree/compiler/Dialect/Stream/Transforms/Passes.cpp
passManager.addPass(IREE::Util::createCombineInitializersPass());
if (clUnifyEncodingForGlobals) {
  passManager.addPass(IREE::Stream::createUnifyEncodingForGlobalsPass());
}
// ... followed by cleanup pipeline

Why this position:

  • After CombineInitializers: All initializer code is in one place, simplifies analysis
  • Before cleanup: Later passes can deduplicate globals that now have identical encodings

6. Complete IR Update Chain

When unifying encodings, ALL related ops must be updated consistently:

// BEFORE
%size = stream.tensor.sizeof tensor<4096x4096xf32, #old_encoding>
%enc = stream.tensor.encode %source -> tensor<4096x4096xf32, #old_encoding> in !stream.resource<*>{%size}

// AFTER
%size = stream.tensor.sizeof tensor<4096x4096xf32, #unified_encoding>
%enc = stream.tensor.encode %source -> tensor<4096x4096xf32, #unified_encoding> in !stream.resource<*>{%size}

Both stream.tensor.sizeof and stream.tensor.encode must use the same encoding; otherwise the size calculation would be wrong.
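
A sketch of the corresponding rewrite, assuming the encoded tensor type is carried as a TypeAttr on these ops; the attribute names would come from the ops' generated accessors in real code and are passed in here purely for illustration:

// Rebuild the tensor type with the unified encoding and write it back, so the
// size computation and the encode op stay consistent.
#include "mlir/IR/BuiltinAttributes.h"
#include "mlir/IR/BuiltinTypes.h"
#include "mlir/IR/Operation.h"
#include "llvm/ADT/StringRef.h"

static mlir::RankedTensorType withEncoding(mlir::RankedTensorType type,
                                           mlir::Attribute unifiedEncoding) {
  return mlir::RankedTensorType::get(type.getShape(), type.getElementType(),
                                     unifiedEncoding);
}

static void updateEncodedType(mlir::Operation *op, llvm::StringRef attrName,
                              mlir::Attribute unifiedEncoding) {
  // attrName is e.g. the sizeof op's encoding attribute or the encode op's
  // result encoding attribute (names assumed for illustration).
  auto typeAttr = op->getAttrOfType<mlir::TypeAttr>(attrName);
  auto tensorType = mlir::cast<mlir::RankedTensorType>(typeAttr.getValue());
  op->setAttr(attrName,
              mlir::TypeAttr::get(withEncoding(tensorType, unifiedEncoding)));
}

The pass would invoke this once for the stream.tensor.sizeof op and once for the stream.tensor.encode op feeding the same global, with the same unified encoding attribute.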


7. Summary: Why This Meets IREE's Bar

| Criterion | How Met |
| --- | --- |
| Correctness | Conservative bail-out on any ambiguity; identity encoding always safe |
| Retargetability | Zero hardware-specific logic; delegates to resolver infrastructure |
| Safety | Requires immutability for both source and encoded globals |
| Testability | Comprehensive positive AND negative test cases |
| Maintainability | Clean analysis/transform separation; follows existing patterns |
| Extensibility | Architecture ready for resolver integration and subgraph canonicalization |

The pass prioritizes correctness over optimization: it will never introduce incorrect behavior, even if that means missing some optimization opportunities in complex cases.

hanhanW avatar Oct 30 '25 19:10 hanhanW

I was brainstorming with Claude and then realized that DFX is not needed in this case. We only need a simple traversal.

Collapsed into issue description above

hanhanW avatar Nov 25 '25 15:11 hanhanW

Many thanks to @benvanik's "Rewriting CombineInitializersPass" work. It made many things clearer to me.

hanhanW avatar Nov 25 '25 15:11 hanhanW

What does "Mutable global in path" mean here?

Does that only refer to the encoded globals and the source global?

I was wondering about this in the context of subgraph equivalence: the RFC treats functions as black boxes, so two calls of the same function with the same operands and attributes will be treated as equivalent subgraphs, IIUC.

However, in the presence of any mutable global, which doesn't even have to be part of the chains between the source and encoded globals, the same function with the same operands and attributes could produce different outputs.

How would this be handled?

sommerlukas avatar Nov 26 '25 12:11 sommerlukas

What does "Mutable global in path" mean here?

It refers to the source global. IMO, we should mark functions as immutable or mutable in the analysis. We bail out if we reach any mutable globals or functions.

hanhanW avatar Dec 08 '25 15:12 hanhanW