
Fix LSTM conversion for models with rank > 3 inputs from Unsqueeze operations

Open evkotov opened this issue 1 month ago • 3 comments

Details:

Subgraph from silero_vad.onnx (see attached image)

Problem

ONNX models exported from PyTorch frequently contain Unsqueeze operations before LSTM nodes. These operations add extra dimensions to tensors, resulting in rank-4 or rank-5 inputs to LSTM nodes. However, the ONNX LSTM specification strictly requires rank-3 inputs with shape [seq_length, batch_size, input_size].

Why this happens:

  • PyTorch models use various tensor shapes during training
  • During ONNX export, shape mismatches are "fixed" by inserting Unsqueeze nodes
  • These Unsqueeze operations add dimensions of size 1 to match expected shapes
  • The resulting LSTM inputs have rank > 3, violating the ONNX LSTM specification

Real-world impact:

  • Models like silero_vad.onnx contain 4 LSTM nodes, all with Unsqueeze operations before them
  • Without this fix, such LSTM models fail to convert to OpenVINO IR

Solution

This fix adds automatic rank reduction in the ONNX Frontend LSTM converter (src/frontends/onnx/frontend/src/op/lstm.cpp). The implementation uses a two-strategy approach:

  1. Squeeze Strategy (optimal path):
     • Used when all extra leading dimensions equal 1
     • Example: [1, 1, seq, batch, input] → [seq, batch, input]
     • Zero-cost operation that only changes metadata, no data movement
     • Applies to most real-world models (including silero_vad.onnx)
  2. Reshape Strategy (fallback path):
     • Used when extra dimensions are > 1 or have dynamic shapes
     • Example: [2, 3, seq, batch, input] → [6, batch, input] (flattens leading dimensions)
     • Handles edge cases and dynamic shapes
     • Uses dynamic shape calculation at runtime

Implementation details:

  • New function reduce_tensor_rank() analyzes input tensor rank and shape
  • Automatically selects the optimal strategy based on dimension values
  • Applied to all LSTM inputs: X (data), initial_h (hidden state), initial_c (cell state)
  • Transparent to users, no model modifications required

Code structure:

// Analyze input shape
if (input_rank <= target_rank) {
    return input;  // No reduction needed
}

// Check if all extra dimensions equal 1
if (all_extra_dims_are_one) {
    // Use Squeeze - optimal path
    return Squeeze(input, axes);
} else {
    // Use Reshape - fallback path
    return Reshape(input, new_shape);
}

Performance:

  • Squeeze path has zero runtime overhead (metadata-only operation)
  • Reshape path adds minimal overhead, only for edge cases
  • No impact on models that already have rank-3 inputs

Tickets:

  • 162986

evkotov avatar Nov 25 '25 12:11 evkotov

LSTM spec (https://onnx.ai/onnx/operators/onnx__LSTM.html) describes X, init C and H as 3D tensors. Is this a broader behavior supported by onnxruntime?

mvafin avatar Nov 27 '25 15:11 mvafin

> LSTM spec (https://onnx.ai/onnx/operators/onnx__LSTM.html) describes X, init C and H as 3D tensors. Is this a broader behavior supported by onnxruntime?

No, ONNX Runtime does not support broader rank behavior for LSTM inputs; it strictly validates rank == 3 for X: https://github.com/microsoft/onnxruntime/blob/423a03f1fc80d3cbed4f973574ee96f31521a3d3/onnxruntime/core/providers/cpu/rnn/lstm_base.cc#L191-L192

if (X_shape.NumDimensions() != 3)
    return ORT_MAKE_STATUS(ONNXRUNTIME, INVALID_ARGUMENT,
        "Input X must have 3 dimensions only. Actual:", X_shape);

Similarly for initial_h and initial_c: https://github.com/microsoft/onnxruntime/blob/423a03f1fc80d3cbed4f973574ee96f31521a3d3/onnxruntime/core/providers/cpu/rnn/lstm_base.cc#L221-L230

if (initial_h_shape.NumDimensions() != 3 || ...)
    return ORT_MAKE_STATUS(ONNXRUNTIME, FAIL, "Input initial_h must have shape {...}");

ONNX shape inference also validates rank=3: https://github.com/onnx/onnx/blob/main/onnx/defs/rnn/defs.cc#L26-L28

if (first_input_shape.dim_size() != 3) {
    fail_shape_inference("First input tensor must have rank 3");
}

So models like silero_vad.onnx work in ONNX Runtime because Unsqueeze nodes are typically removed during graph optimization when their input is a constant, or the model is exported with correct shapes. Our fix in OpenVINO handles the case where Unsqueeze cannot be optimized away (e.g., when input is dynamic).

evkotov avatar Dec 03 '25 11:12 evkotov

After discussion with @bumbosiepsak, removed the Unsqueeze logic.

evkotov avatar Dec 11 '25 14:12 evkotov