
[Architecture] Design ILayeredModel to SURPASS ONNX for model partitioning

Open ooples opened this issue 2 months ago • 0 comments

Goal: EXCEED Industry Standards

Create a model partitioning architecture that SURPASSES ONNX (current industry standard) by fixing its limitations and providing superior capabilities for AiDotNet.

Related: Issue #414, PR #424


ONNX Limitations & Flaws

1. Serialization Overhead

  • ❌ Must serialize entire model to protobuf
  • ❌ Roundtrip: In-memory → Binary → Parse → In-memory
  • ❌ Performance: ~50-200ms overhead per partition operation
  • ❌ Memory: 2-3x model size during conversion

2. Loss of Type Safety

  • ❌ Everything becomes object or protobuf types
  • ❌ Generic type information lost (T becomes runtime checks)
  • ❌ No compile-time verification of tensor shapes
  • ❌ Runtime errors instead of compile errors

3. External Dependencies

  • ❌ Requires onnx, onnx-graphsurgeon, or similar
  • ❌ Python-centric tooling (not native C#)
  • ❌ Version compatibility issues
  • ❌ Large dependency footprint

4. Limited Metadata

  • ❌ No custom layer attributes beyond standard ONNX schema
  • ❌ Can't annotate layers with domain-specific info
  • ❌ No built-in profiling/telemetry hooks
  • ❌ Loses C#-specific optimizations

5. Graph Manipulation Complexity

  • ❌ Manual tensor shape inference after splitting
  • ❌ Must handle operator type compatibility
  • ❌ Complex initializer management
  • ❌ Easy to create invalid graphs

AiDotNet's SUPERIOR Approach

Option 1: ILayeredModel with Compile-Time Safety ✅ RECOMMENDED

/// <summary>
/// Native C# model interface with ZERO serialization overhead
/// and full compile-time type safety.
/// </summary>
public interface ILayeredModel<T, TInput, TOutput> : IFullModel<T, TInput, TOutput>
    where T : struct
{
    /// <summary>
    /// Gets strongly-typed layer metadata with O(1) access.
    /// ADVANTAGE: No parsing, no serialization, direct memory access.
    /// </summary>
    IReadOnlyList<LayerMetadata<T>> GetLayers();

    /// <summary>
    /// Extracts layers with compile-time shape verification.
    /// ADVANTAGE: Type-safe extraction, catches errors at compile time.
    /// </summary>
    IFullModel<T, TIntermediateOutput, TOutput> ExtractLayers<TIntermediateOutput>(
        int startIndex,
        int endIndex);

    /// <summary>
    /// Gets intermediate shape with compile-time guarantees.
    /// </summary>
    TensorShape<T> GetIntermediateShape(int layerIndex);

    /// <summary>
    /// Validates partition point compatibility.
    /// ADVANTAGE: Pre-validation before extraction, fail fast.
    /// </summary>
    PartitionValidationResult ValidatePartition(int splitIndex);
}
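
To make the intended workflow concrete, here is a hedged usage sketch. The LoadModel call, the split index, and the Tensor<float> intermediate type are all assumptions for illustration, as are the IsValid/Message members of the validation result:

// Assumed: a sequential model whose activations at the split are Tensor<float>.
ILayeredModel<float, Tensor<float>, Tensor<float>> model = LoadModel(); // hypothetical loader

// Fail fast: validate the partition point before touching any layers.
var validation = model.ValidatePartition(splitIndex: 4);
if (!validation.IsValid) // IsValid/Message are assumed members of the result type
    throw new InvalidOperationException(validation.Message);

// Extract layers 4..N-1; the intermediate tensor type is checked at compile time.
var tail = model.ExtractLayers<Tensor<float>>(
    startIndex: 4,
    endIndex: model.GetLayers().Count - 1);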

public class LayerMetadata<T> where T : struct
{
    public string Name { get; init; } = string.Empty;
    public LayerType Type { get; init; }

    // ADVANTAGE: Direct parameter access without deserialization
    public ReadOnlyMemory<T> Parameters { get; init; }
    public int ParameterStartIndex { get; init; }
    public int ParameterCount { get; init; }

    // ADVANTAGE: Compile-time shape types
    public TensorShape<T> InputShape { get; init; } = new();
    public TensorShape<T> OutputShape { get; init; } = new();

    // ADVANTAGE: Custom metadata for AiDotNet-specific optimizations
    public LayerOptimizationHints OptimizationHints { get; init; } = new();
    public LayerProfilingInfo? ProfilingData { get; init; }

    // ADVANTAGE: Native C# lambda support for custom operations
    public Func<Vector<T>, Vector<T>>? CustomForward { get; init; }
}

public readonly struct TensorShape<T> where T : struct
{
    // ReadOnlyMemory (not ReadOnlySpan): a span field would force a ref struct,
    // which could not be stored in LayerMetadata<T> or returned by the interface.
    public ReadOnlyMemory<int> Dimensions { get; init; }

    public Type ElementType => typeof(T);  // Compile-time type

    public int TotalElements
    {
        get
        {
            int total = 1;
            foreach (int dim in Dimensions.Span) total *= dim;
            return total;
        }
    }

    // ADVANTAGE: Compile-time element-type check plus runtime dimension check
    public bool IsCompatibleWith<TOther>(TensorShape<TOther> other) where TOther : struct
        => typeof(T) == typeof(TOther) && Dimensions.Span.SequenceEqual(other.Dimensions.Span);
}
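
As an example of what this shape metadata buys, here is a hedged sketch of split-point validation built on it. The Valid/Invalid factory methods on PartitionValidationResult are assumptions:

static PartitionValidationResult ValidateSplit<T>(
    IReadOnlyList<LayerMetadata<T>> layers, int splitIndex) where T : struct
{
    if (splitIndex <= 0 || splitIndex >= layers.Count)
        return PartitionValidationResult.Invalid("Split index out of range."); // assumed factory

    // The producing layer's output shape must match the consuming layer's input shape.
    TensorShape<T> produced = layers[splitIndex - 1].OutputShape;
    TensorShape<T> consumed = layers[splitIndex].InputShape;

    return produced.IsCompatibleWith(consumed)
        ? PartitionValidationResult.Valid()
        : PartitionValidationResult.Invalid("Shape mismatch at split point.");
}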

ADVANTAGES OVER ONNX:

  1. ZERO serialization overhead - Direct in-memory operations
  2. Compile-time type safety - Catch errors before runtime
  3. 10-100x faster partitioning (no protobuf parsing)
  4. 1x memory usage (no intermediate buffers)
  5. Native C# idioms - LINQ, generics, spans
  6. Custom metadata - Unlimited extensibility
  7. Profiling hooks - Built-in performance measurement
  8. AOT compatible - Works with native compilation
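
The speed figures above are design targets rather than measurements. A minimal BenchmarkDotNet harness along these lines is how the claim would be verified; the fixtures and the OnnxPartitioner fallback are hypothetical:

using BenchmarkDotNet.Attributes;

public class PartitionBenchmarks
{
    // Hypothetical fixtures: the same 10-layer model, once as ILayeredModel
    // and once as a plain IFullModel forced through an ONNX round trip.
    private ILayeredModel<float, Tensor<float>, Tensor<float>> _layered = null!;
    private IFullModel<float, Tensor<float>, Tensor<float>> _plain = null!;

    [Benchmark(Baseline = true)]
    public object OnnxRoundtrip() => OnnxPartitioner.Partition(_plain, 4); // hypothetical fallback

    [Benchmark]
    public object NativeExtract() => _layered.ExtractLayers<Tensor<float>>(4, 9);
}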

Option 2: Hybrid with Smart Caching

Combine ILayeredModel (fast path) with ONNX (compatibility):

public class SmartModelPartitioner<T, TInput, TOutput> where T : struct
{
    // NOTE: GetModelHash, PartitionNative, PartitionViaOnnx, and CachePartition
    // are design-sketch helpers; one possible GetModelHash is sketched below.
    private readonly ConcurrentDictionary<string, CachedPartition> _cache = new();

    public PartitionedModel<T, TInput, TOutput> Partition(
        IFullModel<T, TInput, TOutput> model,
        int splitIndex)
    {
        // Fast path: Native ILayeredModel (0ms overhead)
        if (model is ILayeredModel<T, TInput, TOutput> layered)
        {
            return PartitionNative(layered, splitIndex);  // <1ms
        }

        // Compatibility: ONNX with caching (50ms first time, 1ms cached)
        var cacheKey = GetModelHash(model) + splitIndex;
        if (_cache.TryGetValue(cacheKey, out var cached))
        {
            return cached.ToPartitionedModel();
        }

        var result = PartitionViaOnnx(model, splitIndex);
        _cache[cacheKey] = CachePartition(result);
        return result;
    }
}
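
One way the hypothetical GetModelHash helper could be implemented, assuming IFullModel exposes its flattened parameter vector (the GetParameters accessor name is an assumption):

// Hedged sketch: a cheap, content-based cache key over the model's parameters.
private static string GetModelHash(IFullModel<T, TInput, TOutput> model)
{
    var parameters = model.GetParameters(); // assumed accessor
    var hash = new HashCode();
    hash.Add(parameters.Length);

    // Sample at most ~64 values so hashing stays cheap even for large models.
    int stride = Math.Max(1, parameters.Length / 64);
    for (int i = 0; i < parameters.Length; i += stride)
        hash.Add(parameters[i]);

    return hash.ToHashCode().ToString("x8");
}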

ADVANTAGES:

  • ✅ Best of both worlds
  • ✅ Fast path for ILayeredModel implementations
  • ✅ Compatibility with any model via ONNX
  • ✅ Caching eliminates repeated overhead

Option 3: Graph-Based with JIT Compilation 🚀 FUTURE

public interface IGraphModel<T, TInput, TOutput> : ILayeredModel<T, TInput, TOutput>
    where T : struct
{
    /// <summary>
    /// Gets the full computational graph as native C# objects.
    /// ADVANTAGE: Can apply graph-level optimizations at runtime.
    /// </summary>
    ComputationalGraph<T> GetGraph();

    /// <summary>
    /// Creates model from graph with JIT compilation.
    /// ADVANTAGE: Generate optimized native code for specific hardware.
    /// </summary>
    IGraphModel<T, TNewInput, TNewOutput> FromGraph<TNewInput, TNewOutput>(
        ComputationalGraph<T> graph);

    /// <summary>
    /// Applies graph transformations (fusion, quantization, etc.)
    /// ADVANTAGE: Optimize entire computation pipeline, not just layers.
    /// </summary>
    IGraphModel<T, TInput, TOutput> Transform(
        params IGraphTransformation<T>[] transformations);
}

public abstract class ComputationalGraph<T> where T : struct
{
    public IReadOnlyList<GraphNode<T>> Nodes { get; init; } = Array.Empty<GraphNode<T>>();
    public IReadOnlyList<GraphEdge<T>> Edges { get; init; } = Array.Empty<GraphEdge<T>>();

    // ADVANTAGE: Native C# graph manipulation (faster than protobuf)
    public abstract ComputationalGraph<T> Slice(int startNode, int endNode);
    public abstract ComputationalGraph<T> Fuse(IEnumerable<int> nodeIndices);
    public abstract ComputationalGraph<T> Optimize(OptimizationLevel level);

    // ADVANTAGE: JIT compilation to native code
    public abstract Func<TInput, TOutput> CompileToDelegate<TInput, TOutput>();
}

ADVANTAGES:

  • Arbitrary DAG support (not just sequential)
  • Graph-level fusion (Conv+BatchNorm+ReLU in single op)
  • JIT compilation to native code (Burst, CoreCLR, or LLVM)
  • Runtime specialization for specific hardware
  • Automatic parallelization across layers

Performance Comparison

| Feature | ONNX | ILayeredModel | IGraphModel |
|---|---|---|---|
| Partition Speed | 50-200ms | <1ms | <5ms |
| Memory Overhead | 2-3x | 0x | 0.1x |
| Type Safety | Runtime | Compile | Compile |
| Custom Metadata | Limited | Unlimited | Unlimited |
| JIT Compilation | ❌ | ❌ | ✅ |
| Graph Fusion | External | Manual | Automatic |
| AOT Compatible | ⚠️ | ✅ | ✅ |

Implementation Phases

Phase 1: ILayeredModel Foundation (2-3 weeks)

Files to create:

  • src/Interfaces/ILayeredModel.cs
  • src/Interfaces/LayerMetadata.cs
  • src/LinearAlgebra/TensorShape.cs

Files to modify:

  • src/Deployment/Edge/EdgeOptimizer.cs - Add ILayeredModel support
  • src/Models/SequentialModel.cs - Implement ILayeredModel (reference implementation; sketched below)

Benefits:

  • ✅ Immediate performance improvement for sequential models
  • ✅ Foundation for advanced features
  • ✅ Backward compatible
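
A hedged sketch of what the SequentialModel reference implementation might look like; the _layers field and the per-layer Name/Parameters/shape accessors are assumptions about its internals:

public partial class SequentialModel<T> : ILayeredModel<T, Tensor<T>, Tensor<T>>
    where T : struct
{
    public IReadOnlyList<LayerMetadata<T>> GetLayers()
    {
        var metadata = new List<LayerMetadata<T>>(_layers.Count);
        int offset = 0;

        foreach (var layer in _layers) // assumed internal layer list
        {
            metadata.Add(new LayerMetadata<T>
            {
                Name = layer.Name,
                Parameters = layer.Parameters, // ReadOnlyMemory<T>: no copy, no serialization
                ParameterStartIndex = offset,
                ParameterCount = layer.Parameters.Length,
                InputShape = layer.InputShape,
                OutputShape = layer.OutputShape,
            });
            offset += layer.Parameters.Length;
        }
        return metadata;
    }

    // ExtractLayers, GetIntermediateShape, and ValidatePartition omitted here.
}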

Phase 2: Smart Partitioner with Caching (1-2 weeks)

Files to create:

  • src/Deployment/Edge/SmartModelPartitioner.cs
  • src/Deployment/Edge/CachedPartition.cs

Benefits:

  • ✅ Best performance for ILayeredModel
  • ✅ Compatibility with non-layered models
  • ✅ Caching eliminates ONNX overhead on repeated calls

Phase 3: IGraphModel with JIT (4-6 weeks)

Files to create:

  • src/Interfaces/IGraphModel.cs
  • src/ComputationGraph/ (entire namespace)
  • src/JIT/ (JIT compilation infrastructure)

Benefits:

  • ✅ Graph-level optimizations (fusion, quantization)
  • ✅ JIT compilation to native code
  • ✅ Arbitrary DAG support
  • ✅ Industry-leading performance
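
On the JIT backend question (see Discussion Questions below), a minimal sketch of one possible approach using System.Linq.Expressions, assuming each graph node exposes its forward pass as a delegate. This is a sketch of one candidate backend, not the committed design:

using System.Linq.Expressions;

public static class DelegateChainCompiler
{
    // Compose a chain of per-node forward functions into ONE compiled delegate,
    // removing per-layer virtual/interface dispatch.
    public static Func<TInput, TInput> CompileChain<TInput>(
        IReadOnlyList<Func<TInput, TInput>> nodeOps)
    {
        var input = Expression.Parameter(typeof(TInput), "x");
        Expression body = input;

        // Build op_n(...op_1(op_0(x))...) as a single expression tree.
        foreach (var op in nodeOps)
            body = Expression.Invoke(Expression.Constant(op), body);

        return Expression.Lambda<Func<TInput, TInput>>(body, input).Compile();
    }
}

One caveat relevant to the AOT row in the comparison table: on NativeAOT runtimes, Expression.Compile falls back to an interpreter, which is part of why question 4 (Reflection.Emit vs. external backends) matters.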

Discussion Questions

  1. Priority: Start with Phase 1 immediately, or wait for graph design?
  2. Namespace: AiDotNet.Interfaces or AiDotNet.Deployment.Interfaces?
  3. Metadata Depth: Minimal (Phase 1) or rich (include profiling, hints)?
  4. JIT Backend: Use System.Reflection.Emit, Burst, or external (LLVM)?
  5. Breaking Changes: Can we add methods to IFullModel in major version?

Success Criteria

  • 10-100x faster than ONNX partitioning
  • Zero serialization overhead for native models
  • Compile-time type safety prevents runtime errors
  • Unlimited extensibility via custom metadata
  • Backward compatible with existing code
  • Surpasses TensorFlow, PyTorch, and ONNX tooling in the C# ecosystem
