
Add high-performance GLU activation variants (GLU, GeGLU, ReGLU) with comprehensive benchmarking

Status: Open · artem1984A opened this issue 5 months ago · 0 comments

[Figures: GLU function graph; GeGLU and ReGLU functions; activation functions visualized]

High-Performance Core Implementation

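  • GLU: classic sigmoid-gated activation, σ(x_left) ⊙ x_right, where x_left and x_right are the two halves of the input
  • GeGLU: GELU-gated variant, GELU(x_left) ⊙ x_right, widely used in transformer feed-forward layers
  • ReGLU: ReLU-gated variant, ReLU(x_left) ⊙ x_right, 10-20x faster than GeGLU in the benchmark below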
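All three variants share one structure: split the input into two halves, apply a gate function to the left half, and multiply element-wise with the right half. Below is a minimal, framework-free sketch of that math on a flat `f32` slice, assuming an even-length input whose first half is `x_left`. The helper names (`gated_linear`, `gelu_tanh`, etc.) are hypothetical and not part of candle, and GELU here uses the common tanh approximation, which may differ from the GELU form used by the proposed implementation.

```rust
/// Sigmoid gate used by GLU.
fn sigmoid(x: f32) -> f32 {
    1.0 / (1.0 + (-x).exp())
}

/// ReLU gate used by ReGLU.
fn relu(x: f32) -> f32 {
    x.max(0.0)
}

/// GELU gate used by GeGLU (tanh approximation, an assumption of this sketch).
fn gelu_tanh(x: f32) -> f32 {
    const SQRT_2_OVER_PI: f32 = 0.797_884_6; // sqrt(2 / pi)
    0.5 * x * (1.0 + (SQRT_2_OVER_PI * (x + 0.044_715 * x * x * x)).tanh())
}

/// Generic gated linear unit: gate(x_left) ⊙ x_right.
/// `xs` must have even length: first half is x_left, second half is x_right.
fn gated_linear(xs: &[f32], gate: fn(f32) -> f32) -> Vec<f32> {
    assert!(xs.len() % 2 == 0, "GLU input length must be even");
    let (left, right) = xs.split_at(xs.len() / 2);
    left.iter().zip(right).map(|(&l, &r)| gate(l) * r).collect()
}

fn glu(xs: &[f32]) -> Vec<f32> { gated_linear(xs, sigmoid) }
fn geglu(xs: &[f32]) -> Vec<f32> { gated_linear(xs, gelu_tanh) }
fn reglu(xs: &[f32]) -> Vec<f32> { gated_linear(xs, relu) }

fn main() {
    // x_left = [-1.0, 0.0, 2.0], x_right = [3.0, 4.0, 5.0]
    let xs = [-1.0_f32, 0.0, 2.0, 3.0, 4.0, 5.0];
    println!("GLU:   {:?}", glu(&xs));
    println!("GeGLU: {:?}", geglu(&xs));
    println!("ReGLU: {:?}", reglu(&xs)); // [0.0, 0.0, 10.0]
}
```

Only the gate differs between the variants, which is consistent with ReGLU being the cheapest: its gate is a single comparison per element, while GELU needs a transcendental function.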

Performance

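| Activation | Time, 8192 elements (F32) | Use case |
|------------|---------------------------|----------------------|
| ReGLU | ~4.9 µs | High-speed inference |
| GLU | ~31 µs | Balanced performance |
| GeGLU | ~62 µs | Training quality |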
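The relative speedups implied by the table can be checked directly from the quoted latencies. Since those figures are approximate, so are the ratios. This sketch only divides the GeGLU time by each variant's time and computes per-element throughput from the numbers in the table; it is not a new measurement.

```rust
/// Number of F32 elements in the benchmark from the table.
const N_ELEMENTS: f64 = 8192.0;
/// Approximate latencies from the table, in microseconds.
const LATENCIES_US: [(&str, f64); 3] = [("ReGLU", 4.9), ("GLU", 31.0), ("GeGLU", 62.0)];

/// Speedup of `candidate` relative to `baseline` (same time unit).
fn speedup(baseline_us: f64, candidate_us: f64) -> f64 {
    baseline_us / candidate_us
}

fn main() {
    let geglu_us = LATENCIES_US[2].1;
    for (name, t_us) in LATENCIES_US {
        println!(
            "{name:>5}: {:5.1}x vs GeGLU, {:7.1} elements/us",
            speedup(geglu_us, t_us),
            N_ELEMENTS / t_us
        );
    }
}
```

By these figures ReGLU is about 12.7x faster than GeGLU, inside the quoted 10-20x range, and GLU is about 2x faster than GeGLU.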

Architecture Integration

Dual API Design

```rust
// Direct tensor method (maximum performance)
let output = input.reglu()?;

// Activation enum (configuration-driven); assumes Config implements Default
let config = Config { hidden_act: Activation::GeGlu, ..Default::default() };
```
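The two entry points in the snippet above can share one kernel: a direct method for call sites that know the variant at compile time, and an enum match for configuration-driven models. The following framework-free sketch works on `f32` slices; the `GluExt` trait and the `Activation` enum with its `forward` method are hypothetical stand-ins, not candle's actual `Tensor` methods or `candle_nn::Activation`.

```rust
/// Hypothetical activation enum mirroring the variants proposed in this issue.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Activation {
    Glu,
    GeGlu,
    ReGlu,
}

impl Activation {
    /// Gate function applied to the left half of the input.
    fn gate(self, x: f32) -> f32 {
        match self {
            Activation::Glu => 1.0 / (1.0 + (-x).exp()),
            // tanh approximation of GELU (assumption of this sketch)
            Activation::GeGlu => {
                0.5 * x * (1.0 + (0.797_884_6 * (x + 0.044_715 * x * x * x)).tanh())
            }
            Activation::ReGlu => x.max(0.0),
        }
    }

    /// Configuration-driven path: gate(x_left) ⊙ x_right.
    fn forward(self, xs: &[f32]) -> Result<Vec<f32>, String> {
        if xs.len() % 2 != 0 {
            return Err(format!("GLU input length must be even, got {}", xs.len()));
        }
        let (left, right) = xs.split_at(xs.len() / 2);
        Ok(left.iter().zip(right).map(|(&l, &r)| self.gate(l) * r).collect())
    }
}

/// Direct-method path, allowing `input.reglu()?` style calls on slices.
trait GluExt {
    fn glu(&self) -> Result<Vec<f32>, String>;
    fn geglu(&self) -> Result<Vec<f32>, String>;
    fn reglu(&self) -> Result<Vec<f32>, String>;
}

impl GluExt for [f32] {
    fn glu(&self) -> Result<Vec<f32>, String> { Activation::Glu.forward(self) }
    fn geglu(&self) -> Result<Vec<f32>, String> { Activation::GeGlu.forward(self) }
    fn reglu(&self) -> Result<Vec<f32>, String> { Activation::ReGlu.forward(self) }
}

fn main() -> Result<(), String> {
    let input = vec![-1.0_f32, 2.0, 3.0, 4.0];
    // Direct method
    let output = input.reglu()?;
    // Enum dispatch, e.g. with the variant read from a model config
    let hidden_act = Activation::ReGlu;
    assert_eq!(output, hidden_act.forward(&input)?);
    println!("{output:?}");
    Ok(())
}
```

Both paths end in the same `forward`, so the enum adds only a `match` per call rather than a second implementation.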

Transformer Integration
  • Native Phi-3 support with configurable GLU variants
  • Performance/quality trade-offs for different deployment scenarios
  • Zero-config default of GeGLU, with ReGLU available as an opt-in for speed
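The zero-config default (GeGLU unless a faster variant is requested) and the `Config::with_activation` constructor used in the next snippet can be expressed with `Default` plus struct-update syntax. This is a hypothetical sketch: the `Config` fields and the default `hidden_size` value are illustrative assumptions, not the actual Phi-3 config in candle.

```rust
/// Hypothetical GLU-family activation selector; GeGLU is the default.
#[derive(Debug, Clone, Copy, PartialEq, Default)]
enum Activation {
    Glu,
    #[default]
    GeGlu,
    ReGlu,
}

/// Hypothetical, trimmed-down model config (illustrative fields only).
#[derive(Debug, Clone, PartialEq)]
struct Config {
    hidden_size: usize,
    hidden_act: Activation,
}

impl Default for Config {
    fn default() -> Self {
        Self {
            hidden_size: 3072,                 // illustrative value only
            hidden_act: Activation::default(), // GeGLU unless overridden
        }
    }
}

impl Config {
    /// Start from the defaults and override only the activation.
    fn with_activation(hidden_act: Activation) -> Self {
        Self { hidden_act, ..Default::default() }
    }
}

fn main() {
    let default_cfg = Config::default();
    let edge_cfg = Config::with_activation(Activation::ReGlu);
    let research_cfg = Config::with_activation(Activation::GeGlu);
    assert_eq!(default_cfg, research_cfg); // GeGLU is already the default
    assert_eq!(edge_cfg.hidden_size, default_cfg.hidden_size);
    println!("{edge_cfg:?}");
}
```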

```rust
// Mobile/edge: 10-20x faster inference
let edge_config = Config::with_activation(Activation::ReGlu);

// Research/training: maximum expressiveness
let research_config = Config::with_activation(Activation::GeGlu);
```
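To show where the configured activation enters a transformer layer, here is a sketch of a gated feed-forward block modeled loosely on the fused gate/up projection used by Phi-3: one up-projection produces `2 * intermediate` values, the selected gate is applied to the first half and multiplied with the second half, and a down-projection maps back to the hidden size. Plain matrix-vector products on `Vec<f32>` stand in for tensor ops; `GatedMlp`, the row-major weight layout, and the gate-first ordering are assumptions for illustration, not candle's implementation.

```rust
#[derive(Debug, Clone, Copy)]
enum Activation { Glu, GeGlu, ReGlu }

impl Activation {
    fn gate(self, x: f32) -> f32 {
        match self {
            Activation::Glu => 1.0 / (1.0 + (-x).exp()),
            // tanh approximation of GELU (assumption of this sketch)
            Activation::GeGlu => 0.5 * x * (1.0 + (0.797_884_6 * (x + 0.044_715 * x * x * x)).tanh()),
            Activation::ReGlu => x.max(0.0),
        }
    }
}

/// Row-major matrix-vector product W x: each row of `w` has length `x.len()`.
fn matvec(w: &[f32], x: &[f32]) -> Vec<f32> {
    w.chunks_exact(x.len())
        .map(|row| row.iter().zip(x).map(|(a, b)| a * b).sum::<f32>())
        .collect()
}

/// Hypothetical gated MLP: W_down (gate(W_g x) ⊙ (W_u x)).
struct GatedMlp {
    gate_up: Vec<f32>, // (2 * intermediate) x hidden, gate rows first
    down: Vec<f32>,    // hidden x intermediate
    act: Activation,
}

impl GatedMlp {
    fn forward(&self, x: &[f32]) -> Vec<f32> {
        let gate_up = matvec(&self.gate_up, x);
        let (gate, up) = gate_up.split_at(gate_up.len() / 2);
        let hidden: Vec<f32> = gate.iter().zip(up).map(|(&g, &u)| self.act.gate(g) * u).collect();
        matvec(&self.down, &hidden)
    }
}

fn main() {
    // hidden = 2, intermediate = 2
    let mlp = GatedMlp {
        gate_up: vec![
            1.0, 0.0,  // gate row 0
            0.0, 1.0,  // gate row 1
            1.0, 1.0,  // up row 0
            1.0, -1.0, // up row 1
        ],
        down: vec![1.0, 0.0, 0.0, 1.0], // identity
        act: Activation::ReGlu,
    };
    println!("{:?}", mlp.forward(&[2.0, -3.0]));
}
```

Switching `act` between variants changes only the gate; the projections and their weight shapes stay the same, which is what lets the variant be a configuration switch.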

artem1984A · Jun 20 '25 11:06