candle
Add high-performance GLU activation variants (GLU, GeGLU, ReGLU) with comprehensive benchmarking
High-Performance Core Implementation
- GLU: classic sigmoid-gated activation, σ(x_left) ⊙ x_right
- GeGLU: GELU-gated variant (transformer standard), GELU(x_left) ⊙ x_right
- ReGLU: ReLU-gated variant, ReLU(x_left) ⊙ x_right, with a 10-20x speedup over GeGLU
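The three variants share one structure: split the input along its last dimension, gate the left half, and multiply element-wise with the right half. A minimal standalone sketch of that rule on a flat `f32` slice (the real implementation operates on candle `Tensor`s; `glu_variant` and the helpers below are illustrative, not crate API):

```rust
// Sigmoid gate used by classic GLU.
fn sigmoid(x: f32) -> f32 {
    1.0 / (1.0 + (-x).exp())
}

// GELU gate (tanh approximation) used by GeGLU.
fn gelu(x: f32) -> f32 {
    0.5 * x * (1.0 + ((2.0_f32 / std::f32::consts::PI).sqrt() * (x + 0.044715 * x * x * x)).tanh())
}

// Split the input in half: the left half goes through the gate function,
// the right half is multiplied element-wise with the gated values.
fn glu_variant(input: &[f32], gate: fn(f32) -> f32) -> Vec<f32> {
    let (left, right) = input.split_at(input.len() / 2);
    left.iter().zip(right).map(|(&l, &r)| gate(l) * r).collect()
}

fn main() {
    let x = [1.0, -2.0, 0.5, 3.0];
    println!("GLU:   {:?}", glu_variant(&x, sigmoid));
    println!("GeGLU: {:?}", glu_variant(&x, gelu));
    println!("ReGLU: {:?}", glu_variant(&x, |v| v.max(0.0)));
}
```

ReGLU's speed advantage follows directly from this shape: `max(0, x)` is a single comparison per element, while GELU needs a `tanh` and several multiplies.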
Performance Excellence
| Activation | Latency (8192 F32 elements) | Use Case |
|---|---|---|
| ReGLU | ~4.9 µs | High-speed inference |
| GLU | ~31 µs | Balanced performance |
| GeGLU | ~62 µs | Training quality |
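The table's numbers come from the crate's own benchmark suite; a rough micro-benchmark of the same shape can be sketched with `std::time::Instant` (the `reglu` helper here is a hypothetical scalar stand-in, so its timings will not match the tensor-backed figures above):

```rust
use std::time::Instant;

// Scalar stand-in for the ReGLU kernel: ReLU-gate the left half,
// multiply element-wise with the right half.
fn reglu(input: &[f32]) -> Vec<f32> {
    let (left, right) = input.split_at(input.len() / 2);
    left.iter().zip(right).map(|(&l, &r)| l.max(0.0) * r).collect()
}

fn main() {
    // 8192-element F32 input, matching the benchmark row above.
    let input: Vec<f32> = (0..8192).map(|i| (i as f32 * 0.01).sin()).collect();
    let iters = 1000u128;
    let start = Instant::now();
    let mut out = Vec::new();
    for _ in 0..iters {
        out = reglu(&input);
    }
    let ns_per_call = start.elapsed().as_nanos() / iters;
    println!("ReGLU, 8192 f32 -> {} outputs: ~{} ns/call", out.len(), ns_per_call);
}
```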
Architecture Integration
Dual API Design
```rust
// Direct tensor methods (maximum performance)
let output = input.reglu()?;

// Activation enum (configuration-driven)
let config = Config { hidden_act: Activation::GeGlu, ..Default::default() };
```
Transformer Integration
- Phi-3 native support with configurable GLU variants
- Performance-quality tradeoffs for different deployment scenarios
- Zero-config defaults (GeGLU standard, ReGLU for speed)
```rust
// Mobile/Edge: 10-20x faster inference
Config::with_activation(Activation::ReGlu)

// Research/Training: Maximum expressiveness
Config::with_activation(Activation::GeGlu)
```
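Internally, a configuration-driven feed-forward layer can dispatch on the enum at gate time. A sketch of that dispatch (the `Activation` enum mirrors the snippets above, but this `gate` helper is illustrative, not the crate's actual API):

```rust
#[derive(Clone, Copy, Debug)]
enum Activation {
    Glu,
    GeGlu,
    ReGlu,
}

fn sigmoid(x: f32) -> f32 {
    1.0 / (1.0 + (-x).exp())
}

// GELU gate (tanh approximation).
fn gelu(x: f32) -> f32 {
    0.5 * x * (1.0 + ((2.0_f32 / std::f32::consts::PI).sqrt() * (x + 0.044715 * x * x * x)).tanh())
}

// Gate one (left, right) element pair according to the configured variant.
fn gate(act: Activation, left: f32, right: f32) -> f32 {
    match act {
        Activation::Glu => sigmoid(left) * right,
        Activation::GeGlu => gelu(left) * right,
        Activation::ReGlu => left.max(0.0) * right,
    }
}

fn main() {
    for act in [Activation::Glu, Activation::GeGlu, Activation::ReGlu] {
        println!("{:?}: gate(1.0, 2.0) = {}", act, gate(act, 1.0, 2.0));
    }
}
```

Because the variant is chosen per-config rather than per-call, switching a deployment from GeGLU to ReGLU is a one-line config change with no model-code edits.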