candle
Add high-performance GLU activation variants (GLU, GeGLU, ReGLU) with comprehensive benchmarking
High-Performance Core Implementation
- GLU: classic sigmoid-gated activation, σ(x_left) ⊙ x_right
- GeGLU: GELU-gated variant (transformer standard), GELU(x_left) ⊙ x_right
- ReGLU: ReLU-gated variant, ReLU(x_left) ⊙ x_right, with a 10-20x speedup over GeGLU
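The three variants share one structure: split the input along its last dimension, gate the left half, and multiply element-wise with the right half. A minimal standalone sketch of that rule on a flat `f32` slice (the real implementation operates on candle `Tensor`s; `glu_variant` and the helpers below are illustrative, not crate API):

```rust
// Sigmoid gate used by classic GLU.
fn sigmoid(x: f32) -> f32 {
    1.0 / (1.0 + (-x).exp())
}

// GELU gate (tanh approximation) used by GeGLU.
fn gelu(x: f32) -> f32 {
    0.5 * x * (1.0 + ((2.0_f32 / std::f32::consts::PI).sqrt() * (x + 0.044715 * x * x * x)).tanh())
}

// Split the input in half: the left half goes through the gate function,
// the right half is multiplied element-wise with the gated values.
fn glu_variant(input: &[f32], gate: fn(f32) -> f32) -> Vec<f32> {
    let (left, right) = input.split_at(input.len() / 2);
    left.iter().zip(right).map(|(&l, &r)| gate(l) * r).collect()
}

fn main() {
    let x = [1.0, -2.0, 0.5, 3.0];
    println!("GLU:   {:?}", glu_variant(&x, sigmoid));
    println!("GeGLU: {:?}", glu_variant(&x, gelu));
    println!("ReGLU: {:?}", glu_variant(&x, |v| v.max(0.0)));
}
```

ReGLU's speed advantage follows directly from this shape: `max(0, x)` is a single comparison per element, while GELU needs a `tanh` and several multiplies.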
Performance Excellence
| Activation | Latency (8192 F32 elements) | Use Case |
|---|---|---|
| ReGLU | ~4.9 µs | High-speed inference |
| GLU | ~31 µs | Balanced performance |
| GeGLU | ~62 µs | Training quality |
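The table's numbers come from the crate's own benchmark suite; a rough micro-benchmark of the same shape can be sketched with `std::time::Instant` (the `reglu` helper here is a hypothetical scalar stand-in, so its timings will not match the tensor-backed figures above):

```rust
use std::time::Instant;

// Scalar stand-in for the ReGLU kernel: ReLU-gate the left half,
// multiply element-wise with the right half.
fn reglu(input: &[f32]) -> Vec<f32> {
    let (left, right) = input.split_at(input.len() / 2);
    left.iter().zip(right).map(|(&l, &r)| l.max(0.0) * r).collect()
}

fn main() {
    // 8192-element F32 input, matching the benchmark row above.
    let input: Vec<f32> = (0..8192).map(|i| (i as f32 * 0.01).sin()).collect();
    let iters = 1000u128;
    let start = Instant::now();
    let mut out = Vec::new();
    for _ in 0..iters {
        out = reglu(&input);
    }
    let ns_per_call = start.elapsed().as_nanos() / iters;
    println!("ReGLU, 8192 f32 -> {} outputs: ~{} ns/call", out.len(), ns_per_call);
}
```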
Architecture Integration
Dual API Design
```rust
// Direct tensor methods (maximum performance)
let output = input.reglu()?;

// Activation enum (configuration-driven)
let config = Config { hidden_act: Activation::GeGlu, ..Default::default() };
```
Transformer Integration
- Phi-3 native support with configurable GLU variants
- Performance-quality tradeoffs for different deployment scenarios
- Zero-config defaults (GeGLU standard, ReGLU for speed)
```rust
// Mobile/Edge: 10-20x faster inference
Config::with_activation(Activation::ReGlu)

// Research/Training: Maximum expressiveness
Config::with_activation(Activation::GeGlu)
```
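Internally, a configuration-driven feed-forward layer can dispatch on the enum at gate time. A sketch of that dispatch (the `Activation` enum mirrors the snippets above, but this `gate` helper is illustrative, not the crate's actual API):

```rust
#[derive(Clone, Copy, Debug)]
enum Activation {
    Glu,
    GeGlu,
    ReGlu,
}

fn sigmoid(x: f32) -> f32 {
    1.0 / (1.0 + (-x).exp())
}

// GELU gate (tanh approximation).
fn gelu(x: f32) -> f32 {
    0.5 * x * (1.0 + ((2.0_f32 / std::f32::consts::PI).sqrt() * (x + 0.044715 * x * x * x)).tanh())
}

// Gate one (left, right) element pair according to the configured variant.
fn gate(act: Activation, left: f32, right: f32) -> f32 {
    match act {
        Activation::Glu => sigmoid(left) * right,
        Activation::GeGlu => gelu(left) * right,
        Activation::ReGlu => left.max(0.0) * right,
    }
}

fn main() {
    for act in [Activation::Glu, Activation::GeGlu, Activation::ReGlu] {
        println!("{:?}: gate(1.0, 2.0) = {}", act, gate(act, 1.0, 2.0));
    }
}
```

Because the variant is chosen per-config rather than per-call, switching a deployment from GeGLU to ReGLU is a one-line config change with no model-code edits.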