composer
Add Low Precision LayerNorm
[WIP] This will replace Fused LayerNorm, since Fused LayerNorm's speedup comes from running in low precision. Equivalent convergence has been verified on standard NLP models (BERT, GPT).
Next commits:
- Resolve type issues
- Add test: tests/algorithms/test_low_precision_layernorm.py
- Remove FusedLayerNorm code
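
For reviewers, here is a rough sketch of the idea: run LayerNorm with inputs and parameters cast to a low precision dtype, with autocast disabled so the op is not upcast back to float32. This is an illustration only, not the code in this PR. The class name `LowPrecisionLayerNormSketch` and the `low_precision_dtype` argument are hypothetical, and the bfloat16 default is an assumption chosen so the sketch also runs on CPU.

```python
import torch
import torch.nn.functional as F


class LowPrecisionLayerNormSketch(torch.nn.LayerNorm):
    """Hypothetical illustration of a LayerNorm that runs in low precision."""

    def __init__(self, normalized_shape, eps=1e-5, elementwise_affine=True,
                 low_precision_dtype=torch.bfloat16):
        super().__init__(normalized_shape, eps=eps,
                         elementwise_affine=elementwise_affine)
        # Assumption: bfloat16 by default; float16 is the other common choice on GPU.
        self.low_precision_dtype = low_precision_dtype

    def forward(self, x):
        dtype = self.low_precision_dtype
        # Cast parameters (if present) to match the low precision input.
        weight = self.weight.to(dtype) if self.weight is not None else None
        bias = self.bias.to(dtype) if self.bias is not None else None
        # Disable autocast locally so layer_norm stays in the low precision dtype.
        with torch.autocast(device_type=x.device.type, enabled=False):
            return F.layer_norm(x.to(dtype), self.normalized_shape,
                                weight, bias, self.eps)


if __name__ == "__main__":
    layer = LowPrecisionLayerNormSketch(8)
    out = layer(torch.randn(4, 8))
    print(out.dtype, out.shape)
```

The output stays in the low precision dtype, so callers that need float32 downstream would have to cast back explicitly.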