Fix LayerNorm gradient flow issue
- Fix LayerNorm.forward() to use tensor operations instead of scalar operations
- Replace sum_keepdim()/size with mean_keepdim() to preserve gradients
- Use broadcast_add() with epsilon tensor instead of scalar addition
- Fix ops::layer_norm_slow() with same gradient-preserving changes
- Update ops::layer_norm() to use slow implementation for proper gradients
- Add comprehensive gradient flow test (now passes with 100% gradient flow)
- Add numerical equivalence test to ensure accuracy is maintained
- Fix training issues where LayerNorm parameters weren't being updated
Resolves a gradient-propagation bug in which only 33% of parameters received gradients during backpropagation, preventing proper model training. #3011
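The numerical-equivalence claim above can be illustrated with a minimal std-only sketch (hypothetical helper names, not candle's API): computing the statistics as `sum / size` versus accumulating a mean directly yields the same normalized output, so switching to `mean_keepdim()` changes only how gradients are tracked, not the numerics.

```rust
// Hypothetical std-only helpers illustrating that both formulations
// of LayerNorm statistics produce the same normalized values.

// Statistics via explicit sum divided by element count (old style).
fn layer_norm_sum(x: &[f32], eps: f32) -> Vec<f32> {
    let n = x.len() as f32;
    let mean = x.iter().sum::<f32>() / n;
    let var = x.iter().map(|v| (v - mean).powi(2)).sum::<f32>() / n;
    x.iter().map(|v| (v - mean) / (var + eps).sqrt()).collect()
}

// Statistics via mean-style accumulation (new style).
fn layer_norm_mean(x: &[f32], eps: f32) -> Vec<f32> {
    let n = x.len() as f32;
    let mean = x.iter().fold(0.0f32, |acc, v| acc + v / n);
    let var = x.iter().fold(0.0f32, |acc, v| acc + (v - mean).powi(2) / n);
    x.iter().map(|v| (v - mean) / (var + eps).sqrt()).collect()
}

fn main() {
    let x = [1.0f32, 2.0, 3.0, 4.0];
    let a = layer_norm_sum(&x, 1e-5);
    let b = layer_norm_mean(&x, 1e-5);
    // Both paths agree to within floating-point tolerance.
    for (u, v) in a.iter().zip(b.iter()) {
        assert!((u - v).abs() < 1e-5);
    }
}
```

The gradient fix itself is not visible here, since it depends on candle's autograd graph: scalar-style operations in the old forward pass dropped the mean/variance nodes from the graph, while the tensor-op formulation keeps them connected so gradients reach the affine parameters.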