PaddleNLP
add fast_rmsnorm
PR types
Performance optimization
PR changes
Others
Description
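Adds fast_rms_norm, built on top of fast_ln. This doubles the speed of the rms_norm operator. Model throughput is as follows:
| Model | Parallel strategy | Throughput before PR | Throughput after PR |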
|---|---|---|---|
| Llama-2 7B | gbs8, sharding8-mbs1-acc1 | 4454.693 | 4490.384 |
| Llama-2 13B | gbs8, pp4sharding2-vpp5-mbs1-acc4 | 2229.921 | 2252.541 |
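To put the table in perspective, the relative end-to-end gain can be computed directly from the numbers above. This is only a small arithmetic sketch: the figures are copied from the table, the throughput unit is not stated in the PR, and the helper name `relative_gain` is made up for illustration.

```python
# Throughput figures copied from the table above (unit not stated in the PR).
results = {
    "Llama-2 7B": (4454.693, 4490.384),
    "Llama-2 13B": (2229.921, 2252.541),
}


def relative_gain(before: float, after: float) -> float:
    """Return the relative throughput change as a percentage."""
    return (after - before) / before * 100.0


gains = {name: relative_gain(before, after) for name, (before, after) in results.items()}
for name, gain in gains.items():
    print(f"{name}: {gain:+.2f}%")
```

The end-to-end gain comes out at roughly 0.8% for the 7B run and 1.0% for the 13B run. The end-to-end change is much smaller than the operator-level speedup, which is what one would expect if rms_norm accounts for only a small share of the total step time.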
With the use_fast_layer_norm switch, results align bit for bit.
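For readers who want to see what such a check might look like, here is a minimal NumPy sketch. It assumes the claim refers to bit-for-bit equality of outputs between two code paths. The PR does not show its kernel or its test, so `rms_norm_reference` is just the standard RMSNorm definition, `eps=1e-6` is an assumed default, and `bitwise_equal` is a hypothetical helper, not part of PaddleNLP.

```python
import numpy as np


def rms_norm_reference(x, weight, eps=1e-6):
    """Standard RMSNorm: divide by the root-mean-square over the last axis, then scale."""
    x32 = x.astype(np.float32)  # accumulate in float32
    variance = np.mean(x32 * x32, axis=-1, keepdims=True)
    out = x32 / np.sqrt(variance + eps) * weight.astype(np.float32)
    return out.astype(x.dtype)


def bitwise_equal(a, b):
    """True only if shape, dtype, and raw bytes all match (stricter than allclose)."""
    return a.shape == b.shape and a.dtype == b.dtype and a.tobytes() == b.tobytes()


rng = np.random.default_rng(0)
x = rng.standard_normal((2, 4, 8)).astype(np.float32)
w = np.ones(8, dtype=np.float32)

out_a = rms_norm_reference(x, w)
out_b = rms_norm_reference(x, w)  # stand-in for the output of the second code path
print(bitwise_equal(out_a, out_b))
```

In a real check, `out_b` would be replaced by the fused operator's output converted to a NumPy array. Comparing raw bytes rather than using a tolerance is what distinguishes bitwise alignment from approximate agreement.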
Thanks for your contribution!
Codecov Report
Attention: Patch coverage is 22.22222% with 7 lines in your changes missing coverage. Please review.
Project coverage is 55.74%. Comparing base (c574d6d) to head (0a7af50). Report is 222 commits behind head on develop.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| paddlenlp/transformers/llama/fusion_ops.py | 25.00% | 6 Missing :warning: |
| paddlenlp/transformers/llama/modeling.py | 0.00% | 1 Missing :warning: |
Additional details and impacted files
| Coverage Diff | develop | #8680 | +/- |
|---|---|---|---|
| Coverage | 55.74% | 55.74% | |
| Files | 623 | 623 | |
| Lines | 97454 | 97457 | +3 |
| Hits | 54323 | 54331 | +8 |
| Misses | 43131 | 43126 | -5 |
Please show the accuracy test results in the PR.