PaddleNLP: add fast_rmsnorm

PR types

Performance optimization

PR changes

Others

Description

Adds fast_rms_norm, built on top of fast_ln. This roughly doubles the speed of the rms_norm operator. Model throughput is as follows:

Model	Parallel strategy	Throughput before PR	Throughput after PR
Llama-2 7B	gbs8, sharding8-mbs1-acc1	4454.693	4490.384
Llama-2 13B	gbs8, pp4sharding2-vpp5-mbs1-acc4	2229.921	2252.541
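
To put the table in perspective, the relative end-to-end gain can be worked out from the four throughput numbers. The snippet below is only a small arithmetic sketch: the figures are copied from the table, the throughput unit is not stated in the PR, and `relative_gain_percent` is a hypothetical helper name.

```python
# Throughput figures copied from the table above (unit not stated in the PR).
results = {
    "Llama-2 7B": (4454.693, 4490.384),
    "Llama-2 13B": (2229.921, 2252.541),
}


def relative_gain_percent(before: float, after: float) -> float:
    """Return the relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


gains = {name: relative_gain_percent(b, a) for name, (b, a) in results.items()}
for name, gain in gains.items():
    print(f"{name}: {gain:+.2f}%")
```

This works out to roughly +0.8% for Llama-2 7B and +1.0% for Llama-2 13B, which suggests that rms_norm accounts for only a small share of total step time even though the operator itself is about twice as fast.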

With the use_fast_layer_norm switch, results align bitwise.
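
For readers unfamiliar with the operator, here is a minimal plain-Python sketch of the standard RMSNorm formula, together with one way to check two outputs for bitwise equality. It is not the fused kernel from this PR and not PaddleNLP's API: the function names `rms_norm` and `bitwise_equal`, the `eps` default, and the use of float64 (training typically runs in lower precision) are all assumptions made for illustration.

```python
import math
import struct


def rms_norm(x, weight, eps=1e-6):
    """Reference RMSNorm over one vector: y_i = w_i * x_i / sqrt(mean(x^2) + eps)."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [w * v * inv_rms for v, w in zip(x, weight)]


def bitwise_equal(a, b):
    """True only if both sequences hold identical float64 bit patterns."""
    return len(a) == len(b) and all(
        struct.pack("<d", p) == struct.pack("<d", q) for p, q in zip(a, b)
    )


x = [3.0, 4.0]
w = [1.0, 1.0]
y = rms_norm(x, w, eps=0.0)

# With unit weights and eps=0, the output has a root mean square of 1.
print(math.sqrt(sum(v * v for v in y) / len(y)))

# Two runs of the same deterministic code path match bit for bit.
print(bitwise_equal(y, rms_norm(x, w, eps=0.0)))
```

A bitwise check of this kind is stricter than a tolerance-based comparison: it fails on any rounding difference, which is why it is a useful bar when swapping one kernel for another.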

Jun 28 '24 02:06 deepllz

Thanks for your contribution!

Jun 28 '24 02:06 paddle-bot[bot]

Codecov Report

Attention: Patch coverage is 22.22222% with 7 lines in your changes missing coverage. Please review.

Project coverage is 55.74%. Comparing base (c574d6d) to head (0a7af50). Report is 222 commits behind head on develop.

Files with missing lines	Patch %	Lines
paddlenlp/transformers/llama/fusion_ops.py	25.00%	6 Missing :warning:
paddlenlp/transformers/llama/modeling.py	0.00%	1 Missing :warning:

Additional details and impacted files

@@           Coverage Diff            @@
##           develop    #8680   +/-   ##
========================================
  Coverage    55.74%   55.74%           
========================================
  Files          623      623           
  Lines        97454    97457    +3     
========================================
+ Hits         54323    54331    +8     
+ Misses       43131    43126    -5

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

Jun 28 '24 03:06 codecov[bot]

Please show the accuracy test results in the PR.

Jul 01 '24 03:07 ZHUI
