
[Auto Parallel] fix bugs for split_batches_for_accumulation && fix bugs for enable_delay_scale_loss

Open zhangyuqin1998 opened this issue 1 year ago • 2 comments

PR types

Bug fixes

PR changes

Others

Description

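A. Fix the case where, under dynamic-graph auto parallel, split_batches_for_accumulation does not align with dynamic-graph manual (hand-written) parallel. See the attached image (dae478d91bc7c7a10aa3bcc927793128).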

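B. Fix the incorrect enable_delay_scale_loss logic under dynamic-graph auto parallel. Auto parallel implements enable_delay_scale_loss by default, and the expected behavior is:

  1. Compute the loss for each micro batch
  2. Run backpropagation
  3. Sum the losses within the mini batch
  4. Scale the loss by dividing it by the number of accumulation steps (acc)

However, the current behavior of dynamic-graph auto parallel is:

  1. Compute the loss for each micro batch
  2. Scale that loss by dividing it by acc
  3. Run backpropagation
  4. Sum the losses within the mini batch

In addition, because the enable_delay_scale_loss logic is incomplete, this PR adds a switch for it in auto parallel.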
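To make the two orderings in item B concrete, here is a minimal plain-Python sketch. It is not PaddleNLP trainer code: the toy model (loss = 0.5 * (w * x - y)^2 with a hand-written gradient) and the names `loss_and_grad`, `delayed_scale`, and `eager_scale` are all hypothetical and exist only for illustration.

```python
# Toy model: loss_i = 0.5 * (w * x_i - y_i) ** 2, so dloss_i/dw = (w * x_i - y_i) * x_i.
# Every name below is hypothetical; none of this is PaddleNLP API.

def loss_and_grad(w, x, y):
    """Return the unscaled loss and its gradient w.r.t. w for one micro batch."""
    err = w * x - y
    return 0.5 * err * err, err * x


def delayed_scale(w, micro_batches):
    """Expected order: loss -> backward -> sum -> divide by acc once."""
    acc = len(micro_batches)
    loss_sum, grad_sum = 0.0, 0.0
    for x, y in micro_batches:
        loss, grad = loss_and_grad(w, x, y)  # steps 1-2: unscaled loss, backward
        loss_sum += loss                     # step 3: sum within the mini batch
        grad_sum += grad
    return loss_sum / acc, grad_sum / acc    # step 4: scale once at the end


def eager_scale(w, micro_batches):
    """Order described as the current behavior: loss -> divide by acc -> backward -> sum."""
    acc = len(micro_batches)
    loss_sum, grad_sum = 0.0, 0.0
    for x, y in micro_batches:
        loss, grad = loss_and_grad(w, x, y)
        loss_sum += loss / acc  # each micro batch loss is scaled before summing
        grad_sum += grad / acc  # backward through a scaled loss gives a scaled gradient
    return loss_sum, grad_sum


micro_batches = [(1.0, 2.0), (2.0, 1.0), (3.0, 5.0), (4.0, 3.0)]
delayed = delayed_scale(0.5, micro_batches)
eager = eager_scale(0.5, micro_batches)
print(delayed, eager)
```

In exact arithmetic both orderings produce the same averaged loss and gradient, because the scaling is linear. What differs is where the division happens: with delayed scaling each backward pass sees the unscaled loss, and rounding can differ between the two orders in low-precision formats.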

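C. Fix the loss display (logging) issue under static-graph auto parallel. The loss is expected to be displayed as follows:

  1. Sum the losses of all micro batches
  2. Scale the sum by dividing it by acc

However, static-graph auto parallel currently displays the loss as follows:

  1. Scale each micro batch's loss by dividing it by acc
  2. Sum the scaled losses

D. Fix a misspelled function name, correcting traning to training.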

zhangyuqin1998 avatar Sep 29 '24 04:09 zhangyuqin1998

Thanks for your contribution!

paddle-bot[bot] avatar Sep 29 '24 04:09 paddle-bot[bot]

Codecov Report

Attention: Patch coverage is 2.77778% with 35 lines in your changes missing coverage. Please review.

Project coverage is 52.74%. Comparing base (76a118b) to head (ad0488c). Report is 264 commits behind head on develop.

Files with missing lines             Patch %   Lines
paddlenlp/trainer/auto_trainer.py    0.00%     34 Missing :warning:
paddlenlp/trainer/trainer.py         50.00%    1 Missing :warning:
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #9217      +/-   ##
===========================================
- Coverage    53.11%   52.74%   -0.38%     
===========================================
  Files          665      661       -4     
  Lines       109041   107375    -1666     
===========================================
- Hits         57918    56634    -1284     
+ Misses       51123    50741     -382     


codecov[bot] avatar Sep 29 '24 04:09 codecov[bot]