PaddleFL
Bump paddlepaddle from 1.8.5 to 2.5.0
Bumps paddlepaddle from 1.8.5 to 2.5.0.
Release notes
Sourced from paddlepaddle's releases.
PaddlePaddle 2.5.0 Release Note
1. Important Updates
- New unified dynamic/static architecture: implemented a new execution mode combining dynamic-to-static conversion, primitive-operator composition, and compiler execution, completing the full dynamic-to-static + composite-operator + neural-network-compiler acceleration pipeline on the ResNet50 and Bert models. Dynamic-to-static conversion gained a whole-graph fallback, falling back to dynamic-graph training when conversion fails. The composite-operator work designed a primitive operator set of more than 150 basic operators, implemented a Python-level forward-operator decomposition mechanism and a backward-operator decomposition mechanism supporting both dynamic and static graphs, and decomposed more than 70 common forward and backward operators. The CINN compiler fixed correctness issues, developed key passes, added hand-written schedule rules, and implemented automatic kernel code generation, improving ResNet50 performance by 12% and Bert performance by 10%.
- Unified operator architecture in the PHI operator library: migrated the remaining 350+ operator kernels from the legacy operator system into the PHI operator library, and unified the legacy operator definitions into the PHI style (operators defined via YAML configuration), improving architectural consistency and lowering the cost of understanding the framework. Fully decoupled the Fluid headers that the PHI operator library depended on and compiled PHI as a standalone dynamic library, providing a lighter-weight way to reuse the operator library for secondary development. Continued normalizing irregular operators and operator kernels in the framework, making them easier for developers to understand and reducing the cost of hardware integration.
- Full rollout of the new static-graph executor: the new executor gained multiple features and performance optimizations, unifying and replacing the several legacy executors. It is now the Python entry point for single-card and distributed static-graph training and the default execution engine for backends such as dynamic-to-static, control flow, and CINN, substantially improving framework scheduling performance, with a clearer functional architecture and much stronger extensibility.
- Python API support for 0-dim tensors: defined clear semantics for tensors of shape [1,] and tensors of shape [].
- New environment support: adapted to CUDA 12 and added support for compiling with GCC 12.
2. Incompatible Upgrades
- Paddle APIs support 0-dim tensors. Paddle previously substituted a 1-dim tensor of shape [1] for a 0-dim tensor. That substitution diverges from current mainstream conventions, raises the cost of developing and debugging models, and sometimes causes unexpected errors. This release fixes the 376 APIs that need to support 0-dim tensors, in line with widely used community tools such as EinOps. For example, model training previously produced a loss that was a 1-dim tensor, so extracting or printing it required code like loss.numpy()[0]. After this change the loss is a 0-dim tensor, and loss.numpy() alone extracts or prints it: the code is shorter, easier to read, and matches industry convention.
- Full retirement of the paddle.fluid API. Following the plan announced in the previous release, this release retires 1116 paddle.fluid APIs and related internal interfaces; the few remaining internal interfaces will all be removed in the next release. The fluid APIs are legacy APIs that Paddle 2.0 planned to remove but deferred for compatibility reasons, so this cleanup does not affect programs developed against Paddle 2.0, and the Paddle API surface becomes simpler and easier to understand.
- Completed cleanup of the legacy dynamic-graph Python code. The Python side now uses only the new dynamic graph to invoke the C++ core logic.
- To unify the data-parallel training approach for static-graph models, the old single-process multi-card training mode is deprecated, including the paddle.static.ParallelExecutor and paddle.static.CompiledProgram().with_data_parallel() interfaces, because they support only single-machine multi-card (not multi-machine multi-card) training and their underlying execution performance is poor. The recommended replacement is the multi-process multi-card approach: use the paddle.distributed.launch interface for data-parallel distributed training. This upgrade affects only static graphs; dynamic-graph and dynamic-to-static training are unaffected. If you use the deprecated interfaces, modify your model code as described in the data-parallel documentation. #50351, #50501, #51240, #51701, #51616, #51369, #52671
- Removed the built-in adaptation code for Ascend NPU and Cambricon MLU, upgrading both to the CustomDevice plug-in adaptation mechanism and migrating their adaptation code to the PaddleCustomDevice repository.
3. Training Framework (including Distributed)
Python API
APIs supporting 0-dim tensors
- 286 APIs accept 0-dim tensors as input, including paddle.reshape, paddle.trace, and paddle.linalg.norm. #53208, #53592, #47074, #53186, #47677, #49357, #50237, #46555, #47219, #47501, #47858, #47961, #48058, #48007, #49755, #51024, #51566, #51899, #49813, #47812, #47849, #47251, #53125, #53828, #51265, #47689, #48452, #49072, #48638, #49175, #49279, #50857, #49805, #47734, #45992, #49616, #49959, #50536, #49544, #49842, #46909, #49361, #50169, #48314, #48735, #49122, #49122, #49177, #49501, #49562, #49340, #49550, #49596, #49730, #49667, #49692, #49854, #49845, #49803, #49889, #49904, #49518, #49884, #49880, #49862, #49921, #49260, #49929, #49570, #49882, #50213, #49780, #50271, #50289, #50293, #49735, #50433, #49847, #50635, #50950, #50947, #49460, #53087, #51687, #52185, #54649
- 90 APIs produce 0-dim tensors as output, including paddle.sum, paddle.min/max, and paddle.any/all. #52891, #52861, #52775, #52850, #52843, #52857, #51721, #53051, #53192, #52739, #52741, #53175, #51889, #53199, #53242, #53421
- With 0-dim tensor support in place, corrected previously non-conforming code, and added warnings and compatibility handling for non-conforming usage in model code. #51562, #51586, #51757, #52197, #54117
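The shape change above is easiest to see with reductions. Paddle is not assumed here; NumPy's 0-dim arrays follow the same shape-[] semantics these notes describe, so a minimal NumPy sketch:

```python
import numpy as np

# Reducing over all elements yields a 0-dim scalar (shape ()),
# not a 1-element vector of shape (1,).
loss = np.sum(np.array([0.25, 0.75]))
assert loss.shape == ()        # 0-dim: shape is the empty tuple
assert loss.item() == 1.0      # the scalar is extracted directly

# The old shape-[1] behavior forced an extra index to get the value out:
old_style = np.array([1.0])
assert old_style.shape == (1,)
assert old_style[0] == 1.0     # the trailing [0] the release notes mention
```

This is the same distinction as loss.numpy() versus loss.numpy()[0] in the loss example earlier in these notes.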
New APIs
- Added jacobian and hessian APIs for scientific computing. #53331
- Added sparse-computation APIs, e.g. paddle.sparse.reshape, paddle.sparse.sum, and paddle.sparse.slice. #46694, #51513, #53794, #51406
- Added other APIs, e.g. paddle.optimizer.LBFGS, paddle.index_put, and paddle.logaddexp. #53314, #51912, #52886, #50843, #47282, #52284

Dynamic Graph
New features
- Added paddle.nn.utils.clip_grad_norm_ for gradient clipping and paddle.Tensor.data_ptr for obtaining the memory (or GPU memory) address of a Tensor's data. PR49935, PR48235, PR49173
- Added the saved_tensors_hooks mechanism for temporarily storing and retrieving forward Tensors used in backward computation. PR45763, PR46215, PR48124
- Tensors now support pickling, enabling Tensor serialization. PR47025, PR48179
- Added debug logging: when nan/inf appears in the backward pass, the forward Python stack is printed. PR53217, PR52639, PR52729
- Added support for higher-order differentiation of expand_v2, tile, concat, assign, and slice. PR45941, PR45942, PR45940, PR45879, PR45960
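For context on the first item: clip_grad_norm_ applies the standard clip-by-global-norm rule. The function below is an illustrative NumPy reimplementation of that rule, not Paddle's code:

```python
import numpy as np

def clip_grad_norm_(grads, max_norm):
    """Scale a list of gradient arrays in place so their global L2 norm
    does not exceed max_norm; returns the pre-clip global norm.
    (Illustrative sketch of the clip-by-global-norm rule.)"""
    total_norm = np.sqrt(sum(float(np.sum(g ** 2)) for g in grads))
    clip_coef = max_norm / (total_norm + 1e-6)  # epsilon avoids divide-by-zero
    if clip_coef < 1.0:                         # only shrink, never grow
        for g in grads:
            g *= clip_coef
    return total_norm

grads = [np.array([3.0, 4.0])]                  # global norm is 5.0
pre_clip = clip_grad_norm_(grads, max_norm=1.0)
assert np.isclose(pre_clip, 5.0)
assert np.isclose(np.linalg.norm(grads[0]), 1.0, atol=1e-4)
```

The key property is that all gradients are scaled by one shared coefficient, so their relative directions are preserved; only the overall step size shrinks.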
Improvements
- Improved dynamic-graph log output, including log content, VLog levels, and error messages. PR45783, PR46349, PR46934, PR47724
- Added FLAGS_auto_growth_chunk_size_in_mb for setting the minimum chunk size of auto_growth_allocator. PR52204
Bug fixes
- Fixed bugs in several operators, including batch_norm, slice, set_value, scale, multinomial, adam, conv, transpose2_grad, and conv2d_transpose_double_grad. PR47802, PR47634, PR47349, PR46124, PR46147, PR50388, PR48626, PR48519, PR50386, PR48432, PR51851
- Fixed several PyLayer issues. PR51740, PR47154, PR47323, PR54041, PR48533
- Ensured that sync_batch_norm executes in order in the backward pass, preventing hangs or precision errors caused by out-of-order execution. PR52268, PR52860, PR52779
- Fixed a linspace bug under AMP. PR46088
- Fixed a Windows crash caused by incorrect Python C API calls. PR46833
- Fixed an issue where DataLoader could fail to delete its /dev/shm entries. PR48511
- Fixed several issues with paddle.grad. PR47151
- Added error messages for operators that do not support higher-order differentiation. PR47231
- Added numpy.ndarray support for Python operators. PR48229
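The last item refers to accepting numpy.ndarray operands in Python operator expressions. The mechanism behind that kind of support is Python's operator protocol (__add__ and friends), sketched here with a toy class; MyTensor is illustrative and not Paddle code:

```python
import numpy as np

class MyTensor:
    """Toy stand-in for a framework tensor (illustrative only)."""
    def __init__(self, data):
        self.data = np.asarray(data, dtype=float)

    def __add__(self, other):
        # Accept another MyTensor, a plain number, or a numpy.ndarray
        # on the right-hand side of `+`.
        if isinstance(other, MyTensor):
            other = other.data
        return MyTensor(self.data + np.asarray(other))

t = MyTensor([1.0, 2.0])
out = t + np.array([10.0, 20.0])   # ndarray handled by the __add__ protocol
assert np.allclose(out.data, [11.0, 22.0])
```

A real framework also implements the reflected forms (__radd__, etc.) so the tensor can appear on the right-hand side, but the dispatch idea is the same.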
... (truncated)
Commits
- feff99f update flash attn select (#54630) (#54716)
- 570daa1 [cherrypick][inference]layer norm fix and ci fix (#54680)
- 76067a3 fix compile (#54568) (#54677)
- e4401c4 cherry-pick #54567 (#54694)
- 8077d79 [Cherry-Pick] Modify the bf16 accuracy checking framework in OpTest (#54658)
- 0abd9ff fix mea get pad no default return bug (#54647)
- e93e48e [AMP] fix bf16 amp training error (#54571) (#54643)
- 6b778b9 fix pp release_output (#54672)
- 8b818d0 [Cherry-Pick] fix sync batch norm op under cuda12 (#54641)
- 57d9b80 [Cherry-Pick]Add PHI option in cmake (#54462) (#54576)
- Additional commits viewable in compare view
Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.
Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR:
- @dependabot rebase will rebase this PR
- @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
- @dependabot merge will merge this PR after your CI passes on it
- @dependabot squash and merge will squash and merge this PR after your CI passes on it
- @dependabot cancel merge will cancel a previously requested merge and block automerging
- @dependabot reopen will reopen this PR if it is closed
- @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

You can disable automated security fix PRs for this repo from the Security Alerts page.