Paddle
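PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (PaddlePaddle (飞桨) core framework: high-performance single-machine and distributed training, and cross-platform deployment, for deep learning and machine learning)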
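### PR Category Execute Infrastructure ### PR Types Improvements ### Description This continues the paddle.inner changes from https://github.com/PaddlePaddle/Paddle/pull/73094. PaddleAPITest tests pass on GPU and CPU. The code adds only a single reshape and nothing else. The performance test uses shapes that are all 1, which is uncommon in practice, and few executions take this branch. The PaddleTest test cases were also modified; please check whether that change is acceptable: https://github.com/PaddlePaddle/PaddleTest/pull/3096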
### PR Category Execute Infrastructure ### PR Types Improvements ### Description 17 paddle.geometric.segment_max 18 paddle.geometric.segment_mean 19 paddle.geometric.segment_min 20 paddle.geometric.segment_sum 31 paddle.incubate.segment_max 32 paddle.incubate.segment_mean 33 paddle.incubate.segment_min 34 paddle.incubate.segment_sum The paddle.incubate.* operators are the same as the paddle.geometric.* ones. Modified the forward and backward passes, CPU/GPU...
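For readers unfamiliar with the segment operators listed above, the following is a minimal pure-Python sketch of what a segment reduction computes: rows of `data` that share the same entry in a sorted `segment_ids` list are reduced together. It is not Paddle's kernel. The helper name `segment_reduce` is hypothetical, and both the zero fill for segments with no rows and the empty result for empty input are assumptions made for illustration.

```python
from typing import Callable, List, Sequence


def segment_reduce(
    data: Sequence[Sequence[float]],
    segment_ids: Sequence[int],
    reduce_fn: Callable[[Sequence[float]], float],
) -> List[List[float]]:
    """Reduce rows of `data` column-wise within each segment.

    `segment_ids` is assumed to be sorted, non-negative, and the same
    length as `data`. Hypothetical helper, for illustration only.
    """
    if len(data) != len(segment_ids):
        raise ValueError("data and segment_ids must have the same length")
    if not data:
        return []  # assumption: empty input gives an empty result

    num_segments = segment_ids[-1] + 1  # ids are sorted, so the last is the max
    width = len(data[0])

    # Collect the rows that belong to each segment.
    groups: List[List[Sequence[float]]] = [[] for _ in range(num_segments)]
    for row, sid in zip(data, segment_ids):
        groups[sid].append(row)

    out: List[List[float]] = []
    for rows in groups:
        if rows:
            # zip(*rows) iterates over columns of the grouped rows.
            out.append([reduce_fn(col) for col in zip(*rows)])
        else:
            out.append([0.0] * width)  # assumption: empty segments are zero
    return out


data = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0], [4.0, 5.0, 6.0]]
ids = [0, 0, 1]
seg_sum = segment_reduce(data, ids, sum)
seg_max = segment_reduce(data, ids, max)
seg_min = segment_reduce(data, ids, min)
seg_mean = segment_reduce(data, ids, lambda col: sum(col) / len(col))
```

Passing the reduction as a function keeps the four variants (sum, mean, min, max) in one code path, which mirrors the statement that the two operator families behave the same.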
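### PR Category Execute Infrastructure ### PR Types Improvements ### Description The original problem was that the gradient of gammaln directly returned 0 whenever the input was less than or equal to 0. The gammaln_grad kernel was modified following torch's implementation. Re-test results: (not preserved here)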
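To illustrate why returning 0 for every non-positive input is wrong: the derivative of log|Gamma(x)| is the digamma function, which is generally nonzero at negative non-integer inputs. The sketch below uses only the Python standard library and is not the Paddle or torch kernel. The helper names `digamma` and `numeric_grad` are hypothetical, and the recurrence plus asymptotic series is one common way to evaluate digamma, chosen here for brevity.

```python
import math


def digamma(x: float) -> float:
    """Digamma psi(x) = d/dx log|Gamma(x)|, for x not in {0, -1, -2, ...}."""
    if x <= 0 and x == math.floor(x):
        raise ValueError("digamma has poles at non-positive integers")
    result = 0.0
    # Recurrence psi(x) = psi(x + 1) - 1/x shifts x upward until the
    # asymptotic series is accurate; it also holds for negative x.
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    # Asymptotic series: ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    result += math.log(x) - 0.5 * inv
    result -= inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result


def numeric_grad(x: float, h: float = 1e-5) -> float:
    """Central finite difference of math.lgamma, used as a cross-check."""
    return (math.lgamma(x + h) - math.lgamma(x - h)) / (2.0 * h)


analytic = digamma(-0.5)
numeric = numeric_grad(-0.5)
print(f"digamma(-0.5) = {analytic:.6f}, finite difference = {numeric:.6f}")
```

The two values agree closely and are not zero, so a gradient kernel that short-circuits to 0 for all inputs at or below 0 would disagree with the mathematical derivative at points such as x = -0.5.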
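### PR Category Execute Infrastructure ### PR Types Improvements ### Description Modified infermeta to skip the 0-size check and updated the symbolic shape inference accordingly. Modified the forward and backward passes for CPU/GPU/XPU; the backward pass fills the gradient with 0. torch has no corresponding interface, so silu is used instead. All PaddleAPITest tests pass on GPU and CPU.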
### PR Category User Experience ### PR Types Others ### Description The profiler no longer uses fluid header files
### PR Category Environment Adaptation ### PR Types Others ### Description Add Auto-Parallel pcard-67164
### PR Category Auto Parallel ### PR Types New features ### Description card-73263 Schedules, a basic pipeline-parallel component for dynamic-graph semi-automatic parallelism in auto parallel
### PR Category Auto Parallel ### PR Types New features ### Description Add cp and sep strategies. cp: ring attention. sep: segment parallel strategy, similar to DeepSpeed Ulysses. Pcard-91295
### PR Category Operator Mechanism ### PR Types Bug fixes ### Description Pcard-85711
Pr 72833