Paddle issues

[XPU] bump XHPC to 20250614

1

### PR Category Custom Device ### PR Types Not User Facing ### Description [XPU] bump XHPC to 20250614

dynamicheart

XPU

[0-size Tensor Job2 No.38] Add 0-size Tensor support for index_sample

5

### PR Category Execute Infrastructure ### PR Types Improvements ### Description Modifies the forward and backward passes of index_sample in the cpu/gpu/xpu kernels and removes the 0-size check from infermeta. There is no corresponding code in symbolic shape: https://github.com/PaddlePaddle/Paddle/blob/d637d5b53a57dc440cb9ec3ec1f5e2f23b5bc7d8/paddle/fluid/pir/dialect/operator/interface/infer_symbolic_shape/binary_infer_sym.cc#L1141 The PaddleAPITest run passes; the errors reported on CPU/GPU are all numpy errors. ![image](https://github.com/user-attachments/assets/a5cdd9e2-7ad3-4066-b415-692771ad5ad5)
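To make the 0-size case concrete, here is a minimal pure-Python sketch of the gather semantics that index_sample documents (out[i][j] = x[i][index[i][j]]), written over nested lists. It is only a reference model under that assumption, not the Paddle kernel touched by this PR; the function name `index_sample_reference` is hypothetical.

```python
def index_sample_reference(x, index):
    """Reference model of index_sample on nested lists.

    x:     list of N rows, each with M values
    index: list of N rows, each with K column indices into the matching row of x
    Returns a list of N rows with K values: out[i][j] = x[i][index[i][j]].
    """
    if len(x) != len(index):
        raise ValueError("x and index must have the same number of rows")
    # Empty rows (K == 0) and an empty batch (N == 0) fall out naturally:
    # the comprehensions simply produce empty lists.
    return [[row[j] for j in idx_row] for row, idx_row in zip(x, index)]


x = [[1, 2, 3], [4, 5, 6]]
print(index_sample_reference(x, [[0, 2], [1, 1]]))  # regular case
print(index_sample_reference(x, [[], []]))          # 0-size index: shape [2, 0]
print(index_sample_reference([], []))               # 0-size batch: shape [0, K]
```

In this model a 0-size input needs no special branch, which is consistent with the description's point that the 0-size check could be removed from infermeta.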

co63oc

contributor

HappyOpenSource Pro

Fix the bug in the atleast functions where a list of tensors as input does not produce a list of tensors as output

6

### PR Category User Experience ### PR Types Bug fixes ### Description Now, when the input is a list of tensors, the output is a list of tensors.

Qin-sx

contributor

[Open-source task] Full rollout of fixes for Paddle CPU/GPU kernel precision issues

21

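## 1. Background Paddle is currently carrying out a systematic review of boundary correctness across all of its APIs. We developed [PaddleAPITest](https://github.com/PFCCLab/PaddleAPITest) to test APIs that have correctness problems. By running the same API in Torch and comparing precision, we found that some APIs show a precision diff against their Torch counterparts. After initial verification of a small number of APIs, we confirmed that some Paddle APIs do have correctness problems (along the way we also found a few correctness problems in Torch APIs, such as torch.tril and torch.triu). We are now making these problematic Paddle APIs public and inviting the community to help fix them. By taking part, you will learn the design of the Paddle operator library framework, gain a detailed understanding of the implementation style of Paddle CPU and GPU kernels, and build experience in debugging operator precision issues. ## 2. Task description ### 2.1 Task overview and assignment For the Paddle APIs that [PaddleAPITest](https://github.com/PFCCLab/PaddleAPITest) found to have a precision diff against torch, find the cause and fix it. The APIs with a precision diff and the task assignments are as follows: > [!IMPORTANT] > Difficulty of each task: 0.15×🌟 > For a walkthrough of the tasks, see the recording: https://meeting.tencent.com/crm/l59EWmRZc4 (00:52:00~00:59:30) | No. | API | Kernel category...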

lshpku

Pr pipeline stage

4

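### PR Category Auto Parallel ### PR Types Others ### Description Optimizes some code in the PipelineStage framework and adds unit tests for it. The tests compare the training loss against naive pipeline parallelism and against single-card training to verify the feasibility and accuracy of the PipelineStage framework. v-schedules needs more development work, so only simple tests of its related functions are included here.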

zty-king

contributor

Optimize the backend switching logic in scaled_dot_product_attention

4

### PR Category User Experience ### PR Types Improvements ### Description Flash attention should be calling Paddle's fork of the [flash attention](https://github.com/PaddlePaddle/flash-attention) library. The interface called by scaled_dot_product_attention should be ```cpp bool flash_attn_fwd(const void * const q, // batch_size x seqlen_q x num_heads x...

Qin-sx

contributor

[PIR save/load]Add DataTypeAttribute version-compat patch

### PR Category Execute Infrastructure ### PR Types Improvements ### Description pcard-67164 Add DataTypeAttribute version-compat patch. When a patch modifies or adds a specific DataType value for DataTypeAttribute, the value must be identified by the string field from DataTypeToString.

changeyoung98

[0-size Tensor Job2 No.78, 98] Add 0-size Tensor support for sum

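### PR Category Execute Infrastructure ### PR Types Improvements ### Description sum had already been modified for 0-size input. The remaining problem is that when the input type is int32 or bool, the result type must be converted to int64, but the dtype parameter in FullKernel is not actually used, so the type has to be specified explicitly. The cpu/xpu/gpu kernels are modified; onednn does not support specifying the int64 type and is left unchanged. test/legacy_test/test_reduce_op.py already contains unit tests, so no duplicates are added. The PaddleAPITest run passes on CPU/GPU. ![image](https://github.com/user-attachments/assets/9d51d403-f147-4880-87ae-2e3d8661bc29)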

co63oc

[XPU] support moe_combine on XPU

3

### PR Category Custom Device ### PR Types New features ### Description [XPU] support moe_combine moe_combine_grad

zhouquan32

XPU

[0-size Tensor Retest No.2, 7, 8, 10] Fix output accuracy error in max_pool1d for 0-size input

3

### PR Category Execute Infrastructure ### PR Types Bug fixes ### Description - The errors reported for paddle.clip, paddle.nn.functional.max_unpool2d, and paddle.allclose were caused by bugs in PaddleAPITest. - The main fix addresses the output of paddle.nn.functional.max_pool1d not matching torch. For the test case ```paddle.nn.functional.max_pool1d(x=Tensor([2, 3, 0],"float64"), kernel_size=2, stride=1, padding=list[1,1,])```, paddle outputs all nan while torch outputs all 0.0. - Because Pool2dKernel is called by multiple APIs, pooling_type is needed to tell max from avg, since the two are initialized with different values. - PaddleAPITest retest results: ...
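The test case above is worth working through by hand, because a 0-length input still yields a non-empty output once padding is applied. The sketch below uses the standard pooling output-length formula (dilation 1, floor mode) and lists which real input positions each window covers. Both helper names are hypothetical, and the code models only the shape arithmetic, not Paddle's Pool2dKernel.

```python
def pool1d_output_length(length, kernel_size, stride, padding):
    """Output length of 1-D pooling with dilation 1 and floor rounding."""
    return (length + 2 * padding - kernel_size) // stride + 1


def valid_window_indices(out_pos, length, kernel_size, stride, padding):
    """Input indices inside the window at out_pos, excluding padded positions."""
    start = out_pos * stride - padding
    return [i for i in range(start, start + kernel_size) if 0 <= i < length]


# Last dimension of Tensor([2, 3, 0]) with kernel_size=2, stride=1, padding=1.
out_len = pool1d_output_length(length=0, kernel_size=2, stride=1, padding=1)
print(out_len)  # 1, so the output has shape [2, 3, 1]

# The single window covers padded positions only; no real element is read.
print(valid_window_indices(0, length=0, kernel_size=2, stride=1, padding=1))  # []
```

Since the only window contains no real elements, the value written to the output is determined entirely by how the kernel initializes and finalizes its accumulator, which is why the initial value for max versus avg matters in this case.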

DanielSun11
