Paddle issues

fix fused_rms_norm bug

2

### PR Category Operator Mechanism ### PR Types Bug fixes ### Description Fixes the CUDA error 700 that fused_rms_norm hits on large tensors. An accuracy issue remains (15/512 miss) and needs further investigation. pcard-67164

liuruyan

[Auto Parallel] Add spmd rule No.4、13 for (batch_norm,sync_batch_norm) and their backward ops.

1

### PR Category Auto Parallel ### PR Types New features ### Description - [Open-source task] Develop operator sharding (SPMD) inference rules so that more models can use auto parallel, lowering the cost of distributed development for users. - [No.4 batch_norm](https://github.com/PaddlePaddle/Paddle/issues/72415#issuecomment-2899735426) [No.13 sync_batch_norm](https://github.com/PaddlePaddle/Paddle/issues/72415#issuecomment-2911628674) - All dimensions other than the one batch_norm operates on are forced to Replicated.

Glencsa

contributor

HappyOpenSource Pro

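Request for sparse transposed convolution (deconvolution) operator support in Paddle

### Feature Description Task goal (describe the project you are working on: which model, paper, or project?); use case (why does your project need this feature?); feature description (briefly describe or design the feature). We are reproducing CVPR 2024 point cloud detection work on Paddle (HEDNet, SAFDNet; GitHub link: https://github.com/zhanggang001/HEDNet/tree/main/pcdet). A key Encoder-Decoder structure in the paper requires a sparse transposed convolution (deconvolution) operation, which Paddle does not currently support. Please add this operator. ### Alternatives _No response_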

YanxianChen

status/new-issue

type/feature-request

Bugfix/unique consecutive

3

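### PR Category Operator Mechanism ### PR Types Bug fixes ### Description Fixes an error raised by `paddle.unique_consecutive` on large Tensors. - **Operator overview** `paddle.unique_consecutive` removes consecutive duplicate elements from a tensor and returns different tuples depending on the flags passed in. - **Changes** The problem was traced to the following code segment in `unique_consecutive_functor.h`. The suspected cause is an overflow from a type deduced by `auto`; it is changed to...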
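To illustrate the behaviour the overview describes, here is a minimal pure-Python sketch of consecutive-duplicate removal. It is not Paddle's kernel: the function name `unique_consecutive_sketch` is hypothetical, and the `return_inverse` / `return_counts` flags are assumed to mirror the flags the description mentions.

```python
from itertools import groupby


def unique_consecutive_sketch(values, return_inverse=False, return_counts=False):
    """Collapse runs of equal neighbours; optionally report indices and run lengths."""
    unique, inverse, counts = [], [], []
    for value, run in groupby(values):
        run_length = sum(1 for _ in run)
        # Every element of this run maps to the slot about to be appended.
        inverse.extend([len(unique)] * run_length)
        unique.append(value)
        counts.append(run_length)
    if not (return_inverse or return_counts):
        return unique
    result = [unique]
    if return_inverse:
        result.append(inverse)
    if return_counts:
        result.append(counts)
    return tuple(result)


data = [1, 1, 2, 2, 3, 1, 1, 2]
out, inverse, counts = unique_consecutive_sketch(
    data, return_inverse=True, return_counts=True
)
```

Only adjacent repeats are merged, so the later `1, 1` run in `data` yields a second `1` in the output rather than being folded into the first.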

LCStayingdullCircuit

contributor

add feature _only_reshard_mesh_shape and get_local_slice

1

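### PR Category Auto Parallel ### PR Types New features ### Description Adds the features _only_reshard_mesh_shape and get_local_slice. Without actually sharding the tensor, an iterative simulation derives the theoretical local_slice that each GPU would hold after sharding. All three placements (shard, replicate, and partial) are supported, as are uneven sharding and sharding the same dimension multiple times.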
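The idea of deriving a local slice by iterating over the mesh, without touching any tensor data, can be sketched as follows. This is an illustration only and not the PR's implementation: the function name `local_slice_sketch`, the tuple encoding of placements, and the ceil-based chunking used for uneven splits are all assumptions.

```python
def local_slice_sketch(shape, mesh_shape, placements, coord):
    """Return one slice per tensor dim for the rank at mesh coordinate `coord`.

    placements[i] describes mesh dim i and is one of
    ("shard", tensor_dim), ("replicate",) or ("partial",)  (assumed encoding).
    """
    # Start from the full extent of every tensor dimension.
    ranges = [(0, size) for size in shape]
    for mesh_dim, placement in enumerate(placements):
        if placement[0] != "shard":
            continue  # replicate / partial: every rank keeps the full extent
        tensor_dim = placement[1]
        start, stop = ranges[tensor_dim]
        # Ceil division, so trailing ranks may get a shorter (or empty) chunk.
        chunk = -(-(stop - start) // mesh_shape[mesh_dim])
        lo = min(start + coord[mesh_dim] * chunk, stop)
        hi = min(lo + chunk, stop)
        ranges[tensor_dim] = (lo, hi)
    return tuple(slice(lo, hi) for lo, hi in ranges)


# Tensor dim 0 (size 10) sharded twice on a 2x2 mesh; last rank shown.
twice = local_slice_sketch((10, 8), (2, 2), [("shard", 0), ("shard", 0)], (1, 1))
# Dim 1 sharded on mesh dim 0, replicated on mesh dim 1.
mixed = local_slice_sketch((10, 8), (2, 2), [("shard", 1), ("replicate",)], (0, 1))
```

Processing the mesh dimensions in order means a second shard of the same tensor dimension subdivides the range left by the first, which is one way to model the multiple-sharding case.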

smile2game

contributor

`paddle.divide` Incorrectly Computes Complex Divided by Infinity.

1

### Describe the Bug When dividing a complex number by positive infinity using paddle.divide, the result is (nan+nanj). This is mathematically incorrect. The expected result is (0+0j). The equivalent...
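One plausible way to arrive at (nan+nanj) is the textbook formula that multiplies by the conjugate, which evaluates inf/inf when the divisor is infinite. Whether Paddle's kernel actually uses that formula is an assumption, not something the report states. The sketch below compares it with Smith's scaled algorithm in plain Python; both function names are hypothetical.

```python
import math


def naive_complex_div(a: complex, b: complex) -> complex:
    """Textbook formula: a * conj(b) / |b|^2. Breaks down for infinite b."""
    denom = b.real * b.real + b.imag * b.imag
    real = (a.real * b.real + a.imag * b.imag) / denom
    imag = (a.imag * b.real - a.real * b.imag) / denom
    return complex(real, imag)


def smith_complex_div(a: complex, b: complex) -> complex:
    """Smith's algorithm: scale by the larger component of b first."""
    if abs(b.real) >= abs(b.imag):
        ratio = b.imag / b.real
        denom = b.real + b.imag * ratio
        return complex((a.real + a.imag * ratio) / denom,
                       (a.imag - a.real * ratio) / denom)
    ratio = b.real / b.imag
    denom = b.real * ratio + b.imag
    return complex((a.real * ratio + a.imag) / denom,
                   (a.imag * ratio - a.real) / denom)


numerator = complex(1.0, 2.0)
divisor = complex(math.inf, 0.0)
naive_result = naive_complex_div(numerator, divisor)   # both parts are nan
smith_result = smith_complex_div(numerator, divisor)   # 0j
```

In the naive version both the numerator and `denom` become infinite, so each component is inf/inf. Smith's version divides finite values by infinity instead and yields zero.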

rookieLiu2018

status/new-issue

type/bug-report

[Accuracy diff No.79] Fix accuracy diff for paddle.combinations API

1

### PR Category Execute Infrastructure ### PR Types Improvements ### Description The log shows that the backward gradient is None, i.e. no gradient is propagated back. This PR removes the special-case code `paddle.empty(shape=[0, r], dtype=x.dtype)`; the results remain correct without it (the torch implementation has no such special case either: https://github.com/pytorch/pytorch/blob/d632cf2cc9aac8ab0e03d1537982265e42be95e5/aten/src/ATen/native/Itertools.cpp#L60-L73 ) - PaddleAPITest tests pass ![image](https://github.com/user-attachments/assets/8cb87178-d81f-4a10-99de-6b0ff2e46de9) #### TODO - [ ] The unit test fails in the GPU environment; the suspected cause is...

ooooo-create

contributor

HappyOpenSource Pro

optimize the compilation options to reduce binary size

1

### PR Category User Experience ### PR Types Improvements ### Description optimize the compilation options to reduce binary size

zhangting2020

[API] Normalize the return value of `paddle.slogdet`

8

### PR Category User Experience ### PR Types Bug fixes ### Description Previously paddle.slogdet returned a single Tensor of shape [2, *]. It now aligns with torch and numpy and returns tuple(Tensor, Tensor), i.e. tuple(sign, logdet). Reference standard:
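For reference, the numpy convention the description aligns with looks like this. The snippet uses `numpy.linalg.slogdet` only; the matrix is an arbitrary example, and no Paddle call is shown because the exact post-change signature is not given here.

```python
import numpy as np

# Diagonal matrix with determinant 2 * (-3) = -6.
matrix = np.array([[2.0, 0.0], [0.0, -3.0]])

# numpy returns the sign and log|det| as two separate values.
sign, logdet = np.linalg.slogdet(matrix)

# The determinant can be recovered from the pair.
det = sign * np.exp(logdet)
```

Returning the two values separately lets callers unpack them directly instead of indexing into the leading dimension of a stacked [2, *] tensor.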

aquagull

contributor

Support wint2.5 which use uint16 to store.

1

### PR Category Inference ### PR Types Improvements ### Description Pcard-67012 `GemmDataType` now supports the `INT16` type.

Xreki
