[Hackathon 3rd No.22 ] add paddle.incubate.sparse.reshape
PR types
New features
PR changes
OPs
Describe
add paddle.incubate.sparse.reshape
The predecessor of this PR is https://github.com/PaddlePaddle/Paddle/pull/46242 . This PR is a trimmed version with the content unrelated to task No.22 removed.
Your PR has been submitted. Thanks for your contribution! Please wait for the result of CI first. See the Paddle CI Manual for details.
"Floating-point exception", so change the dtype in all test cases to int64 to check whether the computation logic is right

After changing the dtype in the test cases to int64, they still report "Floating-point exception", which is strange.

Check if the computation of ReshapeCooGradKernel is right

Integer divide-by-zero

It seems that the example code in the docstring of the Python API will also be run in the CI test.


It seems that the CI test will also check the content of your comments.

// /* Caution: this is the original computation logic, which I think is wrong.
// The logic here is: the original tensor's shape is (10, 20, 30, 40, 50),
// and one non-zero element has the index (1, 2, 3, 4, 5).
// After a transpose, the tensor's shape is (30, 10, 50, 20, 40),
// and the logic here assumes that the new index of that non-zero element is (3, 1, 5, 2, 4).
// Exactly, that is the computation logic of transpose: after a transpose, elements change their positions in memory.
// The logic you changed it to is actually the computation logic of reshape: after a reshape, the positions of all elements in memory stay unchanged.
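To make the difference concrete, here is a minimal NumPy sketch (my own illustration, not the kernel code): for reshape, the element keeps its flat position in row-major memory, so the old index is flattened against the old shape and unraveled against the new shape; for transpose, the index components are merely permuted.

import numpy as np

old_shape = (10, 20, 30, 40, 50)
new_shape = (30, 10, 50, 20, 40)    # same total number of elements
old_index = (1, 2, 3, 4, 5)         # index of one non-zero element

# reshape keeps the flat (row-major) position in memory unchanged
flat = np.ravel_multi_index(old_index, old_shape)
reshape_index = np.unravel_index(flat, new_shape)
print(reshape_index)                # the correct new index for sparse reshape

# transpose with perm (2, 0, 4, 1, 3) only permutes the index components
perm = (2, 0, 4, 1, 3)
transpose_index = tuple(old_index[p] for p in perm)
print(transpose_index)              # (3, 1, 5, 2, 4)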
Delete grad kernel test case to check whether the forward kernel is correct

Failed CI. @zhouwei25 After the previous CI errors, I made some changes, but I cannot understand the current CI errors. Could a Paddle expert give some directional advice? In python\paddle\fluid\tests\unittests\test_sparse_reshape_op.py I have already disabled the backward check and kept only one test case.


Thanks for the answer. Even this single unit test cannot pass, and I cannot understand the error messages, especially the following two:


I will first modify the code according to your advice and then run CI again to see whether these two errors still occur.
After modifying the code according to the Paddle expert's advice, the same error still occurs, as follows:

Searching the NVIDIA website gives the following explanation:
https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html
A guess at the possible cause of the error: in the forward COO CUDA kernel, when assigning to the elements of the indices DenseTensor through an int64_t* pointer, the write goes out of range???
It seems that sp_out.to_dense().numpy() causes the error, while sp_out = paddle.incubate.sparse.reshape(sp_x, new_shape) seems to compute; let me test this.

How to fix it?

The Chinese comments may be the cause of the error, so I translated them into English.
https://www.cnblogs.com/VVingerfly/p/13751289.html

But on AI Studio the dense tensor has no problems, so maybe the cause is that paddle.reshape and paddle.incubate.sparse.reshape are operating on the same Paddle tensor.

How to fix it? My solution is to use NumPy to generate equal but distinct Paddle tensors: paddle.to_tensor(np_x)
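A minimal sketch of that workaround (variable names are just for illustration): each call to paddle.to_tensor(np_x) copies the NumPy data into a fresh tensor, so the dense and sparse code paths no longer share storage.

import numpy as np
import paddle

np_x = np.random.randint(-100, 100, (2, 5))

# equal values, but two independent Paddle tensors with separate storage
dense_x = paddle.to_tensor(np_x)                    # fed to paddle.reshape
sparse_x = paddle.to_tensor(np_x).to_sparse_coo(2)  # fed to paddle.incubate.sparse.reshape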

You need to add if paddle.is_compiled_with_cuda():

It may seem that the numpy() method of a CUDA dense tensor reports the error, but I found that actually sp_out = paddle.incubate.sparse.reshape(sp_x, new_shape) causes the error, where sp_x is a CUDA sparse tensor. You should check your GPU forward COO kernel of sparse reshape.

# dense_x = paddle.clone(origin_x.detach())
mask = np.random.randint(0, 2, x_shape)
np_x = np.random.randint(-100, 100, x_shape) * mask

### cpu version
dense_x = paddle.to_tensor(np_x, place=paddle.CPUPlace())
dense_x.numpy()
print(dense_x.numpy())
dense_x.stop_gradient = False
dense_x.numpy()
# dense_out = paddle.transpose(dense_x, dims)
dense_out = paddle.reshape(dense_x, new_shape)
dense_out.numpy()
print(dense_out.numpy())
if format == "coo":
    # sp_x = origin_x.detach().to_sparse_coo(len(x_shape))
    sp_x = paddle.to_tensor(np_x, place=paddle.CPUPlace()).to_sparse_coo(len(x_shape))
else:
    # sp_x = origin_x.detach().to_sparse_csr()
    sp_x = paddle.to_tensor(np_x, place=paddle.CPUPlace()).to_sparse_csr()
sp_x.stop_gradient = False
# sp_out = paddle.incubate.sparse.transpose(sp_x, dims)
sp_out = paddle.incubate.sparse.reshape(sp_x, new_shape)
print(10*'=', "OccupyMars2025 the following is dense_out", 10*'=')
print("dense_out.numpy():", dense_out.numpy())
print("dense_out:", dense_out)
print(10*'=', "OccupyMars2025 the following is sp_out", 10*'=')
print("sp_out:", sp_out)
print("sp_out.to_dense():", sp_out.to_dense())
print("sp_out.to_dense().numpy():", sp_out.to_dense().numpy())
print(10*'=', "OccupyMars2025 the end", 10*'=')
np.testing.assert_allclose(sp_out.to_dense().numpy(),
                           dense_out.numpy(),
                           rtol=1e-05)

if paddle.is_compiled_with_cuda():
    ## cuda version
    dense_x = paddle.to_tensor(np_x, place=paddle.CUDAPlace(0))
    dense_x.numpy()
    print(dense_x.numpy())
    dense_x.stop_gradient = False
    dense_x.numpy()
    # dense_out = paddle.transpose(dense_x, dims)
    dense_out = paddle.reshape(dense_x, new_shape)
    dense_out.numpy()
    print(dense_out.numpy())
    if format == "coo":
        # sp_x = origin_x.detach().to_sparse_coo(len(x_shape))
        sp_x = paddle.to_tensor(np_x, place=paddle.CUDAPlace(0)).to_sparse_coo(len(x_shape))
    else:
        # sp_x = origin_x.detach().to_sparse_csr()
        sp_x = paddle.to_tensor(np_x, place=paddle.CUDAPlace(0)).to_sparse_csr()
    sp_x.stop_gradient = False
    # sp_out = paddle.incubate.sparse.transpose(sp_x, dims)
    sp_out = paddle.incubate.sparse.reshape(sp_x, new_shape)
    print(10*'=', "OccupyMars2025 the following is dense_out", 10*'=')
    print("dense_out.numpy():", dense_out.numpy())  # report error at this line
The following picture shows that at least the CPU forward COO kernel of sparse reshape works correctly.

This is the reason for using dense_x.grad.numpy() * mask: the sparse input's gradient only covers the original non-zero positions, so the dense reference gradient has to be masked before the comparison.

dense_out.backward()
sp_out.backward()
np.testing.assert_allclose(sp_x.grad.to_dense().numpy(),
                           dense_x.grad.numpy() * mask,
                           # dense_x.grad.numpy(),
                           rtol=1e-05)
There seems to be numerical instability in the backward computation on the CPU: when the test case is run multiple times, sometimes the two grad tensors have the same values and sometimes they differ.
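Since the test data comes from np.random, one way to make the intermittent backward failure reproducible is to fix the seed before generating mask and np_x; a small sketch (the seed value and shape here are arbitrary, for illustration only):

import numpy as np

np.random.seed(2022)                     # arbitrary seed, fixed for reproducibility
x_shape = (2, 5)                         # illustrative shape
mask = np.random.randint(0, 2, x_shape)
np_x = np.random.randint(-100, 100, x_shape) * mask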


All test cases for the CPU forward COO kernel pass, so my forward computation logic is correct, but the GPU forward COO kernel doesn't work.
Key points mentioned in the design document of PHI, Paddle's highly reusable operator library:
Decide whether a cross-device data copy is needed.
Split compilation by training and inference scenarios, e.g. inference does not compile backward kernels, nor forward kernels with Intermediate outputs.
In the long run, support writing cross-device kernels in a unified, intuitive and easy-to-use way, without introducing unnecessary template parameters.
Explanation: below the operator library there is also a Kernel Primitive API module. Its long-term vision is that each operation needs only one kernel to adapt to multiple devices, with the code that actually distinguishes devices living only in the Kernel Primitive API implementation. In the future, when reusing kernels requires passing rather complex template parameters, the parameters need to be kept as simple as possible.
- For Tensor, ALL represents an illegal Backend, but for Kernel, some kernels may be device-independent by nature, such as reshape; and some kernels are also device-independent when implemented based on the primitive API.
Why not use the original fluid VarType here? Reason 1: in the original fluid, DataType and VarType are concepts at the same level, which is a rather confusing design; for example, LoDTensor and FLOAT32 end up as same-level concepts even though they obviously are not, and we do not want to inherit a design with obvious flaws. Reason 2: decoupling from fluid dependencies makes it easier for PHI to be compiled independently later.
Scalar is used to uniformly represent variables with different basic data types (float, double, int, bool, etc.). (It currently also supports representing a Tensor scalar with exactly one element, but that support may be dropped later.)
Take ScaleKernel as an example: its scale parameter can be passed as an ordinary int, float, double, etc. Without Scalar, a separate function interface would have to be created for each data type, which would greatly increase the amount of code needed to develop the kernel. Scalar is therefore mainly applied to a single parameter that may carry different data types, avoiding the need to write multiple overloaded functions in that scenario.
When one Tensor is assigned to another Tensor, or a Tensor is used as a function return value, only the pointer is actually copied; no real data copy happens.
Compilation decoupling:
The autograd information carried here is just a pointer, null by default: std::unique_ptr<AbstractAutogradMeta> autograd_meta_ = nullptr; AbstractAutogradMeta is an abstract interface class that does not depend on any autograd module, so it does not affect PHI's independent compilation while still satisfying the dynamic-graph Tensor's need to hold backward information. The AutogradMeta is only set in dynamic-graph scenarios; where it is not needed, for example in static graphs, it is simply a null pointer.
DenseTensor corresponds to the original LoDTensor class in fluid and is the basic implementation of Tensor. The DenseTensorMeta inside DenseTensor contains the basic members that describe the Tensor, and the Allocation inside DenseTensor is the original fluid Allocation.
- As I understand it, I have solved the compilation problem
- I will look further into the CUDA kernel problem
Maybe the error "illegal memory access" is caused by not enough memory being allocated, so I change the configuration of the CUDA kernel, changing "0" to a large value "1024", but this is probably not the actual cause of the error.

[Bug from the PaddlePaddle framework] When I run the following code, I get the following error message:
import paddle
sp_x = paddle.to_tensor([3], place=paddle.CUDAPlace(0)).to_sparse_coo(1)
sp_x.numpy() # this line reports an error
--------------------------------------
C++ Traceback (most recent call last):
--------------------------------------
No stack trace in paddle, may be caused by external reasons.
----------------------
Error Message Summary:
----------------------
FatalError: `Segmentation fault` is detected by the operating system.
[TimeInfo: *** Aborted at 1664432194 (unix time) try "date -d @1664432194" if you are using GNU date ***]
[SignalInfo: *** SIGSEGV (@0x0) received by PID 11542 (TID 0x7f1f16333740) from PID 0 ***]
Segmentation fault (core dumped)
aistudio@jupyter-242560-4589773:~/work$
I installed the PaddlePaddle wheel package built from the branch OccupyMars2025:hackathon-3rd-task22-add-paddle.incubate.sparse.reshape-version002, but because the reported error has nothing to do with my newly added code, I think there may be some bug in the PaddlePaddle framework itself. I used the following environment to compile the source code, and I used this command to compile: cmake .. -DPY_VERSION=3.8 -DWITH_GPU=ON

I'm rebuilding from the source code on the develop branch to check whether the bug really comes from the PaddlePaddle framework itself (2022/9/29 15:50). At 2022/9/29 17:06 I finished building from the source code on the "develop" branch, and the above code reports the same error, so I'm certain that there is a bug in the source code on the "develop" branch, which has been the cause of my failed CI.
I'm wrong!!!! The code sp_x.numpy() is itself illegal. I should use sp_x.to_dense().numpy()
>>> import paddle
>>> sp_x = paddle.to_tensor([3], place=paddle.CUDAPlace(0)).to_sparse_coo(1)
W0929 09:40:46.203522 31427 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.2, Runtime API Version: 11.2
W0929 09:40:46.206414 31427 gpu_resources.cc:91] device: 0, cuDNN Version: 8.1.
>>> sp_x.to_dense()
Tensor(shape=[1], dtype=int64, place=Place(gpu:0), stop_gradient=True,
[3])
>>> sp_x.to_dense().numpy()
array([3])
>>>
Now I'm certain that the CPU COO forward kernel is OK. But there are two possible sources of error:
- the CUDA COO forward kernel
- the CPU COO backward kernel
2022-09-29 15:52:35 [1, 0, 1, 1, 0]])
2022-09-29 15:52:35 y: array([[0, 1, 1, 1, 0],
2022-09-29 15:52:35 [1, 0, 1, 1, 0]])
2022-09-29 15:52:35 ----------------------------------------------------------------------
2022-09-29 15:52:35 Ran 1 test in 0.005s
2022-09-29 15:52:35 FAILED (failures=1)
2022-09-29 15:52:35 [[ 0 -52 -28 0 0]
2022-09-29 15:52:35 [ 63 0 89 -93 0]]
2022-09-29 15:52:35 [ 0 -52 -28 0 0 63 0 89 -93 0]
2022-09-29 15:52:35 sp_x.grad.to_dense().numpy(): [[0 1 1 0 0]
2022-09-29 15:52:35 [1 0 1 1 0]]
2022-09-29 15:52:35 dense_x.grad.numpy(): [[1 1 1 1 1]
2022-09-29 15:52:35 [1 1 1 1 1]]
2022-09-29 15:52:35 mask: [[0 1 1 1 0]
2022-09-29 15:52:35 [1 0 1 1 0]]
2022-09-29 15:52:35 0% tests passed, 1 tests failed out of 1
2022-09-29 15:52:35 Total Test time (real) = 3.90 sec
2022-09-29 15:52:35 The following tests FAILED:
2022-09-29 15:52:35 1292 - test_sparse_reshape_op (Failed)
2022-09-29 15:52:35 Errors while running CTest
2022-09-29 15:52:35 ========================================
2022-09-29 15:52:35 Added UT should pass three additional executions
Sorry to inform you that after our repeated discussion, your PR does not yet meet the merging standard (Reference: Paddle Custom Operator Design Doc). You can submit a new PR; we will close this one for now. Thank you for your contribution.