OccupyMars2025
## It seems that `sp_out.to_dense().numpy()` causes the error, while `sp_out = paddle.incubate.sparse.reshape(sp_x, new_shape)` appears to compute correctly; let me test this
## How to fix it? ## The Chinese comment may be the cause of the error, so I translated the Chinese comment into English: https://www.cnblogs.com/VVingerfly/p/13751289.html
## but in AI Studio a dense tensor has no problems, so maybe the cause is that `paddle.reshape` and `paddle.incubate.sparse.reshape` are operating on the same paddle tensor. ## How to fix it...
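One way to test the shared-tensor hypothesis is to give each branch its own copy of the input, shown here as a plain NumPy sketch (`origin`, `dense_input`, and `sparse_input` are hypothetical names; in the real test each copy would be wrapped with `paddle.to_tensor(...)` and converted with `.to_sparse_coo(...)` respectively):

```python
import numpy as np

# Sketch of the "shared input" hypothesis: give the dense and sparse
# reshape branches independent copies so they cannot interfere through
# a shared buffer.
origin = np.random.rand(2, 3)
dense_input = origin.copy()    # would feed paddle.reshape
sparse_input = origin.copy()   # would feed paddle.incubate.sparse.reshape

# Independent buffers, identical contents:
print(dense_input is sparse_input)                # False
print(np.array_equal(dense_input, sparse_input))  # True
```

If the error disappears with independent copies, the two reshape paths really were interfering through the shared tensor.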
## You need to add `if paddle.is_compiled_with_cuda():`
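The guard could be folded into a small device-selection helper; a sketch, assuming only `paddle.is_compiled_with_cuda()` from Paddle's public API (`select_test_devices` is a hypothetical helper name):

```python
def select_test_devices(cuda_available):
    """Return the list of devices the test should iterate over."""
    devices = ["cpu"]
    if cuda_available:  # guard GPU cases behind the compile-time check
        devices.append("gpu")
    return devices

# In the real test this would be driven by Paddle itself:
#   for device in select_test_devices(paddle.is_compiled_with_cuda()):
#       paddle.set_device(device)
#       ... run the forward/backward checks ...
print(select_test_devices(False))  # ['cpu']
print(select_test_devices(True))   # ['cpu', 'gpu']
```

This keeps CPU coverage on every build while skipping GPU cases on CPU-only installs.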
# It may seem that the `numpy()` method of a CUDA dense tensor reports the error, but I found that actually `sp_out = paddle.incubate.sparse.reshape(sp_x, new_shape)` causes the error, in which `sp_x`...
# The following picture shows that at least the forward COO kernel of sparse reshape on CPU works correctly.
## This is the reason for `dense_x.grad.numpy() * mask`:

```python
dense_out.backward()
sp_out.backward()
np.testing.assert_allclose(sp_x.grad.to_dense().numpy(),
                           dense_x.grad.numpy() * mask,
                           # dense_x.grad.numpy(),
                           rtol=1e-05)
```
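The need for `* mask` can be seen with plain NumPy (a sketch; `mask` here plays the same role as the mask used to build the sparse input): reshape is an element-wise identity, so the upstream gradient flows back unchanged on the dense path, but a COO tensor only stores, and therefore only receives gradient at, its non-zero positions.

```python
import numpy as np

np.random.seed(0)
mask = (np.random.rand(2, 3) > 0.5).astype(np.float64)

# reshape is element-wise identity, so an upstream gradient of ones
# flows back unchanged on the dense path:
dense_grad = np.ones(6).reshape(2, 3)

# the sparse path only holds values where mask == 1, so its gradient is
# the dense gradient restricted to those positions:
sparse_grad = dense_grad * mask

# the raw comparison without * mask fails wherever mask == 0:
print(np.allclose(sparse_grad, dense_grad))         # False
print(np.allclose(sparse_grad, dense_grad * mask))  # True
```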
## There seems to be numerical instability in the backward computation on CPU. Run the test case multiple times: sometimes the two grad tensors have the same values and sometimes they do not...
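One common source of such flaky comparisons (an assumption on my part, not confirmed above) is that `rtol` alone is meaningless when the reference value is zero; adding an absolute tolerance `atol` makes the check stable for near-zero gradients:

```python
import numpy as np

a = np.array([1e-12, 1.0])   # e.g. a gradient that should be exactly zero
b = np.array([0.0, 1.0])

# The check is |a - b| <= atol + rtol * |b|: when b == 0 the rtol term
# vanishes, so without atol any rounding noise in `a` fails the check.
print(np.allclose(a, b, rtol=1e-05, atol=0.0))    # False
print(np.allclose(a, b, rtol=1e-05, atol=1e-08))  # True
```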
## All test cases for the CPU forward COO kernel pass, so my forward computation logic is correct, but the GPU forward COO kernel doesn't work.
## Key points to keep in mind from the [PHI operator library design doc (飞桨高可复用算子库 PHI 设计文档)](https://github.com/PaddlePaddle/docs/blob/develop/docs/design/phi/design_cn.md):
- Decide whether a cross-device data copy is needed.
- Split compilation by training vs. inference scenario; for example, inference does not compile backward kernels, nor forward kernels that have Intermediate outputs.
- In the long run, support a unified, intuitive way of writing cross-device kernels without introducing unnecessary template parameters. Explanation: below the operator library sits the Kernel Primitive API module, whose long-term vision is that each operation needs only one kernel to adapt to multiple devices, with the device-specific code living only in the Kernel Primitive API implementation; in the future, when reusing kernels requires passing fairly complex template parameters, the parameters need to be kept as simple as possible.
- For Tensor, ...