Incorrect result when oneflow.torch.fill_() is combined with oneflow.torch.diagonal()

Open lihuizhao opened this issue 2 years ago • 4 comments

Summary

Combining oneflow.torch.fill_() with oneflow.torch.diagonal() produces an incorrect result: the first row is filled instead of the diagonal.

Code to reproduce bug

import torch as torch_original
import oneflow as flow

print("oneflow test:")
inputs = flow.rand(2, 2)
print("inputs = ",inputs)
inputs.diagonal(0).fill_(1)
print(inputs)

print("torch test:")
inputs = torch_original.rand(2, 2)
print("inputs = ",inputs)
inputs.diagonal(0).fill_(1)
print(inputs)

System Information

  • What is your OneFlow installation (pip, source, dockerhub): source
  • OS: Ubuntu 18.04.6 LTS
  • OneFlow version : clone from github (2023.12.28)
  • Python version: Python 3.8.10
  • CUDA driver version: 535.104.05
  • GPU models: NVIDIA GeForce RTX 3090
  • Other info:

Run result

oneflow test:
inputs =  tensor([[0.9022, 0.7608],
        [0.3944, 0.6183]], dtype=oneflow.float32)
tensor([[1.0000, 1.0000],
        [0.3944, 0.6183]], dtype=oneflow.float32)
torch test:
inputs =  tensor([[0.7947, 0.1834],
        [0.9627, 0.7034]])
tensor([[1.0000, 0.1834],
        [0.9627, 1.0000]])

lihuizhao commented Jan 04 '24 03:01

The cause is that torch's diagonal is a view op:

import torch as torch_original

print("torch test:")
inputs = torch_original.rand(3, 3)
y = inputs.diagonal(0)

print(y.is_contiguous())  # False
print(y.shape)  # torch.Size([3])
print(y.stride())  # (4,)

With more dimensions, it is enough to set the stride on the [:-2] dimensions to 1.
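As a rough illustration of where the stride of 4 comes from, here is a minimal pure-Python sketch of how a diagonal view over a 2-D strided buffer can be described. The helper name `diagonal_view` is hypothetical and is not a torch or oneflow API; it only assumes a row-major layout with the given strides.

```python
def diagonal_view(shape, strides, offset=0):
    """Return (storage_offset, length, stride) for the diagonal of a 2-D strided tensor.

    Hypothetical helper for illustration only; not a torch or oneflow API.
    """
    rows, cols = shape
    row_stride, col_stride = strides
    if offset >= 0:
        # Diagonals above the main one start `offset` columns to the right.
        start = offset * col_stride
        length = max(min(rows, cols - offset), 0)
    else:
        # Diagonals below the main one start `-offset` rows down.
        start = -offset * row_stride
        length = max(min(rows + offset, cols), 0)
    # One step along the diagonal advances one row and one column at once.
    return start, length, row_stride + col_stride


storage = [float(i) for i in range(9)]  # flat buffer of a contiguous 3x3 matrix
start, length, stride = diagonal_view((3, 3), (3, 1))
diag = [storage[start + i * stride] for i in range(length)]
print(stride)  # 4
print(diag)    # [0.0, 4.0, 8.0]
```

No data is copied here: the diagonal is just a different way of indexing the same buffer, which is why an in-place write through the view is expected to change the original matrix.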

wyg1997 commented Jan 04 '24 03:01

Does oneflow have a corresponding view op?

lihuizhao commented Jan 04 '24 04:01

fill_ is marked SupportNonContiguous, isn't it? Here oneflow_inputs.diagonal(0) returns a non-contiguous tensor, so in principle fill_ should not produce a wrong result?

marigoold commented Jan 04 '24 06:01

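Ah, you're right. I hadn't built it earlier and assumed diagonal was not a view operation. The problem should be in fill_ itself: judging from the kernel, it does not support non-contiguous input, yet its registration declares SupportNonContiguous, so it computes wrong results in the same way when combined with other view ops.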
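The reported output is consistent with a fill that ignores strides. The pure-Python sketch below is a simulation under that assumption, not the actual OneFlow kernel: both function names are hypothetical, and the flat list stands in for the storage of the contiguous 2x2 matrix from the report, whose diagonal view has offset 0, length 2, and stride 3.

```python
def fill_strided(storage, start, length, stride, value):
    """Stride-aware fill: write to every `stride`-th element of the view."""
    for i in range(length):
        storage[start + i * stride] = value


def fill_ignoring_strides(storage, start, length, stride, value):
    """Hypothetical buggy fill: treats the view as contiguous and ignores `stride`."""
    for i in range(length):
        storage[start + i] = value


# Flat storage of the 2x2 input from the report (row-major, strides (2, 1)).
original = [0.9022, 0.7608, 0.3944, 0.6183]

expected = list(original)
fill_strided(expected, 0, 2, 3, 1.0)

buggy = list(original)
fill_ignoring_strides(buggy, 0, 2, 3, 1.0)

print(expected)  # [1.0, 0.7608, 0.3944, 1.0]
print(buggy)     # [1.0, 1.0, 0.3944, 0.6183]
```

The stride-aware result matches the torch output (diagonal set to 1), while the stride-ignoring result fills the first row, which is the pattern seen in the oneflow output above.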

wyg1997 commented Jan 04 '24 09:01