
[inplace-related] += and clamp_ on sliced tensors give results inconsistent with torch

Open · HiHippie opened this issue 3 years ago · 4 comments

As the title says.

Minimal reproduction:

# oneflow.__version__ == '0.8.0.dev20220705+cu112'
import torch
import oneflow as flow

x_torch = torch.randn(5,5)
x_flow = flow.tensor(x_torch.numpy())
# BUG: inplace
x_torch[:,:2] += x_torch[:,4:]
x_flow[:,:2] += x_flow[:,4:]
# False
print((x_torch.numpy()==x_flow.numpy()).all())


x_torch = torch.randn(5,5)
x_flow = flow.tensor(x_torch.numpy())
x_torch += x_torch
x_flow += x_flow
# True
print((x_torch.numpy()==x_flow.numpy()).all())



x_torch = torch.randn(5,5)
x_flow = flow.tensor(x_torch.numpy())
x_torch[:,:2] = x_torch[:,:2] + x_torch[:,4:]
x_flow[:,:2] = x_flow[:,:2] + x_flow[:,4:]
# True
print((x_torch.numpy()==x_flow.numpy()).all())


x_torch = torch.randn(5,5)
x_flow = flow.tensor(x_torch.numpy())
x_torch.clamp_(min=0, max=1)
x_flow.clamp_(min=0, max=1)
# True
print((x_torch.numpy()==x_flow.numpy()).all())


x_torch = torch.randn(5,5)
x_flow = flow.tensor(x_torch.numpy())
# BUG: inplace
x_torch[:, :2].clamp_(min=0, max=1)
x_flow[:, :2].clamp_(min=0, max=1)
# False
print((x_torch.numpy()==x_flow.numpy()).all())


x_torch = torch.randn(5,5)
x_flow = flow.tensor(x_torch.numpy())
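# Out-of-place clamp: the result is discarded and x is not modified,
# so this comparison is trivially True.
x_torch[:, :2].clamp(min=0, max=1)
x_flow[:, :2].clamp(min=0, max=1)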
# True
print((x_torch.numpy()==x_flow.numpy()).all())

HiHippie, Jul 06 '22 05:07

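Preliminary diagnosis: the view implementation computes an incorrect stride. With views disabled (ONEFLOW_DISABLE_VIEW=1 python test.py), the results are correct.

@small1945 See oneflow-inc/oneteam#472 for an explanation of how the view mechanism works; it should help you track this down.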
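To make the stride diagnosis concrete, here is a pure-Python model of how a slice can be represented as a view (shape, strides, storage offset) over the parent's flat row-major buffer without copying. This is a sketch of the general strided-view idea under that assumption, not OneFlow's actual implementation; the helper names contiguous_strides, slice_view and flat_indices are hypothetical.

```python
# Pure-Python model of strided views over a flat row-major buffer.
# Assumption: illustrates the generic shape/strides/offset scheme only,
# not OneFlow's view code.

def contiguous_strides(shape):
    """Row-major strides (in elements) for a tensor of the given shape."""
    strides, step = [], 1
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))


def slice_view(shape, strides, offset, dim, start, stop):
    """(shape, strides, offset) of the slice [start:stop] along `dim`.
    Slicing keeps the parent's strides and only shifts the offset."""
    new_shape = list(shape)
    new_shape[dim] = stop - start
    return tuple(new_shape), strides, offset + start * strides[dim]


def flat_indices(shape, strides, offset):
    """Buffer positions covered by a 2-D view, in logical row-major order."""
    rows, cols = shape
    return [offset + i * strides[0] + j * strides[1]
            for i in range(rows) for j in range(cols)]


base_shape = (5, 5)
base_strides = contiguous_strides(base_shape)  # (5, 1)

# x[:, :2] -> shape (5, 2), strides (5, 1), offset 0
lhs = slice_view(base_shape, base_strides, 0, dim=1, start=0, stop=2)
# x[:, 4:] -> shape (5, 1), strides (5, 1), offset 4
rhs = slice_view(base_shape, base_strides, 0, dim=1, start=4, stop=5)

print("x[:, :2]", lhs, flat_indices(*lhs))
print("x[:, 4:]", rhs, flat_indices(*rhs))
```

Neither slice occupies a consecutive run of the buffer, so any kernel that reads or writes them must honour the strides; a wrong stride makes it touch the wrong elements.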

wyg1997, Jul 12 '22 11:07


Got it.

small1945, Jul 12 '22 11:07

x_torch = torch.arange(25).reshape(5, 5)
y_torch = torch.ones(5).reshape(1, 5).long()
print(y_torch)
x_flow = flow.tensor(x_torch.numpy())
y_flow = flow.tensor(y_torch.numpy())
# slice along dim 0: x[:1, :] has shape (1, 5)
x_torch[:1, :] -= y_torch
x_flow[:1, :] -= y_flow
# True
print((x_torch.numpy() == x_flow.numpy()).all())

Slicing along the first dimension works correctly.

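x_torch = torch.arange(25).reshape(5, 5)
# y must be (5, 1): an in-place op cannot broadcast x[:, :1] (5, 1) up to (5, 5)
y_torch = torch.ones(5).reshape(5, 1).long()
x_flow = flow.tensor(x_torch.numpy())
y_flow = flow.tensor(y_torch.numpy())
# slice along dim 1: x[:, :1] has shape (5, 1) and is not contiguous
x_torch[:, :1] -= y_torch
x_flow[:, :1] -= y_flow.contiguous()
# False
print((x_torch.numpy() == x_flow.numpy()).all())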

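Slicing along the second dimension and updating in place still gives the wrong result, even when the right-hand operand is made contiguous.

Based on these tests, the slice and slice_update ops do not appear to be the cause for now.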
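The difference between the two tests is consistent with a contiguity problem: a row slice such as x[:1, :] happens to be contiguous, while a column slice such as x[:, :1] is not. The pure-Python sketch below models this under the assumption of a row-major buffer; is_contiguous and positions are hypothetical helpers, not OneFlow APIs.

```python
# Why a dim-0 slice can survive a kernel that assumes contiguity while a
# dim-1 slice cannot. Pure-Python model; not OneFlow's actual code.

def contiguous_strides(shape):
    """Row-major strides (in elements) for `shape`."""
    strides, step = [], 1
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))


def is_contiguous(shape, strides):
    """True if strides match row-major layout; size-1 dims are skipped
    because their stride is never used to step through memory."""
    expected = contiguous_strides(shape)
    return all(s == e for n, s, e in zip(shape, strides, expected) if n != 1)


def positions(shape, strides, offset):
    """Buffer positions a stride-aware 2-D loop would visit."""
    rows, cols = shape
    return [offset + i * strides[0] + j * strides[1]
            for i in range(rows) for j in range(cols)]


# Both slices of a 5x5 row-major tensor keep the parent strides (5, 1).
row_slice = ((1, 5), (5, 1), 0)  # x[:1, :]
col_slice = ((5, 1), (5, 1), 0)  # x[:, :1]

for name, (shape, strides, offset) in [("x[:1, :]", row_slice), ("x[:, :1]", col_slice)]:
    numel = shape[0] * shape[1]
    naive = list(range(offset, offset + numel))  # what a contiguity-assuming kernel walks
    print(name, is_contiguous(shape, strides), positions(shape, strides, offset), naive)
```

For the row slice, the stride-aware positions and the naive consecutive run coincide, so a kernel that ignores strides still produces the right answer; for the column slice they diverge.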

small1945, Jul 22 '22 10:07

x_torch = torch.arange(25).reshape(5, 5)
y_torch = torch.ones(5).reshape(5, 1).long()
x_flow = flow.tensor(x_torch.numpy())
y_flow = flow.tensor(y_torch.numpy())
x_torch[:, 4:] += y_torch
# Call the in-place add directly instead of x_flow[:, 4:] += ..., so that
# slice_update is not involved (running both would add y twice).
flow.add(x_flow[:, 4:], y_flow.contiguous(), inplace=True)
print(x_torch)
print(x_flow)
# False
print((x_torch.numpy() == x_flow.numpy()).all())
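  • Calling flow.add(x_flow[:, 4:], y_flow.contiguous(), inplace=True) directly still gives inconsistent results, which rules out the slice_update op.

  • Further investigation shows the result is already wrong right after the in-place add, so the root cause is that the add op does not support non-contiguous tensors.

  • Besides add, other basic ops such as mul and sub have the same problem with non-contiguous tensors.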
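A minimal model of the failure described in these findings, assuming the faulty kernel walks numel consecutive elements from the view's offset instead of following its strides. Both kernel functions are hypothetical illustrations on a flat Python list standing in for arange(25).reshape(5, 5); they are not OneFlow's kernels.

```python
# Hypothetical model: an in-place add kernel that ignores the view's strides
# versus one that honours them. The buffer is a flat list standing in for a
# 5x5 row-major integer tensor.

def add_inplace_assume_contiguous(buf, shape, offset, other):
    """Buggy variant: treats the view as numel consecutive elements
    starting at `offset`, ignoring its strides."""
    rows, cols = shape
    for k in range(rows * cols):
        buf[offset + k] += other[k]


def add_inplace_strided(buf, shape, strides, offset, other):
    """Stride-aware variant: logical element (i, j) lives at
    offset + i * strides[0] + j * strides[1]."""
    rows, cols = shape
    for i in range(rows):
        for j in range(cols):
            buf[offset + i * strides[0] + j * strides[1]] += other[i * cols + j]


# View x[:, 4:] of a 5x5 tensor: shape (5, 1), strides (5, 1), offset 4.
shape, strides, offset = (5, 1), (5, 1), 4
y = [1] * 5  # y = ones(5, 1), already contiguous

correct = list(range(25))
add_inplace_strided(correct, shape, strides, offset, y)

buggy = list(range(25))
add_inplace_assume_contiguous(buggy, shape, offset, y)

print("stride-aware:     ", correct)
print("assume-contiguous:", buggy)
```

The stride-aware kernel updates the last column (buffer positions 4, 9, 14, 19, 24), while the contiguity-assuming one updates positions 4 to 8, i.e. the tail of row 0 and most of row 1. Making only the right-hand operand contiguous cannot fix this, because the problem lies in how the output view is traversed.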

small1945, Jul 29 '22 05:07

Fixed in #8867.

wyg1997, Aug 12 '22 02:08