oneflow
[BUG] Check error when using oneflow.amp.autocast
Reproduction code:
import oneflow as flow
x = flow.tensor([2.4,3.5], device="cuda", dtype=flow.float16)
with flow.amp.autocast("cuda", flow.float16):
    y = x.clone()
    y.fill_(2.36)
    print(y.dtype)
The code above runs without error in PyTorch, but OneFlow reports the following error:
Traceback (most recent call last):
  File "test.py", line 7, in <module>
    y.fill_(2.36)
  File "/home/hanbinbin/oneflow/python/oneflow/framework/tensor.py", line 286, in _fill
    return flow._C.fill_(self, value)
RuntimeError: Check failed: tensor_impl->tensor_meta()->dtype() == output_tensor_metas.at(i)->dtype()
  File "/home/hanbinbin/oneflow/oneflow/core/functional/impl/array_functor.cpp", line 3337, in operator()
    OpInterpUtil::Dispatch(*op_, {in}, outputs.get(), attrs)
  File "/home/hanbinbin/oneflow/oneflow/core/framework/op_interpreter/op_interpreter.cpp", line 96, in Apply
    internal_->Apply(op_expr, inputs, outputs, ctx)
  File "/home/hanbinbin/oneflow/oneflow/core/framework/op_interpreter/eager_local_op_interpreter.cpp", line 122, in NaiveInterpret
Error Type: oneflow.ErrorProto.check_failed_error
Aborted (core dumped)
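Here is what happens: under autocast, a cast op is inserted before the fill_ op, and the dtype check between fill_'s input and output then fails. Because fill_ is an in-place operation, once the op's meta information has been inferred, the interpreter checks that the inferred result matches the existing output tensor. The inserted cast op changes the input the interpreter sees, fill_ infers its output from this new input, and the inferred dtype no longer matches the output's dtype.
It looks like this happens because the fill op has not been added to the amp list. If fill is treated as a black op, its input is forcibly cast to float32.
Adding fill to the clear list should solve the problem here.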
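To make the failure mode above concrete, here is a minimal toy model in plain Python. It is not OneFlow's implementation: `ToyTensor`, `autocast_input`, `inplace_fill`, and the two op sets are hypothetical names made up for illustration. It only assumes what the comments above describe, namely that a black op has its input cast to float32, that a clear op keeps the input dtype, and that an in-place op requires the inferred dtype to match the existing output tensor.

```python
from dataclasses import dataclass


@dataclass
class ToyTensor:
    """Hypothetical stand-in for a tensor; only the dtype matters here."""
    dtype: str


def autocast_input(op_name, tensor, black_ops, clear_ops):
    """Return the tensor the op actually sees after the (toy) AMP pass."""
    if op_name in clear_ops:
        return tensor                    # clear op: no cast op is inserted
    if op_name in black_ops:
        return ToyTensor("float32")      # black op: a cast to float32 is inserted
    return tensor


def inplace_fill(tensor, black_ops, clear_ops):
    """Toy in-place op: the output is the original tensor itself."""
    seen = autocast_input("fill_", tensor, black_ops, clear_ops)
    inferred_dtype = seen.dtype          # inferred from the possibly cast input
    if inferred_dtype != tensor.dtype:   # mirrors the dtype check in the traceback
        raise RuntimeError(
            f"Check failed: {tensor.dtype} != {inferred_dtype}"
        )
    return tensor


y = ToyTensor("float16")

# fill_ treated as a black op: the inserted cast breaks the in-place check.
try:
    inplace_fill(y, black_ops={"fill_"}, clear_ops=set())
except RuntimeError as err:
    print("black op:", err)

# fill_ on the clear list: no cast, so the dtypes still match.
print("clear op:", inplace_fill(y, black_ops=set(), clear_ops={"fill_"}).dtype)
```

In this sketch the only difference between the failing and the passing call is which set contains `fill_`, which is the essence of the proposed fix.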