In addition, oneflow.MinMaxObserver and oneflow.ones_like have similar problems. By the way, is there any progress on implementing or fixing the gradient functions of these APIs?
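For oneflow.ones_like, a minimal probe of what I mean (this check is only an illustration I added; it just looks at whether the op participates in autograd at all):

```
import oneflow

x = oneflow.tensor([1., 2., 3.]).requires_grad_()
y = oneflow.ones_like(x)
# If no gradient function is registered, the output is detached from the graph.
print(y.requires_grad)
```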
Thanks for your reply! But it seems that the derivative for floor_divide does exist when the divisor is a tensor:

```
x = oneflow.tensor([1.,2.,3.]).requires_grad_()
y = oneflow.tensor([1.,1.,1.])
output = oneflow.floor_divide(x,y)
...
```
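For reference, a minimal completion of the snippet above; the backward call and the printed gradient are my additions (the original is cut off), and they only assume the op behaves as described:

```
import oneflow

x = oneflow.tensor([1., 2., 3.]).requires_grad_()
y = oneflow.tensor([1., 1., 1.])
output = oneflow.floor_divide(x, y)
# Assumed continuation: drive backward through the op and inspect the gradient.
output.sum().backward()
print(x.grad)  # a gradient tensor is produced, suggesting a backward is registered
```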
Thanks for your reply! However, the results are still inconsistent if oneflow.sum() is not used.

```
count = 0
input = oneflow.rand(2, 72, 16, dtype=oneflow.float64).cuda()
input_grad = input.clone().requires_grad_(True)
for i in...
```
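For completeness, a sketch of the kind of repeatability check the truncated loop runs; the operator inside the loop (oneflow.exp here) and the comparison logic are stand-ins I added, since the original loop body is cut off:

```
import numpy as np
import oneflow

count = 0
input = oneflow.rand(2, 72, 16, dtype=oneflow.float64).cuda()
input_grad = input.clone().requires_grad_(True)

reference = None
for i in range(100):
    out = oneflow.exp(input_grad)          # hypothetical stand-in for the op under test
    out.backward(oneflow.ones_like(out))   # no oneflow.sum() reduction is used
    grad = input_grad.grad.cpu().numpy()
    input_grad.grad.zero_()                # clear the accumulated gradient between runs
    if reference is None:
        reference = grad
    elif not np.array_equal(grad, reference):
        count += 1                         # count runs whose gradient differs from the first
print(count)  # a non-zero count would indicate run-to-run inconsistency
```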
The same problem exists for oneflow.nn.init.ones_ and oneflow.nn.init.zeros_.