oneflow

Fix wrong gradient value when calling backward after set_acc_grad

Open wyg1997 opened this issue 2 years ago • 13 comments

Fixes a bug where, after set_acc_grad is called, a subsequent backward pass does not accumulate the gradient correctly.

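# Reproduction: run with PyTorch as the reference, then swap the two imports to check OneFlow.
# import oneflow as flow
import torch as flow

value = flow.tensor([[-0.0875, -0.4890,  0.9031],
                     [ 0.4930, -0.6041, -1.5392]]).requires_grad_()

# Assign a gradient before calling backward (in OneFlow this presumably goes through set_acc_grad).
value.grad = flow.randn((2, 3))
initial_grad = value.grad.clone()

output = flow.sum(value)
output.backward()
print(value.grad)
# backward should accumulate into the preset grad; d(sum)/d(value) is all ones,
# so this difference should be a tensor of ones.
print(value.grad - initial_grad)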
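With PyTorch as the reference, backward adds the newly computed gradient to whatever is already stored in a leaf tensor's .grad. Because every element contributes exactly 1 to the sum, the expected final gradient in the reproduction is the preset gradient plus a tensor of ones. The framework-free sketch below computes that expected value with plain Python lists; the preset values are hypothetical stand-ins for the randn draw so the result is deterministic, and the helper names are invented for illustration.

```python
def sum_backward(shape):
    """Gradient of sum(x) with respect to x: a tensor of ones with x's shape."""
    rows, cols = shape
    return [[1.0] * cols for _ in range(rows)]


def accumulate(existing, incoming):
    """Accumulation semantics: add incoming into existing, or adopt incoming if there is no grad yet."""
    if existing is None:
        return [row[:] for row in incoming]
    return [[e + i for e, i in zip(er, ir)] for er, ir in zip(existing, incoming)]


# Hypothetical stand-in for flow.randn((2, 3)).
preset_grad = [[0.5, -1.0, 0.25],
               [2.0, 0.0, -0.75]]

expected = accumulate(preset_grad, sum_backward((2, 3)))
print(expected)  # [[1.5, 0.0, 1.25], [3.0, 1.0, 0.25]]
```

If backward instead overwrote the preset gradient, the result would be all ones; if it dropped the incoming gradient, the result would equal the preset values unchanged.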

As a follow-up, we are considering changing AccumulateNode to be bound dynamically, which would make the related interfaces much simpler.
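The PR does not describe how AccumulateNode is currently wired, so the following is only a hypothetical illustration of the general difference between static and dynamic binding, not OneFlow's code. If an accumulation node captures the gradient buffer when it is created (static binding), replacing the tensor's gradient later leaves the node writing into a stale buffer; if the node looks up the tensor's current gradient each time it runs (dynamic binding), a later assignment is picked up automatically. All names here (Leaf, StaticAccumulateNode, DynamicAccumulateNode, run) are invented for the sketch, and this is just one way such a bug could arise.

```python
class Leaf:
    """Hypothetical leaf tensor that stores its gradient as a list of floats."""
    def __init__(self, n):
        self.grad = [0.0] * n


class StaticAccumulateNode:
    """Binds the gradient buffer once, at construction time."""
    def __init__(self, leaf):
        self.buffer = leaf.grad  # captured reference; goes stale if leaf.grad is replaced

    def apply(self, incoming):
        for i, g in enumerate(incoming):
            self.buffer[i] += g


class DynamicAccumulateNode:
    """Looks up the leaf's current gradient every time it runs."""
    def __init__(self, leaf):
        self.leaf = leaf

    def apply(self, incoming):
        current = self.leaf.grad
        for i, g in enumerate(incoming):
            current[i] += g


def run(node_cls):
    leaf = Leaf(3)
    node = node_cls(leaf)
    leaf.grad = [0.5, -1.0, 2.0]   # analogous to assigning .grad before backward
    node.apply([1.0, 1.0, 1.0])    # analogous to the backward of sum()
    return leaf.grad


print(run(StaticAccumulateNode))   # [0.5, -1.0, 2.0]  (incoming gradient is lost)
print(run(DynamicAccumulateNode))  # [1.5, 0.0, 3.0]
```

The dynamic version needs no extra hook to re-bind the node when the gradient is replaced, which is one reason dynamic binding can simplify the surrounding interfaces.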

wyg1997 · Jul 06 '22

Code got formatted by CI. Please request CI again if you still want to have this PR merged. If the PR is from a forked repo, please download the patch files from the GitHub Actions web page and apply them locally.

github-actions[bot] · Jul 06 '22

CI failed when running job: cpu-module. PR label automerge has been removed

github-actions[bot] · Jul 06 '22

CI failed when running job: cpu-module. PR label automerge has been removed

github-actions[bot] · Jul 06 '22

CI failed when running job: cpu-module. PR label automerge has been removed

github-actions[bot] · Jul 07 '22

CI failed when running job: cuda-module. PR label automerge has been removed

github-actions[bot] · Jul 07 '22

CI failed when running job: cuda-module. PR label automerge has been removed

github-actions[bot] · Aug 11 '22

python/oneflow/test/modules/test_global_nms.py:56

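This looks like a rare numerical-precision issue in the NMS algorithm. It did not reproduce locally on either 4 or 8 GPUs, so I'm re-running CI.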
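To show why NMS output can be sensitive to tiny floating-point differences, here is a plain-Python sketch of textbook greedy NMS; it is not OneFlow's kernel, and suppressing only when IoU strictly exceeds the threshold is an assumption (a real implementation may compare differently). When two boxes overlap with an IoU sitting exactly at the threshold, shifting one box by one part in a million flips whether it survives, which is the kind of rare, data-dependent mismatch that rounding differences between devices could plausibly trigger.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def nms(boxes, scores, threshold):
    """Greedy NMS: keep the highest-scoring box, drop boxes whose IoU with it exceeds threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= threshold]
    return keep


scores = [0.9, 0.8]
# IoU is exactly 2 / 4 = 0.5, so the second box is kept.
print(nms([(0, 0, 3, 1), (1, 0, 4, 1)], scores, 0.5))                # [0, 1]
# Shifting the second box left by 1e-6 nudges IoU just above 0.5, so it is suppressed.
print(nms([(0, 0, 3, 1), (1 - 1e-6, 0, 4 - 1e-6, 1)], scores, 0.5))  # [0]
```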

wyg1997 · Aug 11 '22

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/8575/

github-actions[bot] · Aug 11 '22

CI failed when running job: cuda-speed-test. PR label automerge has been removed

github-actions[bot] · Aug 11 '22