Fix bug when autograd.grad is called on a tensor whose tensor.grad is not None

Open wyg1997 opened this issue 2 years ago • 7 comments

fix #9390

Decouple tensor.retain_grad from autograd.grad, and rework the data structures inside GraphTask to reduce the number of table lookups.

wyg1997 avatar Nov 09 '22 11:11 wyg1997

Suggest adding a targeted test case for the bug being fixed.

levi131 avatar Nov 10 '22 03:11 levi131

> Suggest adding a targeted test case for the bug being fixed.

done

wyg1997 avatar Nov 10 '22 07:11 wyg1997

CI failed when running job: cuda-module. PR label automerge has been removed

github-actions[bot] avatar Nov 10 '22 20:11 github-actions[bot]

Speed stats:

github-actions[bot] avatar Nov 10 '22 20:11 github-actions[bot]

Speed stats:

github-actions[bot] avatar Nov 11 '22 11:11 github-actions[bot]

CI failed when running job: cuda-module. PR label automerge has been removed

github-actions[bot] avatar Nov 11 '22 16:11 github-actions[bot]

Speed stats:

github-actions[bot] avatar Nov 11 '22 16:11 github-actions[bot]

Speed stats:

github-actions[bot] avatar Nov 13 '22 12:11 github-actions[bot]

Speed stats:
GPU Name: GeForce GTX 1080

❌ OneFlow resnet50 time: 139.0ms (= 13899.7ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 160.1ms (= 16008.3ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.15 (= 160.1ms / 139.0ms)

OneFlow resnet50 time: 84.7ms (= 8470.3ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 102.6ms (= 10261.1ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.21 (= 102.6ms / 84.7ms)

OneFlow resnet50 time: 57.5ms (= 11497.2ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 78.3ms (= 15650.6ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.36 (= 78.3ms / 57.5ms)

OneFlow resnet50 time: 44.6ms (= 8921.0ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 70.1ms (= 14029.3ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.57 (= 70.1ms / 44.6ms)

OneFlow resnet50 time: 39.9ms (= 7987.7ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 66.5ms (= 13299.2ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.66 (= 66.5ms / 39.9ms)

github-actions[bot] avatar Nov 13 '22 15:11 github-actions[bot]
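
For readers checking the figures in the speed stats above, here is a minimal sketch of the arithmetic they appear to follow: per-iteration time is total time divided by iteration count, and relative speed is the PyTorch time divided by the OneFlow time. The names `ROWS`, `per_iteration_ms`, `relative_speed`, and `summarize` are hypothetical and are not taken from the CI bot's script; the numbers are copied from the report.

```python
# Rows copied from the speed stats report:
# (batch size, iterations, OneFlow total ms, PyTorch total ms)
ROWS = [
    (16, 100, 13899.7, 16008.3),
    (8, 100, 8470.3, 10261.1),
    (4, 200, 11497.2, 15650.6),
    (2, 200, 8921.0, 14029.3),
    (1, 200, 7987.7, 13299.2),
]


def per_iteration_ms(total_ms, iterations):
    """Mean time of one iteration in milliseconds."""
    return total_ms / iterations


def relative_speed(oneflow_total_ms, pytorch_total_ms, iterations):
    """PyTorch time over OneFlow time; values above 1 mean OneFlow is faster."""
    oneflow_ms = per_iteration_ms(oneflow_total_ms, iterations)
    pytorch_ms = per_iteration_ms(pytorch_total_ms, iterations)
    return pytorch_ms / oneflow_ms


def summarize(rows=ROWS):
    """Return (batch, OneFlow ms, PyTorch ms, relative speed), rounded as in the report."""
    return [
        (
            batch,
            round(per_iteration_ms(oneflow_total, iterations), 1),
            round(per_iteration_ms(pytorch_total, iterations), 1),
            round(relative_speed(oneflow_total, pytorch_total, iterations), 2),
        )
        for batch, iterations, oneflow_total, pytorch_total in rows
    ]


if __name__ == "__main__":
    for batch, oneflow_ms, pytorch_ms, ratio in summarize():
        print(f"batch={batch}: OneFlow {oneflow_ms}ms, PyTorch {pytorch_ms}ms, relative speed {ratio}")
```

Computed from the unrounded totals, the ratios come out as 1.15, 1.21, 1.36, 1.57, and 1.66, matching the report. For the batch-size-1 row, dividing the rounded means (66.5 / 39.9) would give 1.67 instead, which suggests the reported ratio is derived from the unrounded times.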

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/9402/

github-actions[bot] avatar Nov 13 '22 15:11 github-actions[bot]