oneflow
Fix bug when autograd.grad meets a tensor whose tensor.grad is not None
fix #9390
Decouple tensor.retain_grad from autograd.grad, and rework the data structures inside GraphTask to reduce the number of table lookups.
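To illustrate what "decoupled" means here, the following is a toy sketch in pure Python, not OneFlow's implementation. The `Value` class and the `backward` and `grad` helpers are hypothetical names invented for this example. The assumption is that the intended behavior matches the usual contract: a functional `grad` call returns fresh gradients from task-local storage and neither reads nor writes the accumulated `.grad` of its inputs.

```python
class Value:
    """Hypothetical scalar node for a toy reverse-mode autograd."""

    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.parents = parents          # nodes this value was computed from
        self.local_grads = local_grads  # d(self)/d(parent) for each parent
        self.grad = None                # written only by backward(), never by grad()

    def __mul__(self, other):
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    def __add__(self, other):
        return Value(self.data + other.data, (self, other), (1.0, 1.0))


def _topo_order(root):
    """Return the nodes reachable from root, each node before its parents."""
    order, seen = [], set()

    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent in node.parents:
            visit(parent)
        order.append(node)

    visit(root)
    return order[::-1]


def _run_task(root):
    """Compute d(root)/d(node) for every node into a task-local dict."""
    grads = {id(root): 1.0}
    for node in _topo_order(root):
        g = grads.get(id(node), 0.0)
        for parent, local in zip(node.parents, node.local_grads):
            grads[id(parent)] = grads.get(id(parent), 0.0) + g * local
    return grads


def backward(root):
    """Accumulate gradients into leaf.grad, like a .backward() call."""
    grads = _run_task(root)
    for node in _topo_order(root):
        if not node.parents:
            node.grad = (node.grad or 0.0) + grads[id(node)]


def grad(output, inputs):
    """Return gradients without touching any node's .grad attribute."""
    grads = _run_task(output)
    return [grads.get(id(x), 0.0) for x in inputs]


x = Value(3.0)
backward(x * x)             # x.grad is now set (2 * x)
(dz_dx,) = grad(x * x * x, [x])
print(x.grad, dz_dx)        # x.grad is unchanged by the grad() call
```

Because `grad` keeps its results in a dictionary owned by the call, a pre-existing `x.grad` cannot leak into the returned value, and the returned value is never added to `x.grad`.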
Suggest adding a targeted test case for the issue being fixed.
done
CI failed when running job: cuda-module. PR label automerge has been removed
Speed stats:
GPU Name: GeForce GTX 1080
❌ OneFlow resnet50 time: 139.0ms (= 13899.7ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 160.1ms (= 16008.3ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.15 (= 160.1ms / 139.0ms)
OneFlow resnet50 time: 84.7ms (= 8470.3ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 102.6ms (= 10261.1ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.21 (= 102.6ms / 84.7ms)
OneFlow resnet50 time: 57.5ms (= 11497.2ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 78.3ms (= 15650.6ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.36 (= 78.3ms / 57.5ms)
OneFlow resnet50 time: 44.6ms (= 8921.0ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 70.1ms (= 14029.3ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.57 (= 70.1ms / 44.6ms)
OneFlow resnet50 time: 39.9ms (= 7987.7ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 66.5ms (= 13299.2ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.66 (= 66.5ms / 39.9ms)
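The arithmetic behind these figures can be reproduced with a short sketch. The function names below are hypothetical and not part of the CI tooling. The sketch assumes the relative speed is computed from the unrounded totals rather than from the rounded per-iteration times shown in parentheses, which would explain why the last row reports 1.66 even though 66.5 / 39.9 rounds to 1.67.

```python
def per_iteration_ms(total_ms, iterations):
    """Average time per iteration in milliseconds."""
    return total_ms / iterations


def relative_speed(pytorch_total_ms, oneflow_total_ms, iterations):
    """PyTorch time divided by OneFlow time; values above 1 mean OneFlow is faster."""
    return per_iteration_ms(pytorch_total_ms, iterations) / per_iteration_ms(
        oneflow_total_ms, iterations
    )


# (input batch size, OneFlow total ms, PyTorch total ms, iterations) from the report above
rows = [
    (16, 13899.7, 16008.3, 100),
    (8, 8470.3, 10261.1, 100),
    (4, 11497.2, 15650.6, 200),
    (2, 8921.0, 14029.3, 200),
    (1, 7987.7, 13299.2, 200),
]

for batch, oneflow_total, pytorch_total, iters in rows:
    speed = relative_speed(pytorch_total, oneflow_total, iters)
    print(f"batch={batch}: relative speed {speed:.2f}")
```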
View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/9402/