
Optimize fmha transpose

Open liujuncheng opened this issue 3 years ago • 2 comments

liujuncheng avatar Nov 13 '22 13:11 liujuncheng

Speed stats:
GPU Name: GeForce GTX 1080

❌ OneFlow resnet50 time: 139.4ms (= 13944.7ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 160.9ms (= 16091.6ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.15 (= 160.9ms / 139.4ms)

OneFlow resnet50 time: 85.1ms (= 8507.4ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 100.9ms (= 10090.9ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.19 (= 100.9ms / 85.1ms)

OneFlow resnet50 time: 57.5ms (= 11492.5ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 80.4ms (= 16081.8ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.40 (= 80.4ms / 57.5ms)

OneFlow resnet50 time: 45.1ms (= 9011.0ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 71.2ms (= 14237.8ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.58 (= 71.2ms / 45.1ms)

OneFlow resnet50 time: 40.7ms (= 8131.5ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 68.5ms (= 13699.8ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.68 (= 68.5ms / 40.7ms)
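
The arithmetic in the stats above can be reproduced with a minimal sketch. It assumes the per-iteration time is the total time divided by the iteration count, and that relative speed is the PyTorch time divided by the OneFlow time, as the parenthesised expressions in each line suggest. The function names are hypothetical and are not taken from the OneFlow CI scripts.

```python
def per_iter_ms(total_ms: float, iters: int) -> float:
    """Average time per iteration, rounded to 0.1 ms as in the report."""
    return round(total_ms / iters, 1)


def relative_speed(pytorch_ms: float, oneflow_ms: float) -> float:
    """Values above 1.0 mean OneFlow took less time per iteration than PyTorch."""
    return pytorch_ms / oneflow_ms


# Numbers copied from the input_shape=[16, 3, 224, 224] rows above.
oneflow_ms = per_iter_ms(13944.7, 100)
pytorch_ms = per_iter_ms(16091.6, 100)
speed = relative_speed(pytorch_ms, oneflow_ms)

print(f"OneFlow: {oneflow_ms}ms, PyTorch: {pytorch_ms}ms, relative speed: {speed:.2f}")
```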

github-actions[bot] avatar Nov 13 '22 14:11 github-actions[bot]

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/9417/

github-actions[bot] avatar Nov 13 '22 14:11 github-actions[bot]

Speed stats:

github-actions[bot] avatar Nov 14 '22 23:11 github-actions[bot]

CI failed when running job: cuda-module. PR label automerge has been removed

github-actions[bot] avatar Nov 15 '22 00:11 github-actions[bot]

Speed stats:

github-actions[bot] avatar Nov 15 '22 00:11 github-actions[bot]

Speed stats:
GPU Name: GeForce GTX 1080

❌ OneFlow resnet50 time: 139.6ms (= 13959.7ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 160.4ms (= 16040.8ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.15 (= 160.4ms / 139.6ms)

OneFlow resnet50 time: 85.0ms (= 8497.4ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 101.0ms (= 10095.7ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.19 (= 101.0ms / 85.0ms)

OneFlow resnet50 time: 57.6ms (= 11522.0ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 78.2ms (= 15632.9ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.36 (= 78.2ms / 57.6ms)

OneFlow resnet50 time: 44.4ms (= 8888.3ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 71.1ms (= 14221.2ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.60 (= 71.1ms / 44.4ms)

OneFlow resnet50 time: 40.4ms (= 8071.8ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 67.9ms (= 13579.0ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.68 (= 67.9ms / 40.4ms)

github-actions[bot] avatar Nov 15 '22 00:11 github-actions[bot]

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/9417/

github-actions[bot] avatar Nov 15 '22 01:11 github-actions[bot]