Impl of fused_fast_gelu_mul

Open leaves-zwx opened this issue 2 years ago • 6 comments

leaves-zwx avatar Nov 08 '22 15:11 leaves-zwx

Which correctness issue does this fix refer to?

chengtbf avatar Nov 09 '22 03:11 chengtbf

Which correctness issue does this fix refer to?

The backward computation.

leaves-zwx avatar Nov 09 '22 03:11 leaves-zwx
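
For readers who want to see what a backward check for this kind of op looks like, here is a minimal sketch. It is not the kernel from this PR: it assumes `fused_fast_gelu_mul(x, m)` means the tanh-approximated GELU of `x` multiplied elementwise by a multiplier `m`, and the helper names (`fast_gelu_grad`, `numeric_grad`, `max_grad_error`) are hypothetical. It compares the analytic gradients against central finite differences on scalars.

```python
import math

# Assumed constants of the tanh-based GELU approximation (not taken from this PR).
ALPHA = math.sqrt(2.0 / math.pi)
BETA = 0.044715


def fast_gelu(x):
    """Tanh approximation of GELU for a scalar x."""
    return 0.5 * x * (1.0 + math.tanh(ALPHA * (x + BETA * x ** 3)))


def fast_gelu_grad(x):
    """Analytic derivative of fast_gelu with respect to x."""
    t = math.tanh(ALPHA * (x + BETA * x ** 3))
    inner_grad = ALPHA * (1.0 + 3.0 * BETA * x * x)
    return 0.5 * (1.0 + t) + 0.5 * x * (1.0 - t * t) * inner_grad


def fused_fast_gelu_mul(x, m):
    """Forward: fast_gelu(x) * m (assumed semantics of the fused op)."""
    return fast_gelu(x) * m


def fused_fast_gelu_mul_grad(dout, x, m):
    """Backward: gradients with respect to x and m for upstream gradient dout."""
    dx = dout * m * fast_gelu_grad(x)
    dm = dout * fast_gelu(x)
    return dx, dm


def numeric_grad(f, v, eps=1e-6):
    """Central finite-difference estimate of df/dv."""
    return (f(v + eps) - f(v - eps)) / (2.0 * eps)


def max_grad_error(points):
    """Largest gap between analytic and numeric gradients over (x, m) pairs."""
    worst = 0.0
    for x, m in points:
        dx, dm = fused_fast_gelu_mul_grad(1.0, x, m)
        ndx = numeric_grad(lambda v: fused_fast_gelu_mul(v, m), x)
        ndm = numeric_grad(lambda v: fused_fast_gelu_mul(x, v), m)
        worst = max(worst, abs(dx - ndx), abs(dm - ndm))
    return worst
```

A backward bug such as dropping the `m` factor in `dx` would show up here as an error far above finite-difference noise.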

Speed stats:
GPU Name: GeForce GTX 1080

❌ OneFlow resnet50 time: 139.5ms (= 13945.7ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 159.8ms (= 15979.3ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.15 (= 159.8ms / 139.5ms)

OneFlow resnet50 time: 85.1ms (= 8508.7ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 109.6ms (= 10962.3ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.29 (= 109.6ms / 85.1ms)

OneFlow resnet50 time: 57.9ms (= 11570.9ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 78.0ms (= 15591.2ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.35 (= 78.0ms / 57.9ms)

OneFlow resnet50 time: 44.0ms (= 8803.4ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 70.1ms (= 14022.7ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.59 (= 70.1ms / 44.0ms)

OneFlow resnet50 time: 40.8ms (= 8160.8ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 73.5ms (= 14704.8ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.80 (= 73.5ms / 40.8ms)

github-actions[bot] avatar Nov 11 '22 18:11 github-actions[bot]
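
The figures in the speed report follow simple arithmetic: each per-iteration time is the total time divided by the iteration count, and the relative speed is the PyTorch time divided by the OneFlow time, so values above 1 mean OneFlow was faster. The sketch below reproduces the batch-size-16 row (139.5 ms, 159.8 ms, 1.15). The function names are illustrative and not part of the CI script, and rounding to one and two decimals is an assumption inferred from the printed values.

```python
def per_iter_ms(total_ms, iters):
    """Average time per iteration in milliseconds."""
    return total_ms / iters


def relative_speed(pytorch_ms, oneflow_ms):
    """Ratio of PyTorch time to OneFlow time; > 1 means OneFlow is faster."""
    return pytorch_ms / oneflow_ms


# Values copied from the input_shape=[16, 3, 224, 224] row of the report.
oneflow_ms = round(per_iter_ms(13945.7, 100), 1)
pytorch_ms = round(per_iter_ms(15979.3, 100), 1)
speed = round(relative_speed(pytorch_ms, oneflow_ms), 2)
```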

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/9397/

github-actions[bot] avatar Nov 11 '22 18:11 github-actions[bot]

Speed stats:
GPU Name: GeForce GTX 1080

❌ OneFlow resnet50 time: 139.9ms (= 13990.8ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 161.7ms (= 16165.1ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.16 (= 161.7ms / 139.9ms)

OneFlow resnet50 time: 85.2ms (= 8522.6ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 100.6ms (= 10059.6ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.18 (= 100.6ms / 85.2ms)

OneFlow resnet50 time: 57.5ms (= 11508.6ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 87.3ms (= 17457.6ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.52 (= 87.3ms / 57.5ms)

OneFlow resnet50 time: 45.7ms (= 9149.9ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 79.3ms (= 15864.2ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.73 (= 79.3ms / 45.7ms)

OneFlow resnet50 time: 40.9ms (= 8170.1ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 78.8ms (= 15764.2ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.93 (= 78.8ms / 40.9ms)

github-actions[bot] avatar Nov 12 '22 03:11 github-actions[bot]

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/9397/

github-actions[bot] avatar Nov 12 '22 03:11 github-actions[bot]

Speed stats:

github-actions[bot] avatar Nov 14 '22 11:11 github-actions[bot]

Speed stats:
GPU Name: GeForce GTX 1080

❌ OneFlow resnet50 time: 139.2ms (= 13921.9ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 164.2ms (= 16417.6ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.18 (= 164.2ms / 139.2ms)

OneFlow resnet50 time: 84.8ms (= 8479.3ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 101.7ms (= 10172.2ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.20 (= 101.7ms / 84.8ms)

OneFlow resnet50 time: 57.3ms (= 11452.0ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 87.3ms (= 17450.7ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.52 (= 87.3ms / 57.3ms)

OneFlow resnet50 time: 44.0ms (= 8792.1ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 70.7ms (= 14140.9ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.61 (= 70.7ms / 44.0ms)

OneFlow resnet50 time: 40.4ms (= 8087.3ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 75.6ms (= 15115.2ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.87 (= 75.6ms / 40.4ms)

github-actions[bot] avatar Nov 14 '22 14:11 github-actions[bot]

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/9397/

github-actions[bot] avatar Nov 14 '22 15:11 github-actions[bot]