oneflow icon indicating copy to clipboard operation
oneflow copied to clipboard

Performant RMSNorm

Open leaves-zwx opened this issue 3 years ago • 1 comments

leaves-zwx avatar Nov 07 '22 05:11 leaves-zwx

Speed stats:

github-actions[bot] avatar Nov 16 '22 16:11 github-actions[bot]

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/9380/

github-actions[bot] avatar Nov 17 '22 03:11 github-actions[bot]

Speed stats:
GPU Name: GeForce GTX 1080 









❌ OneFlow resnet50 time: 139.6ms (= 13964.5ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 163.0ms (= 16295.7ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.17 (= 163.0ms / 139.6ms)

OneFlow resnet50 time: 85.4ms (= 8541.3ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 102.6ms (= 10261.5ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.20 (= 102.6ms / 85.4ms)

OneFlow resnet50 time: 57.4ms (= 11476.7ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 78.6ms (= 15725.1ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.37 (= 78.6ms / 57.4ms)

OneFlow resnet50 time: 46.3ms (= 9258.4ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 67.1ms (= 13413.6ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.45 (= 67.1ms / 46.3ms)

OneFlow resnet50 time: 41.0ms (= 8200.1ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 68.4ms (= 13682.9ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.67 (= 68.4ms / 41.0ms)

github-actions[bot] avatar Nov 17 '22 03:11 github-actions[bot]

Speed stats:

github-actions[bot] avatar Nov 17 '22 15:11 github-actions[bot]

CI failed when running job: cuda-module. PR label automerge has been removed

github-actions[bot] avatar Nov 17 '22 17:11 github-actions[bot]

Speed stats:

github-actions[bot] avatar Nov 17 '22 17:11 github-actions[bot]

Speed stats:
GPU Name: GeForce GTX 1080 









❌ OneFlow resnet50 time: 141.2ms (= 14119.2ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 162.8ms (= 16277.3ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.15 (= 162.8ms / 141.2ms)

OneFlow resnet50 time: 86.8ms (= 8684.7ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 112.2ms (= 11217.5ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.29 (= 112.2ms / 86.8ms)

OneFlow resnet50 time: 58.2ms (= 11643.8ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 88.4ms (= 17681.3ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.52 (= 88.4ms / 58.2ms)

OneFlow resnet50 time: 44.2ms (= 8844.1ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 70.3ms (= 14061.4ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.59 (= 70.3ms / 44.2ms)

OneFlow resnet50 time: 39.2ms (= 7847.8ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 67.3ms (= 13463.1ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.72 (= 67.3ms / 39.2ms)

github-actions[bot] avatar Nov 18 '22 09:11 github-actions[bot]

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/9380/

github-actions[bot] avatar Nov 18 '22 09:11 github-actions[bot]

Speed stats:

github-actions[bot] avatar Nov 18 '22 13:11 github-actions[bot]

Speed stats:
GPU Name: GeForce GTX 1080 









❌ OneFlow resnet50 time: 140.1ms (= 14010.3ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 163.4ms (= 16335.4ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.17 (= 163.4ms / 140.1ms)

OneFlow resnet50 time: 86.2ms (= 8615.3ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 101.9ms (= 10191.4ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.18 (= 101.9ms / 86.2ms)

OneFlow resnet50 time: 57.8ms (= 11565.3ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 77.7ms (= 15543.9ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.34 (= 77.7ms / 57.8ms)

OneFlow resnet50 time: 44.1ms (= 8816.7ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 68.2ms (= 13638.4ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.55 (= 68.2ms / 44.1ms)

OneFlow resnet50 time: 40.3ms (= 8064.5ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 69.4ms (= 13889.7ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.72 (= 69.4ms / 40.3ms)

github-actions[bot] avatar Nov 18 '22 14:11 github-actions[bot]

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/9380/

github-actions[bot] avatar Nov 18 '22 14:11 github-actions[bot]

CI failed when running job: cuda-module. PR label automerge has been removed

github-actions[bot] avatar Nov 18 '22 15:11 github-actions[bot]

CI failed when running job: cuda-module. PR label automerge has been removed

github-actions[bot] avatar Nov 18 '22 16:11 github-actions[bot]

Speed stats:
GPU Name: GeForce GTX 1080 









❌ OneFlow resnet50 time: 140.4ms (= 14042.3ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 161.7ms (= 16171.0ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.15 (= 161.7ms / 140.4ms)

OneFlow resnet50 time: 86.3ms (= 8626.4ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 102.7ms (= 10272.9ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.19 (= 102.7ms / 86.3ms)

OneFlow resnet50 time: 58.2ms (= 11641.2ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 78.1ms (= 15621.4ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.34 (= 78.1ms / 58.2ms)

OneFlow resnet50 time: 45.3ms (= 9058.3ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 72.1ms (= 14419.9ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.59 (= 72.1ms / 45.3ms)

OneFlow resnet50 time: 39.2ms (= 7839.5ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 69.5ms (= 13903.3ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.77 (= 69.5ms / 39.2ms)

github-actions[bot] avatar Nov 19 '22 02:11 github-actions[bot]

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/9380/

github-actions[bot] avatar Nov 19 '22 02:11 github-actions[bot]

Speed stats:
GPU Name: GeForce GTX 1080 









❌ OneFlow resnet50 time: 141.1ms (= 14114.5ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 162.5ms (= 16251.0ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.15 (= 162.5ms / 141.1ms)

OneFlow resnet50 time: 85.5ms (= 8552.2ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 103.1ms (= 10307.5ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.21 (= 103.1ms / 85.5ms)

OneFlow resnet50 time: 58.2ms (= 11640.3ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 77.7ms (= 15547.8ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.34 (= 77.7ms / 58.2ms)

OneFlow resnet50 time: 44.7ms (= 8949.1ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 82.6ms (= 16517.4ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.85 (= 82.6ms / 44.7ms)

OneFlow resnet50 time: 40.5ms (= 8108.7ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 69.2ms (= 13836.6ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.71 (= 69.2ms / 40.5ms)

github-actions[bot] avatar Nov 19 '22 06:11 github-actions[bot]

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/9380/

github-actions[bot] avatar Nov 19 '22 06:11 github-actions[bot]