Optimize UpsampleNearest2D 2X

Open liujuncheng opened this issue 3 years ago • 4 comments

Optimize the 2X case of UpsampleNearest2D:

  • Reduce the overhead of coordinate conversion
  • Reduce the number of memory access instructions
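
To make the two bullets concrete, here is a minimal CPU sketch of the idea, not the kernel from this PR. It assumes a contiguous NCHW float tensor with N and C folded into one `nc` dimension; the function names `UpsampleNearestGeneric` and `UpsampleNearest2x` are hypothetical. The generic version decomposes every output index with divisions and modulos, while the 2X version walks the input, loads each value once, and stores it to a 2x2 output block.

```cpp
#include <cstddef>
#include <vector>

// Generic nearest upsample (hypothetical helper): one div/mod index
// decomposition and one input load per OUTPUT element.
std::vector<float> UpsampleNearestGeneric(const std::vector<float>& in, std::size_t nc,
                                          std::size_t h, std::size_t w, std::size_t scale) {
  const std::size_t out_h = h * scale;
  const std::size_t out_w = w * scale;
  std::vector<float> out(nc * out_h * out_w);
  for (std::size_t i = 0; i < out.size(); ++i) {
    const std::size_t ow = i % out_w;            // output column
    const std::size_t oh = (i / out_w) % out_h;  // output row
    const std::size_t c = i / (out_w * out_h);   // folded N*C index
    out[i] = in[(c * h + oh / scale) * w + ow / scale];
  }
  return out;
}

// 2X special case (hypothetical helper): iterate over INPUT elements, so the
// row offsets are computed once per input row and each value is loaded once,
// then stored to the 2x2 output block it covers.
std::vector<float> UpsampleNearest2x(const std::vector<float>& in, std::size_t nc,
                                     std::size_t h, std::size_t w) {
  const std::size_t out_w = 2 * w;
  std::vector<float> out(nc * 4 * h * w);
  for (std::size_t c = 0; c < nc; ++c) {
    for (std::size_t ih = 0; ih < h; ++ih) {
      const float* src = in.data() + (c * h + ih) * w;
      float* top = out.data() + (c * 2 * h + 2 * ih) * out_w;  // output row 2*ih
      float* bottom = top + out_w;                             // output row 2*ih + 1
      for (std::size_t iw = 0; iw < w; ++iw) {
        const float v = src[iw];  // single load, four stores
        top[2 * iw] = v;
        top[2 * iw + 1] = v;
        bottom[2 * iw] = v;
        bottom[2 * iw + 1] = v;
      }
    }
  }
  return out;
}
```

On a GPU the same mapping would presumably assign one thread per input element (or per small group of them) and could pair the two horizontal stores into one wider store; that part is an assumption about how such a kernel might look, not something stated in this thread.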

liujuncheng · Nov 12 '22 12:11

Speed stats:
GPU Name: GeForce GTX 1080

❌ OneFlow resnet50 time: 139.4ms (= 13939.9ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 160.4ms (= 16044.9ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.15 (= 160.4ms / 139.4ms)

OneFlow resnet50 time: 84.6ms (= 8462.9ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 101.2ms (= 10124.1ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.20 (= 101.2ms / 84.6ms)

OneFlow resnet50 time: 57.9ms (= 11586.9ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 78.4ms (= 15684.7ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.35 (= 78.4ms / 57.9ms)

OneFlow resnet50 time: 44.7ms (= 8940.1ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 69.2ms (= 13845.3ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.55 (= 69.2ms / 44.7ms)

OneFlow resnet50 time: 39.1ms (= 7827.0ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 68.0ms (= 13601.7ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.74 (= 68.0ms / 39.1ms)

github-actions[bot] · Nov 12 '22 12:11

Speed stats:
GPU Name: GeForce GTX 1080

❌ OneFlow resnet50 time: 139.7ms (= 13971.7ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 160.1ms (= 16007.4ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.15 (= 160.1ms / 139.7ms)

OneFlow resnet50 time: 84.8ms (= 8480.6ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 100.7ms (= 10073.9ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.19 (= 100.7ms / 84.8ms)

OneFlow resnet50 time: 57.8ms (= 11565.0ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 77.3ms (= 15463.6ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.34 (= 77.3ms / 57.8ms)

OneFlow resnet50 time: 43.9ms (= 8782.0ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 71.0ms (= 14197.7ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.62 (= 71.0ms / 43.9ms)

OneFlow resnet50 time: 39.8ms (= 7967.7ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 76.5ms (= 15300.6ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.92 (= 76.5ms / 39.8ms)

github-actions[bot] · Nov 12 '22 13:11

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/9415/

github-actions[bot] · Nov 12 '22 13:11

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/9415/

github-actions[bot] · Nov 12 '22 14:11

Is there any test data or ncu profile data, so we can see roughly how much the performance improves?

Flowingsun007 · Nov 14 '22 01:11

Speed stats:
GPU Name: GeForce GTX 1080

❌ OneFlow resnet50 time: 139.4ms (= 13937.1ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 159.7ms (= 15971.3ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.15 (= 159.7ms / 139.4ms)

OneFlow resnet50 time: 84.9ms (= 8488.9ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 101.6ms (= 10159.5ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.20 (= 101.6ms / 84.9ms)

OneFlow resnet50 time: 57.7ms (= 11531.7ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 78.1ms (= 15616.7ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.35 (= 78.1ms / 57.7ms)

OneFlow resnet50 time: 44.2ms (= 8840.0ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 70.5ms (= 14095.0ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.59 (= 70.5ms / 44.2ms)

OneFlow resnet50 time: 39.3ms (= 7869.7ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 76.3ms (= 15261.3ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.94 (= 76.3ms / 39.3ms)

github-actions[bot] · Nov 14 '22 02:11

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/9415/

github-actions[bot] · Nov 14 '22 02:11