support non contiguous inplace

Open hjchen2 opened this issue 2 years ago • 4 comments

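  • Systematically resolve the problem that operators which do not support non-contiguous tensors cannot run in-place on non-contiguous inputs.

For example, add does not support non-contiguous tensors, so an in-place operation such as a += 1 is not allowed on a non-contiguous input tensor. This PR automatically rewrites a += 1 as b = a + 1; a = b.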
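The rewrite described above can be illustrated with a minimal pure-Python sketch. It does not use OneFlow: `StridedView`, `contiguous_add`, and `add_inplace` are hypothetical names made up for this illustration, and the choice to write the temporary result back through the view's strides (so the underlying storage is updated) is an assumption about what `a = b` means, not something this PR description confirms.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StridedView:
    """Hypothetical 1-D strided view over a flat buffer (not an OneFlow type)."""
    storage: List[float]
    offset: int
    stride: int
    length: int

    def is_contiguous(self) -> bool:
        return self.stride == 1

    def to_list(self) -> List[float]:
        # Gather the viewed elements into a dense list.
        return [self.storage[self.offset + i * self.stride]
                for i in range(self.length)]


def contiguous_add(values: List[float], scalar: float) -> List[float]:
    """Stand-in for a kernel that only accepts dense, contiguous input."""
    return [v + scalar for v in values]


def add_inplace(a: StridedView, scalar: float) -> StridedView:
    """Emulate `a += scalar` with a fallback for non-contiguous views."""
    if a.is_contiguous():
        # Contiguous input: the result can overwrite the input slice directly.
        start, stop = a.offset, a.offset + a.length
        a.storage[start:stop] = contiguous_add(a.storage[start:stop], scalar)
        return a
    # Non-contiguous input: b = a + scalar on a dense copy ...
    b = contiguous_add(a.to_list(), scalar)
    # ... then a = b, assumed here to mean a strided write-back into storage.
    for i, value in enumerate(b):
        a.storage[a.offset + i * a.stride] = value
    return a


buf = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
col = StridedView(buf, offset=1, stride=2, length=3)  # views elements 1, 3, 5
add_inplace(col, 1.0)
print(buf)  # [0.0, 2.0, 2.0, 4.0, 4.0, 6.0]
```

Only the viewed elements change, and the untouched elements of the shared buffer keep their values, which is the behavior a caller expects from an in-place update on a view.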

Fixes OneFlow-Inc/OneTeam#1621 Fixes OneFlow-Inc/OneTeam#1627

hjchen2 avatar Aug 08 '22 03:08 hjchen2

CI failed when running job: cpu-module. PR label automerge has been removed

github-actions[bot] avatar Aug 10 '22 12:08 github-actions[bot]

Speed stats:

github-actions[bot] avatar Aug 10 '22 12:08 github-actions[bot]

Speed stats:
GPU Name: GeForce GTX 1080 

✔️ OneFlow resnet50 time: 128.4ms (= 12840.0ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 141.3ms (= 14128.6ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.10 (= 141.3ms / 128.4ms)

OneFlow resnet50 time: 75.5ms (= 7548.0ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 87.0ms (= 8703.4ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.15 (= 87.0ms / 75.5ms)

OneFlow resnet50 time: 48.7ms (= 9748.4ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 57.3ms (= 11461.3ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.18 (= 57.3ms / 48.7ms)

OneFlow resnet50 time: 36.1ms (= 7213.0ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 41.5ms (= 8296.6ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.15 (= 41.5ms / 36.1ms)

OneFlow resnet50 time: 28.3ms (= 5669.1ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 38.7ms (= 7749.9ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.37 (= 38.7ms / 28.3ms)

OneFlow swin dataloader time: 0.268s (= 53.507s / 200, num_workers=1)
PyTorch swin dataloader time: 0.159s (= 31.741s / 200, num_workers=1)
Relative speed: 0.593 (= 0.159s / 0.268s)

OneFlow swin dataloader time: 0.105s (= 20.959s / 200, num_workers=4)
PyTorch swin dataloader time: 0.041s (= 8.235s / 200, num_workers=4)
Relative speed: 0.393 (= 0.041s / 0.105s)

OneFlow swin dataloader time: 0.060s (= 11.973s / 200, num_workers=8)
PyTorch swin dataloader time: 0.021s (= 4.238s / 200, num_workers=8)
Relative speed: 0.354 (= 0.021s / 0.060s)

❌ OneFlow resnet50 time: 137.0ms (= 13698.4ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 160.6ms (= 16063.2ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.17 (= 160.6ms / 137.0ms)

OneFlow resnet50 time: 84.4ms (= 8437.8ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 102.1ms (= 10209.2ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.21 (= 102.1ms / 84.4ms)

OneFlow resnet50 time: 57.8ms (= 11567.8ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 80.5ms (= 16108.0ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.39 (= 80.5ms / 57.8ms)

OneFlow resnet50 time: 45.0ms (= 8995.6ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 71.1ms (= 14217.4ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.58 (= 71.1ms / 45.0ms)

OneFlow resnet50 time: 39.0ms (= 7808.4ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 67.9ms (= 13584.7ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.74 (= 67.9ms / 39.0ms)

github-actions[bot] avatar Aug 11 '22 09:08 github-actions[bot]

CI failed when running job: cuda-misc. PR label automerge has been removed

github-actions[bot] avatar Aug 11 '22 10:08 github-actions[bot]

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/8867/

github-actions[bot] avatar Aug 11 '22 14:08 github-actions[bot]