
Fix BroadcastElementwiseUnary launch error

Open MARD1NO opened this issue 1 year ago • 12 comments

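The pack dispatch logic here decides based only on the pointer addresses.

Consider this case:

Say you have only 2 float elements, but you pack 4 floats at a time. Then pack_count = 2 / 4 = 0, so the kernel is launched with 0 blocks and the launch fails.

The dispatch logic therefore also needs to check whether count is greater than or equal to the current pack size. With the same 2 float elements, dispatch then ends up in the pack_size = 2 branch.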
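The check described above can be sketched on the host side as follows. This is a minimal illustration, not the code in this PR: the names `IsAlignedForPack`, `ChoosePackSize`, and `NumPacks` are hypothetical, and the candidate pack sizes {4, 2, 1} are an assumption taken from the example in the description.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true if `ptr` is aligned to `pack_size` elements of T.
template <typename T>
bool IsAlignedForPack(const void* ptr, std::size_t pack_size) {
  return reinterpret_cast<std::uintptr_t>(ptr) % (sizeof(T) * pack_size) == 0;
}

// Hypothetical dispatch: try pack sizes 4, then 2, and fall back to 1.
// A pack size is accepted only if both pointers are aligned to it AND
// count >= pack_size, so the resulting pack count is never 0 for count > 0.
template <typename T>
std::size_t ChoosePackSize(const void* src, const void* dst, std::size_t count) {
  for (std::size_t pack_size = 4; pack_size > 1; pack_size /= 2) {
    if (count >= pack_size && IsAlignedForPack<T>(src, pack_size)
        && IsAlignedForPack<T>(dst, pack_size)) {
      return pack_size;
    }
  }
  return 1;
}

// Number of whole packs; this is the value that would size the launch.
std::size_t NumPacks(std::size_t count, std::size_t pack_size) {
  assert(pack_size > 0);
  return count / pack_size;
}
```

With 2 float elements and well-aligned pointers, an address-only dispatch would pick a pack size of 4 and get `NumPacks(2, 4) == 0`, while the sketch above picks 2 and gets one pack. Handling a tail when count is not a multiple of the pack size is outside the scope of this sketch.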

MARD1NO avatar Sep 01 '22 03:09 MARD1NO

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/9036/

github-actions[bot] avatar Sep 01 '22 08:09 github-actions[bot]

Speed stats:
GPU Name: GeForce GTX 1080 

❌ OneFlow resnet50 time: 129.1ms (= 12909.9ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 141.2ms (= 14124.4ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.09 (= 141.2ms / 129.1ms)

OneFlow resnet50 time: 74.3ms (= 7431.6ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 84.6ms (= 8455.1ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.14 (= 84.6ms / 74.3ms)

OneFlow resnet50 time: 46.5ms (= 9303.3ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 60.3ms (= 12052.5ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.30 (= 60.3ms / 46.5ms)

OneFlow resnet50 time: 33.9ms (= 6779.8ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 50.4ms (= 10081.0ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.49 (= 50.4ms / 33.9ms)

OneFlow resnet50 time: 28.0ms (= 5592.5ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 42.4ms (= 8470.7ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.51 (= 42.4ms / 28.0ms)

OneFlow swin dataloader time: 0.404s (= 80.802s / 200, num_workers=1)
PyTorch swin dataloader time: 0.154s (= 30.782s / 200, num_workers=1)
Relative speed: 0.381 (= 0.154s / 0.404s)

OneFlow swin dataloader time: 0.108s (= 21.667s / 200, num_workers=4)
PyTorch swin dataloader time: 0.041s (= 8.115s / 200, num_workers=4)
Relative speed: 0.375 (= 0.041s / 0.108s)

OneFlow swin dataloader time: 0.039s (= 7.855s / 200, num_workers=8)
PyTorch swin dataloader time: 0.022s (= 4.419s / 200, num_workers=8)
Relative speed: 0.563 (= 0.022s / 0.039s)

❌ OneFlow resnet50 time: 138.0ms (= 13795.4ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 162.7ms (= 16265.8ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.18 (= 162.7ms / 138.0ms)

OneFlow resnet50 time: 83.8ms (= 8380.8ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 112.6ms (= 11264.0ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.34 (= 112.6ms / 83.8ms)

OneFlow resnet50 time: 56.5ms (= 11293.3ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 78.1ms (= 15614.4ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.38 (= 78.1ms / 56.5ms)

OneFlow resnet50 time: 44.0ms (= 8794.5ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 77.3ms (= 15451.8ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.76 (= 77.3ms / 44.0ms)

OneFlow resnet50 time: 38.4ms (= 7688.3ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 67.9ms (= 13587.7ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.77 (= 67.9ms / 38.4ms)

github-actions[bot] avatar Sep 01 '22 08:09 github-actions[bot]

CI failed when running job: cuda-module. PR label automerge has been removed

github-actions[bot] avatar Sep 01 '22 08:09 github-actions[bot]

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/9036/

github-actions[bot] avatar Sep 03 '22 11:09 github-actions[bot]

Speed stats:
GPU Name: GeForce GTX 1080 

❌ OneFlow resnet50 time: 129.1ms (= 12912.9ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 141.9ms (= 14193.9ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.10 (= 141.9ms / 129.1ms)

OneFlow resnet50 time: 74.3ms (= 7431.2ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 84.3ms (= 8428.1ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.13 (= 84.3ms / 74.3ms)

OneFlow resnet50 time: 46.6ms (= 9313.4ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 62.8ms (= 12557.3ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.35 (= 62.8ms / 46.6ms)

OneFlow resnet50 time: 34.0ms (= 6795.5ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 42.9ms (= 8580.5ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.26 (= 42.9ms / 34.0ms)

OneFlow resnet50 time: 28.1ms (= 5621.8ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 37.2ms (= 7441.0ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.32 (= 37.2ms / 28.1ms)

OneFlow swin dataloader time: 0.414s (= 82.857s / 200, num_workers=1)
PyTorch swin dataloader time: 0.151s (= 30.251s / 200, num_workers=1)
Relative speed: 0.365 (= 0.151s / 0.414s)

OneFlow swin dataloader time: 0.069s (= 13.773s / 200, num_workers=4)
PyTorch swin dataloader time: 0.042s (= 8.420s / 200, num_workers=4)
Relative speed: 0.611 (= 0.042s / 0.069s)

OneFlow swin dataloader time: 0.041s (= 8.196s / 200, num_workers=8)
PyTorch swin dataloader time: 0.022s (= 4.449s / 200, num_workers=8)
Relative speed: 0.543 (= 0.022s / 0.041s)

❌ OneFlow resnet50 time: 138.3ms (= 13834.2ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 160.2ms (= 16016.9ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.16 (= 160.2ms / 138.3ms)

OneFlow resnet50 time: 84.1ms (= 8407.6ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 104.7ms (= 10466.9ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.24 (= 104.7ms / 84.1ms)

OneFlow resnet50 time: 56.8ms (= 11351.3ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 78.0ms (= 15599.3ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.37 (= 78.0ms / 56.8ms)

OneFlow resnet50 time: 44.1ms (= 8823.7ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 71.7ms (= 14340.3ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.63 (= 71.7ms / 44.1ms)

OneFlow resnet50 time: 38.4ms (= 7685.0ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 75.6ms (= 15111.8ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.97 (= 75.6ms / 38.4ms)

github-actions[bot] avatar Sep 03 '22 11:09 github-actions[bot]

CI failed when running job: cpu-misc. PR label automerge has been removed

github-actions[bot] avatar Sep 03 '22 11:09 github-actions[bot]

Speed stats:

github-actions[bot] avatar Sep 03 '22 11:09 github-actions[bot]

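Packing requires the data to be contiguous, which is not guaranteed in the non-contiguous case. Checking for contiguity here is fairly cumbersome, so that case will be handled later by unrolling instead.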
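As a rough picture of what handling the non-contiguous case by unrolling could look like, here is a CPU-side sketch. It is an assumption about the general technique, not the follow-up implementation: the function name `UnrolledStridedUnary`, the single per-tensor element stride, and the `kUnroll` parameter are all hypothetical.

```cpp
#include <cstddef>

// Hypothetical sketch: apply `op` element by element over strided data.
// Each element is loaded and stored individually, so no contiguity or
// alignment is required; the fixed-trip inner loop is what gets unrolled.
template <int kUnroll, typename T, typename Op>
void UnrolledStridedUnary(const T* src, std::ptrdiff_t src_stride, T* dst,
                          std::ptrdiff_t dst_stride, std::ptrdiff_t count, Op op) {
  std::ptrdiff_t i = 0;
  // Main loop: kUnroll elements per iteration.
  for (; i + kUnroll <= count; i += kUnroll) {
    for (int j = 0; j < kUnroll; ++j) {
      dst[(i + j) * dst_stride] = op(src[(i + j) * src_stride]);
    }
  }
  // Tail loop: remaining count % kUnroll elements.
  for (; i < count; ++i) { dst[i * dst_stride] = op(src[i * src_stride]); }
}
```

Unlike packing, this keeps one memory access per element, so it trades the wider vectorized loads for correctness on arbitrary strides.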

MARD1NO avatar Sep 05 '22 06:09 MARD1NO

Code got formatted by CI. Please request CI again if you still want to have this PR merged. If the PR is from a forked repo, please download the patch files from the GitHub Actions web page and apply them locally.

github-actions[bot] avatar Sep 05 '22 10:09 github-actions[bot]

Speed stats:
GPU Name: GeForce GTX 1080 

❌ OneFlow resnet50 time: 129.2ms (= 12919.2ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 143.5ms (= 14349.1ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.11 (= 143.5ms / 129.2ms)

OneFlow resnet50 time: 74.4ms (= 7441.7ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 84.2ms (= 8422.1ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.13 (= 84.2ms / 74.4ms)

OneFlow resnet50 time: 46.9ms (= 9388.8ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 61.9ms (= 12384.1ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.32 (= 61.9ms / 46.9ms)

OneFlow resnet50 time: 34.1ms (= 6829.9ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 43.0ms (= 8607.5ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.26 (= 43.0ms / 34.1ms)

OneFlow resnet50 time: 28.2ms (= 5641.9ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 38.7ms (= 7743.1ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.37 (= 38.7ms / 28.2ms)

OneFlow swin dataloader time: 0.259s (= 51.895s / 200, num_workers=1)
PyTorch swin dataloader time: 0.149s (= 29.895s / 200, num_workers=1)
Relative speed: 0.576 (= 0.149s / 0.259s)

OneFlow swin dataloader time: 0.069s (= 13.886s / 200, num_workers=4)
PyTorch swin dataloader time: 0.042s (= 8.356s / 200, num_workers=4)
Relative speed: 0.602 (= 0.042s / 0.069s)

OneFlow swin dataloader time: 0.040s (= 8.061s / 200, num_workers=8)
PyTorch swin dataloader time: 0.022s (= 4.367s / 200, num_workers=8)
Relative speed: 0.542 (= 0.022s / 0.040s)

❌ OneFlow resnet50 time: 138.1ms (= 13814.8ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 161.0ms (= 16097.6ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.17 (= 161.0ms / 138.1ms)

OneFlow resnet50 time: 84.4ms (= 8443.1ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 102.3ms (= 10226.5ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.21 (= 102.3ms / 84.4ms)

OneFlow resnet50 time: 57.0ms (= 11401.9ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 80.4ms (= 16073.6ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.41 (= 80.4ms / 57.0ms)

OneFlow resnet50 time: 44.4ms (= 8883.1ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 71.4ms (= 14286.5ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.61 (= 71.4ms / 44.4ms)

OneFlow resnet50 time: 38.6ms (= 7713.8ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 77.5ms (= 15492.2ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 2.01 (= 77.5ms / 38.6ms)

github-actions[bot] avatar Sep 06 '22 04:09 github-actions[bot]

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/9036/

github-actions[bot] avatar Sep 06 '22 04:09 github-actions[bot]

Code got formatted by CI. Please request CI again if you still want to have this PR merged. If the PR is from a forked repo, please download the patch files from the GitHub Actions web page and apply them locally.

github-actions[bot] avatar Sep 19 '22 02:09 github-actions[bot]