Unary math op primitive-based kernel

Open EsdeathYZH opened this issue 2 years ago • 8 comments

EsdeathYZH avatar Jul 01 '22 15:07 EsdeathYZH

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/8550/

github-actions[bot] avatar Jul 07 '22 18:07 github-actions[bot]

Speed stats:
GPU Name: NVIDIA GeForce GTX 1080 

❌ OneFlow resnet50 time: 129.3ms (= 12931.2ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 143.1ms (= 14305.9ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.11 (= 143.1ms / 129.3ms)

OneFlow resnet50 time: 75.8ms (= 7582.8ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 86.1ms (= 8610.1ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.14 (= 86.1ms / 75.8ms)

OneFlow resnet50 time: 49.3ms (= 9864.3ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 58.3ms (= 11661.1ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.18 (= 58.3ms / 49.3ms)

OneFlow resnet50 time: 39.8ms (= 7966.3ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 50.9ms (= 10187.3ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.28 (= 50.9ms / 39.8ms)

OneFlow resnet50 time: 34.9ms (= 6989.5ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 36.4ms (= 7275.8ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.04 (= 36.4ms / 34.9ms)

OneFlow swin dataloader time: 0.290s (= 58.071s / 200, num_workers=1)
PyTorch swin dataloader time: 0.150s (= 30.057s / 200, num_workers=1)
Relative speed: 0.518 (= 0.150s / 0.290s)

OneFlow swin dataloader time: 0.084s (= 16.856s / 200, num_workers=4)
PyTorch swin dataloader time: 0.041s (= 8.173s / 200, num_workers=4)
Relative speed: 0.485 (= 0.041s / 0.084s)

OneFlow swin dataloader time: 0.046s (= 9.160s / 200, num_workers=8)
PyTorch swin dataloader time: 0.022s (= 4.455s / 200, num_workers=8)
Relative speed: 0.486 (= 0.022s / 0.046s)

❌ OneFlow resnet50 time: 145.2ms (= 14516.7ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 174.7ms (= 17472.1ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.20 (= 174.7ms / 145.2ms)

OneFlow resnet50 time: 96.0ms (= 9598.8ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 112.1ms (= 11211.6ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.17 (= 112.1ms / 96.0ms)

OneFlow resnet50 time: 69.8ms (= 13969.8ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 92.3ms (= 18461.7ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.32 (= 92.3ms / 69.8ms)

OneFlow resnet50 time: 59.6ms (= 11912.5ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 82.3ms (= 16463.9ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.38 (= 82.3ms / 59.6ms)

OneFlow resnet50 time: 53.4ms (= 10675.6ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 69.6ms (= 13924.9ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.30 (= 69.6ms / 53.4ms)

github-actions[bot] avatar Jul 07 '22 18:07 github-actions[bot]
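
For readers checking the speed stats above: each "Relative speed" figure appears to be the PyTorch per-iteration time divided by the OneFlow per-iteration time, so values above 1 mean OneFlow was faster. The sketch below reproduces that arithmetic from the logged totals. The function names are hypothetical and are not taken from the OneFlow CI scripts. Dividing the unrounded totals is an assumption: it is inferred from the dataloader line, where the rounded values (0.150s / 0.290s) would give 0.517 rather than the reported 0.518.

```python
def per_iteration(total: float, iterations: int) -> float:
    """Average time per iteration, in the same unit as `total`."""
    return total / iterations


def relative_speed(pytorch_total: float, oneflow_total: float, iterations: int) -> float:
    """PyTorch time divided by OneFlow time; > 1 means OneFlow was faster.

    Hypothetical helper: uses the unrounded totals from the log lines.
    """
    return per_iteration(pytorch_total, iterations) / per_iteration(oneflow_total, iterations)


# resnet50, input_shape=[16, 3, 224, 224]: totals in ms over 100 iterations
resnet_ratio = relative_speed(14305.9, 12931.2, 100)

# swin dataloader, num_workers=1: totals in s over 200 iterations
dataloader_ratio = relative_speed(30.057, 58.071, 200)

print(f"resnet50 relative speed: {resnet_ratio:.2f}")
print(f"swin dataloader relative speed: {dataloader_ratio:.3f}")
```

Read this way, the resnet50 rows show OneFlow ahead of PyTorch, while the swin dataloader rows (ratios near 0.5) show the OneFlow dataloader taking roughly twice as long per iteration.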

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/8550/

github-actions[bot] avatar Jul 08 '22 02:07 github-actions[bot]

Speed stats:

github-actions[bot] avatar Jul 08 '22 02:07 github-actions[bot]

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/8550/

github-actions[bot] avatar Jul 08 '22 05:07 github-actions[bot]

Speed stats:
GPU Name: GeForce GTX 1080 

✔️ OneFlow resnet50 time: 128.1ms (= 12810.3ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 142.7ms (= 14274.7ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.11 (= 142.7ms / 128.1ms)

OneFlow resnet50 time: 75.0ms (= 7495.6ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 84.7ms (= 8471.1ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.13 (= 84.7ms / 75.0ms)

OneFlow resnet50 time: 47.7ms (= 9540.9ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 62.4ms (= 12478.6ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.31 (= 62.4ms / 47.7ms)

OneFlow resnet50 time: 39.3ms (= 7859.7ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 44.9ms (= 8986.8ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.14 (= 44.9ms / 39.3ms)

OneFlow resnet50 time: 33.7ms (= 6731.2ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 36.3ms (= 7250.8ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.08 (= 36.3ms / 33.7ms)

OneFlow swin dataloader time: 0.266s (= 53.172s / 200, num_workers=1)
PyTorch swin dataloader time: 0.148s (= 29.685s / 200, num_workers=1)
Relative speed: 0.558 (= 0.148s / 0.266s)

OneFlow swin dataloader time: 0.073s (= 14.525s / 200, num_workers=4)
PyTorch swin dataloader time: 0.040s (= 7.979s / 200, num_workers=4)
Relative speed: 0.549 (= 0.040s / 0.073s)

OneFlow swin dataloader time: 0.041s (= 8.216s / 200, num_workers=8)
PyTorch swin dataloader time: 0.023s (= 4.502s / 200, num_workers=8)
Relative speed: 0.548 (= 0.023s / 0.041s)

❌ OneFlow resnet50 time: 136.7ms (= 13668.3ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 161.9ms (= 16190.4ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.18 (= 161.9ms / 136.7ms)

OneFlow resnet50 time: 85.2ms (= 8515.1ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 102.0ms (= 10201.3ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.20 (= 102.0ms / 85.2ms)

OneFlow resnet50 time: 57.1ms (= 11420.7ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 78.3ms (= 15655.3ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.37 (= 78.3ms / 57.1ms)

OneFlow resnet50 time: 47.7ms (= 9539.7ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 73.1ms (= 14619.2ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.53 (= 73.1ms / 47.7ms)

OneFlow resnet50 time: 46.4ms (= 9287.4ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 66.8ms (= 13356.8ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.44 (= 66.8ms / 46.4ms)

github-actions[bot] avatar Jul 08 '22 05:07 github-actions[bot]

Speed stats:
GPU Name: NVIDIA GeForce GTX 1080 

❌ OneFlow resnet50 time: 129.3ms (= 12926.1ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 143.1ms (= 14313.3ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.11 (= 143.1ms / 129.3ms)

OneFlow resnet50 time: 75.7ms (= 7567.8ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 83.8ms (= 8384.4ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.11 (= 83.8ms / 75.7ms)

OneFlow resnet50 time: 48.6ms (= 9710.1ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 56.7ms (= 11345.3ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.17 (= 56.7ms / 48.6ms)

OneFlow resnet50 time: 41.3ms (= 8261.3ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 41.6ms (= 8317.1ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.01 (= 41.6ms / 41.3ms)

OneFlow resnet50 time: 35.9ms (= 7170.3ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 38.6ms (= 7718.9ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.08 (= 38.6ms / 35.9ms)

OneFlow swin dataloader time: 0.257s (= 51.413s / 200, num_workers=1)
PyTorch swin dataloader time: 0.151s (= 30.112s / 200, num_workers=1)
Relative speed: 0.586 (= 0.151s / 0.257s)

OneFlow swin dataloader time: 0.073s (= 14.534s / 200, num_workers=4)
PyTorch swin dataloader time: 0.041s (= 8.172s / 200, num_workers=4)
Relative speed: 0.562 (= 0.041s / 0.073s)

OneFlow swin dataloader time: 0.043s (= 8.501s / 200, num_workers=8)
PyTorch swin dataloader time: 0.022s (= 4.412s / 200, num_workers=8)
Relative speed: 0.519 (= 0.022s / 0.043s)

❌ OneFlow resnet50 time: 146.3ms (= 14633.2ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 168.3ms (= 16833.2ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.15 (= 168.3ms / 146.3ms)

OneFlow resnet50 time: 94.6ms (= 9464.3ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 111.8ms (= 11181.5ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.18 (= 111.8ms / 94.6ms)

OneFlow resnet50 time: 73.1ms (= 14627.3ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 89.6ms (= 17917.2ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.22 (= 89.6ms / 73.1ms)

OneFlow resnet50 time: 57.3ms (= 11451.2ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 72.9ms (= 14588.9ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.27 (= 72.9ms / 57.3ms)

OneFlow resnet50 time: 53.7ms (= 10748.0ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 70.2ms (= 14040.0ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.31 (= 70.2ms / 53.7ms)

github-actions[bot] avatar Jul 08 '22 17:07 github-actions[bot]

Code got formatted by CI. Please request CI again if you still want to have this PR merged. If the PR is from a forked repo, please download the patch files from the GitHub Actions web page and apply them locally.

github-actions[bot] avatar Aug 17 '22 07:08 github-actions[bot]

This work has already been completed by PR https://github.com/Oneflow-Inc/oneflow/pull/8936.

MARD1NO avatar Sep 24 '22 03:09 MARD1NO