oneflow
Implement sync batchnorm
TODO
- [x] forward implementation
- [x] backward implementation
- [x] add tests
- [x] refine doc
- [ ] NHWC format
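For readers unfamiliar with the feature, here is a minimal sketch of the statistic a synchronized batch norm forward pass has to produce: one mean and variance per channel over the samples held by all ranks, rather than per rank. This is not the PR's implementation. The function names are hypothetical, the nested lists stand in for per-rank tensors of a single channel, and the plain Python loop stands in for an all-reduce sum across devices.

```python
from typing import List, Tuple


def local_stats(values: List[float]) -> Tuple[float, float, int]:
    """Per-rank partials for one channel: (sum, sum of squares, count)."""
    return sum(values), sum(v * v for v in values), len(values)


def sync_mean_var(per_rank: List[List[float]]) -> Tuple[float, float]:
    """Add up the per-rank partials (stand-in for an all-reduce sum),
    then derive the global mean and biased variance."""
    total, total_sq, count = 0.0, 0.0, 0
    for values in per_rank:
        s, sq, n = local_stats(values)
        total += s
        total_sq += sq
        count += n
    mean = total / count
    var = total_sq / count - mean * mean  # E[x^2] - E[x]^2
    return mean, var


def normalize(values: List[float], mean: float, var: float,
              eps: float = 1e-5) -> List[float]:
    """Normalize one rank's values with the shared global statistics."""
    return [(v - mean) / (var + eps) ** 0.5 for v in values]


# Two simulated ranks, two samples each.
ranks = [[1.0, 2.0], [3.0, 4.0]]
mean, var = sync_mean_var(ranks)
print(mean, var)  # 2.5 1.25
```

Each rank then normalizes its own slice with the shared `mean` and `var`, which is what distinguishes this from ordinary batch norm under data parallelism, where each rank would use only its local statistics.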
Code got formatted by CI. Please request CI again if you still want to have this PR merged. If the PR is from a forked repo, please download the patch files from the GitHub Actions web page and apply them locally.
View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/8854/
Speed stats:
GPU Name: GeForce GTX 1080
✔️ OneFlow resnet50 time: 128.3ms (= 12825.3ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 141.2ms (= 14118.7ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.10 (= 141.2ms / 128.3ms)
OneFlow resnet50 time: 75.4ms (= 7539.1ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 84.3ms (= 8429.9ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.12 (= 84.3ms / 75.4ms)
OneFlow resnet50 time: 48.3ms (= 9661.5ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 58.3ms (= 11667.3ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.21 (= 58.3ms / 48.3ms)
OneFlow resnet50 time: 36.0ms (= 7207.9ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 41.2ms (= 8232.3ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.14 (= 41.2ms / 36.0ms)
OneFlow resnet50 time: 28.2ms (= 5647.8ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 36.4ms (= 7275.7ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.29 (= 36.4ms / 28.2ms)
OneFlow swin dataloader time: 0.259s (= 51.749s / 200, num_workers=1)
PyTorch swin dataloader time: 0.151s (= 30.105s / 200, num_workers=1)
Relative speed: 0.582 (= 0.151s / 0.259s)
OneFlow swin dataloader time: 0.071s (= 14.209s / 200, num_workers=4)
PyTorch swin dataloader time: 0.041s (= 8.123s / 200, num_workers=4)
Relative speed: 0.572 (= 0.041s / 0.071s)
OneFlow swin dataloader time: 0.042s (= 8.456s / 200, num_workers=8)
PyTorch swin dataloader time: 0.022s (= 4.500s / 200, num_workers=8)
Relative speed: 0.532 (= 0.022s / 0.042s)
❌ OneFlow resnet50 time: 136.7ms (= 13671.3ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 160.7ms (= 16072.2ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.18 (= 160.7ms / 136.7ms)
OneFlow resnet50 time: 84.7ms (= 8467.1ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 101.7ms (= 10171.8ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.20 (= 101.7ms / 84.7ms)
OneFlow resnet50 time: 57.9ms (= 11577.6ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 79.4ms (= 15879.5ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.37 (= 79.4ms / 57.9ms)
OneFlow resnet50 time: 45.4ms (= 9075.3ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 75.8ms (= 15157.8ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.67 (= 75.8ms / 45.4ms)
OneFlow resnet50 time: 38.9ms (= 7786.4ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 77.9ms (= 15573.8ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 2.00 (= 77.9ms / 38.9ms)
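The arithmetic behind each triple of lines above can be reproduced with a short sketch. It assumes, based only on the printed numbers, that the per-iteration time is the total time divided by the iteration count and that relative speed is the PyTorch time divided by the OneFlow time, so values above 1 mean OneFlow is faster. The function name is hypothetical and is not taken from the CI script.

```python
from typing import Tuple


def relative_speed(oneflow_total: float, pytorch_total: float,
                   iterations: int) -> Tuple[float, float, float]:
    """Return (OneFlow time per iteration, PyTorch time per iteration,
    PyTorch time / OneFlow time)."""
    oneflow_avg = oneflow_total / iterations
    pytorch_avg = pytorch_total / iterations
    return oneflow_avg, pytorch_avg, pytorch_avg / oneflow_avg


# resnet50, input_shape=[16, 3, 224, 224], totals in ms over 100 iterations
of_ms, pt_ms, ratio = relative_speed(12825.3, 14118.7, 100)
print(f"{of_ms:.1f}ms {pt_ms:.1f}ms {ratio:.2f}")  # 128.3ms 141.2ms 1.10

# swin dataloader, num_workers=1, totals in s over 200 iterations
_, _, loader_ratio = relative_speed(51.749, 30.105, 200)
print(f"{loader_ratio:.3f}")  # 0.582
```

The dataloader case suggests the reported ratio is computed from the unrounded totals: dividing the rounded per-iteration values (0.151s / 0.259s) would give 0.583 rather than the printed 0.582.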
Speed stats:
GPU Name: GeForce GTX 1080
✔️ OneFlow resnet50 time: 128.4ms (= 12835.6ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 141.9ms (= 14185.3ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.11 (= 141.9ms / 128.4ms)
OneFlow resnet50 time: 75.6ms (= 7555.2ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 85.1ms (= 8513.7ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.13 (= 85.1ms / 75.6ms)
OneFlow resnet50 time: 48.4ms (= 9675.5ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 63.2ms (= 12647.7ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.31 (= 63.2ms / 48.4ms)
OneFlow resnet50 time: 35.9ms (= 7170.1ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 42.4ms (= 8478.6ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.18 (= 42.4ms / 35.9ms)
OneFlow resnet50 time: 28.2ms (= 5631.3ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 39.8ms (= 7963.7ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.41 (= 39.8ms / 28.2ms)
OneFlow swin dataloader time: 0.261s (= 52.184s / 200, num_workers=1)
PyTorch swin dataloader time: 0.149s (= 29.703s / 200, num_workers=1)
Relative speed: 0.569 (= 0.149s / 0.261s)
OneFlow swin dataloader time: 0.106s (= 21.202s / 200, num_workers=4)
PyTorch swin dataloader time: 0.040s (= 7.936s / 200, num_workers=4)
Relative speed: 0.374 (= 0.040s / 0.106s)
OneFlow swin dataloader time: 0.044s (= 8.715s / 200, num_workers=8)
PyTorch swin dataloader time: 0.021s (= 4.212s / 200, num_workers=8)
Relative speed: 0.483 (= 0.021s / 0.044s)
❌ OneFlow resnet50 time: 136.7ms (= 13673.6ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 167.7ms (= 16774.4ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.23 (= 167.7ms / 136.7ms)
OneFlow resnet50 time: 84.2ms (= 8421.0ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 102.1ms (= 10205.9ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.21 (= 102.1ms / 84.2ms)
OneFlow resnet50 time: 57.9ms (= 11577.0ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 78.7ms (= 15730.5ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.36 (= 78.7ms / 57.9ms)
OneFlow resnet50 time: 45.2ms (= 9037.1ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 81.0ms (= 16194.6ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.79 (= 81.0ms / 45.2ms)
OneFlow resnet50 time: 38.8ms (= 7761.8ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 67.8ms (= 13565.4ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.75 (= 67.8ms / 38.8ms)
Speed stats:
GPU Name: GeForce GTX 1080
✔️ OneFlow resnet50 time: 128.8ms (= 12876.3ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 150.2ms (= 15022.7ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.17 (= 150.2ms / 128.8ms)
OneFlow resnet50 time: 75.8ms (= 7582.2ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 88.3ms (= 8830.0ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.16 (= 88.3ms / 75.8ms)
OneFlow resnet50 time: 50.1ms (= 10016.4ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 63.2ms (= 12631.8ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.26 (= 63.2ms / 50.1ms)
OneFlow resnet50 time: 37.0ms (= 7407.4ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 50.0ms (= 10005.8ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.35 (= 50.0ms / 37.0ms)
OneFlow resnet50 time: 28.8ms (= 5768.6ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 37.7ms (= 7542.3ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.31 (= 37.7ms / 28.8ms)
OneFlow swin dataloader time: 0.410s (= 81.984s / 200, num_workers=1)
PyTorch swin dataloader time: 0.150s (= 30.061s / 200, num_workers=1)
Relative speed: 0.367 (= 0.150s / 0.410s)
OneFlow swin dataloader time: 0.074s (= 14.883s / 200, num_workers=4)
PyTorch swin dataloader time: 0.043s (= 8.509s / 200, num_workers=4)
Relative speed: 0.572 (= 0.043s / 0.074s)
OneFlow swin dataloader time: 0.040s (= 7.908s / 200, num_workers=8)
PyTorch swin dataloader time: 0.022s (= 4.323s / 200, num_workers=8)
Relative speed: 0.547 (= 0.022s / 0.040s)
❌ OneFlow resnet50 time: 137.1ms (= 13709.0ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 162.7ms (= 16268.4ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.19 (= 162.7ms / 137.1ms)
OneFlow resnet50 time: 85.8ms (= 8584.9ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 103.1ms (= 10310.1ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.20 (= 103.1ms / 85.8ms)
OneFlow resnet50 time: 59.5ms (= 11900.4ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 79.7ms (= 15940.7ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.34 (= 79.7ms / 59.5ms)
OneFlow resnet50 time: 46.6ms (= 9323.1ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 78.7ms (= 15747.1ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.69 (= 78.7ms / 46.6ms)
OneFlow resnet50 time: 39.2ms (= 7849.2ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 71.4ms (= 14284.3ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.82 (= 71.4ms / 39.2ms)
Speed stats:
GPU Name: NVIDIA GeForce GTX 1080
❌ OneFlow resnet50 time: 130.2ms (= 13018.4ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 143.7ms (= 14373.1ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.10 (= 143.7ms / 130.2ms)
OneFlow resnet50 time: 77.0ms (= 7695.6ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 85.0ms (= 8499.2ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.10 (= 85.0ms / 77.0ms)
OneFlow resnet50 time: 49.4ms (= 9874.1ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 57.5ms (= 11493.2ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.16 (= 57.5ms / 49.4ms)
OneFlow resnet50 time: 37.5ms (= 7498.2ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 42.1ms (= 8410.5ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.12 (= 42.1ms / 37.5ms)
OneFlow resnet50 time: 29.0ms (= 5799.0ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 42.8ms (= 8559.0ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.48 (= 42.8ms / 29.0ms)
OneFlow swin dataloader time: 0.393s (= 78.610s / 200, num_workers=1)
PyTorch swin dataloader time: 0.149s (= 29.841s / 200, num_workers=1)
Relative speed: 0.380 (= 0.149s / 0.393s)
OneFlow swin dataloader time: 0.073s (= 14.610s / 200, num_workers=4)
PyTorch swin dataloader time: 0.042s (= 8.338s / 200, num_workers=4)
Relative speed: 0.571 (= 0.042s / 0.073s)
OneFlow swin dataloader time: 0.040s (= 7.958s / 200, num_workers=8)
PyTorch swin dataloader time: 0.022s (= 4.404s / 200, num_workers=8)
Relative speed: 0.553 (= 0.022s / 0.040s)
❌ OneFlow resnet50 time: 147.1ms (= 14709.5ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 170.6ms (= 17061.0ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.16 (= 170.6ms / 147.1ms)
OneFlow resnet50 time: 96.5ms (= 9651.3ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 112.5ms (= 11248.3ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.17 (= 112.5ms / 96.5ms)
OneFlow resnet50 time: 70.0ms (= 13993.6ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 89.8ms (= 17959.8ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.28 (= 89.8ms / 70.0ms)
OneFlow resnet50 time: 58.0ms (= 11594.4ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 81.5ms (= 16304.2ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.41 (= 81.5ms / 58.0ms)
OneFlow resnet50 time: 50.7ms (= 10137.9ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 68.8ms (= 13767.4ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.36 (= 68.8ms / 50.7ms)
Speed stats:
GPU Name: GeForce GTX 1080
✔️ OneFlow resnet50 time: 128.2ms (= 12822.1ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 146.1ms (= 14608.4ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.14 (= 146.1ms / 128.2ms)
OneFlow resnet50 time: 75.4ms (= 7536.1ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 85.0ms (= 8502.9ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.13 (= 85.0ms / 75.4ms)
OneFlow resnet50 time: 48.2ms (= 9647.8ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 56.6ms (= 11322.8ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.17 (= 56.6ms / 48.2ms)
OneFlow resnet50 time: 35.9ms (= 7183.7ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 44.4ms (= 8879.1ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.24 (= 44.4ms / 35.9ms)
OneFlow resnet50 time: 28.0ms (= 5599.8ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 37.8ms (= 7551.6ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.35 (= 37.8ms / 28.0ms)
OneFlow swin dataloader time: 0.407s (= 81.453s / 200, num_workers=1)
PyTorch swin dataloader time: 0.151s (= 30.289s / 200, num_workers=1)
Relative speed: 0.372 (= 0.151s / 0.407s)
OneFlow swin dataloader time: 0.072s (= 14.474s / 200, num_workers=4)
PyTorch swin dataloader time: 0.041s (= 8.108s / 200, num_workers=4)
Relative speed: 0.560 (= 0.041s / 0.072s)
OneFlow swin dataloader time: 0.042s (= 8.379s / 200, num_workers=8)
PyTorch swin dataloader time: 0.023s (= 4.533s / 200, num_workers=8)
Relative speed: 0.541 (= 0.023s / 0.042s)
❌ OneFlow resnet50 time: 136.8ms (= 13676.3ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 164.1ms (= 16411.9ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.20 (= 164.1ms / 136.8ms)
OneFlow resnet50 time: 84.3ms (= 8434.9ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 102.3ms (= 10233.9ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.21 (= 102.3ms / 84.3ms)
OneFlow resnet50 time: 57.5ms (= 11492.0ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 78.4ms (= 15674.9ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.36 (= 78.4ms / 57.5ms)
OneFlow resnet50 time: 45.3ms (= 9056.5ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 67.6ms (= 13525.8ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.49 (= 67.6ms / 45.3ms)
OneFlow resnet50 time: 39.1ms (= 7810.2ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 68.0ms (= 13606.9ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.74 (= 68.0ms / 39.1ms)
Speed stats:
GPU Name: GeForce GTX 1080
✔️ OneFlow resnet50 time: 128.3ms (= 12834.7ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 146.5ms (= 14652.0ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.14 (= 146.5ms / 128.3ms)
OneFlow resnet50 time: 75.3ms (= 7532.0ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 85.5ms (= 8549.4ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.14 (= 85.5ms / 75.3ms)
OneFlow resnet50 time: 49.0ms (= 9806.0ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 59.7ms (= 11942.2ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.22 (= 59.7ms / 49.0ms)
OneFlow resnet50 time: 36.8ms (= 7355.1ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 43.2ms (= 8637.7ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.17 (= 43.2ms / 36.8ms)
OneFlow resnet50 time: 28.4ms (= 5674.2ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 38.2ms (= 7643.7ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.35 (= 38.2ms / 28.4ms)
OneFlow swin dataloader time: 0.413s (= 82.526s / 200, num_workers=1)
PyTorch swin dataloader time: 0.150s (= 30.004s / 200, num_workers=1)
Relative speed: 0.364 (= 0.150s / 0.413s)
OneFlow swin dataloader time: 0.071s (= 14.143s / 200, num_workers=4)
PyTorch swin dataloader time: 0.040s (= 8.055s / 200, num_workers=4)
Relative speed: 0.570 (= 0.040s / 0.071s)
OneFlow swin dataloader time: 0.040s (= 7.931s / 200, num_workers=8)
PyTorch swin dataloader time: 0.022s (= 4.353s / 200, num_workers=8)
Relative speed: 0.549 (= 0.022s / 0.040s)
❌ OneFlow resnet50 time: 136.8ms (= 13679.6ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 161.2ms (= 16122.2ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.18 (= 161.2ms / 136.8ms)
OneFlow resnet50 time: 85.2ms (= 8517.3ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 102.7ms (= 10272.9ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.21 (= 102.7ms / 85.2ms)
OneFlow resnet50 time: 58.5ms (= 11702.0ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 86.9ms (= 17388.5ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.49 (= 86.9ms / 58.5ms)
OneFlow resnet50 time: 45.9ms (= 9176.7ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 70.0ms (= 13990.2ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.52 (= 70.0ms / 45.9ms)
OneFlow resnet50 time: 39.2ms (= 7843.0ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 68.2ms (= 13640.9ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.74 (= 68.2ms / 39.2ms)
Speed stats:
GPU Name: GeForce GTX 1080
✔️ OneFlow resnet50 time: 128.3ms (= 12833.6ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 142.0ms (= 14195.2ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.11 (= 142.0ms / 128.3ms)
OneFlow resnet50 time: 75.3ms (= 7531.0ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 88.1ms (= 8808.1ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.17 (= 88.1ms / 75.3ms)
OneFlow resnet50 time: 48.3ms (= 9661.1ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 61.9ms (= 12381.9ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.28 (= 61.9ms / 48.3ms)
OneFlow resnet50 time: 36.1ms (= 7223.4ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 43.4ms (= 8672.4ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.20 (= 43.4ms / 36.1ms)
OneFlow resnet50 time: 28.3ms (= 5666.6ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 38.6ms (= 7729.3ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.36 (= 38.6ms / 28.3ms)
OneFlow swin dataloader time: 0.270s (= 54.049s / 200, num_workers=1)
PyTorch swin dataloader time: 0.152s (= 30.489s / 200, num_workers=1)
Relative speed: 0.564 (= 0.152s / 0.270s)
OneFlow swin dataloader time: 0.067s (= 13.460s / 200, num_workers=4)
PyTorch swin dataloader time: 0.040s (= 8.032s / 200, num_workers=4)
Relative speed: 0.597 (= 0.040s / 0.067s)
OneFlow swin dataloader time: 0.039s (= 7.856s / 200, num_workers=8)
PyTorch swin dataloader time: 0.023s (= 4.572s / 200, num_workers=8)
Relative speed: 0.582 (= 0.023s / 0.039s)
❌ OneFlow resnet50 time: 136.5ms (= 13649.1ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 167.9ms (= 16794.5ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.23 (= 167.9ms / 136.5ms)
OneFlow resnet50 time: 84.7ms (= 8467.7ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 102.0ms (= 10199.6ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.20 (= 102.0ms / 84.7ms)
OneFlow resnet50 time: 57.8ms (= 11563.9ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 78.5ms (= 15700.9ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.36 (= 78.5ms / 57.8ms)
OneFlow resnet50 time: 45.4ms (= 9072.5ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 72.4ms (= 14489.1ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.60 (= 72.4ms / 45.4ms)
OneFlow resnet50 time: 39.1ms (= 7811.6ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 74.1ms (= 14829.5ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.90 (= 74.1ms / 39.1ms)
Speed stats:
GPU Name: GeForce GTX 1080
✔️ OneFlow resnet50 time: 128.3ms (= 12828.0ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 143.5ms (= 14352.7ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.12 (= 143.5ms / 128.3ms)
OneFlow resnet50 time: 75.2ms (= 7524.0ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 85.5ms (= 8551.4ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.14 (= 85.5ms / 75.2ms)
OneFlow resnet50 time: 48.3ms (= 9654.9ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 62.4ms (= 12486.6ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.29 (= 62.4ms / 48.3ms)
OneFlow resnet50 time: 35.9ms (= 7172.6ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 40.6ms (= 8128.6ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.13 (= 40.6ms / 35.9ms)
OneFlow resnet50 time: 28.1ms (= 5626.1ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 40.9ms (= 8187.4ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.46 (= 40.9ms / 28.1ms)
OneFlow swin dataloader time: 0.266s (= 53.184s / 200, num_workers=1)
PyTorch swin dataloader time: 0.148s (= 29.644s / 200, num_workers=1)
Relative speed: 0.557 (= 0.148s / 0.266s)
OneFlow swin dataloader time: 0.072s (= 14.317s / 200, num_workers=4)
PyTorch swin dataloader time: 0.041s (= 8.268s / 200, num_workers=4)
Relative speed: 0.577 (= 0.041s / 0.072s)
OneFlow swin dataloader time: 0.040s (= 8.021s / 200, num_workers=8)
PyTorch swin dataloader time: 0.023s (= 4.617s / 200, num_workers=8)
Relative speed: 0.576 (= 0.023s / 0.040s)
❌ OneFlow resnet50 time: 136.6ms (= 13662.8ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 161.2ms (= 16121.4ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.18 (= 161.2ms / 136.6ms)
OneFlow resnet50 time: 84.3ms (= 8427.2ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 102.1ms (= 10213.3ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.21 (= 102.1ms / 84.3ms)
OneFlow resnet50 time: 57.7ms (= 11531.7ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 88.0ms (= 17606.9ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.53 (= 88.0ms / 57.7ms)
OneFlow resnet50 time: 45.2ms (= 9048.7ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 72.0ms (= 14393.4ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.59 (= 72.0ms / 45.2ms)
OneFlow resnet50 time: 38.8ms (= 7762.2ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 67.4ms (= 13476.1ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.74 (= 67.4ms / 38.8ms)
Speed stats:
GPU Name: GeForce GTX 1080
✔️ OneFlow resnet50 time: 128.3ms (= 12831.4ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 142.6ms (= 14256.2ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.11 (= 142.6ms / 128.3ms)
OneFlow resnet50 time: 75.5ms (= 7549.0ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 86.7ms (= 8666.1ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.15 (= 86.7ms / 75.5ms)
OneFlow resnet50 time: 48.5ms (= 9693.3ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 59.2ms (= 11841.0ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.22 (= 59.2ms / 48.5ms)
OneFlow resnet50 time: 36.0ms (= 7206.5ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 41.4ms (= 8287.4ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.15 (= 41.4ms / 36.0ms)
OneFlow resnet50 time: 28.2ms (= 5639.2ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 41.4ms (= 8286.8ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.47 (= 41.4ms / 28.2ms)
OneFlow swin dataloader time: 0.273s (= 54.682s / 200, num_workers=1)
PyTorch swin dataloader time: 0.151s (= 30.197s / 200, num_workers=1)
Relative speed: 0.552 (= 0.151s / 0.273s)
OneFlow swin dataloader time: 0.071s (= 14.121s / 200, num_workers=4)
PyTorch swin dataloader time: 0.043s (= 8.642s / 200, num_workers=4)
Relative speed: 0.612 (= 0.043s / 0.071s)
OneFlow swin dataloader time: 0.060s (= 11.972s / 200, num_workers=8)
PyTorch swin dataloader time: 0.022s (= 4.425s / 200, num_workers=8)
Relative speed: 0.370 (= 0.022s / 0.060s)
❌ OneFlow resnet50 time: 136.7ms (= 13667.1ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 168.0ms (= 16801.7ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.23 (= 168.0ms / 136.7ms)
OneFlow resnet50 time: 84.2ms (= 8423.3ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 105.5ms (= 10552.3ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.25 (= 105.5ms / 84.2ms)
OneFlow resnet50 time: 58.2ms (= 11637.4ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 78.4ms (= 15683.1ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.35 (= 78.4ms / 58.2ms)
OneFlow resnet50 time: 45.3ms (= 9068.7ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 69.5ms (= 13892.3ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.53 (= 69.5ms / 45.3ms)
OneFlow resnet50 time: 38.8ms (= 7750.8ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 66.7ms (= 13332.0ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.72 (= 66.7ms / 38.8ms)
Speed stats:
GPU Name: GeForce GTX 1080
✔️ OneFlow resnet50 time: 128.3ms (= 12833.7ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 142.2ms (= 14223.2ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.11 (= 142.2ms / 128.3ms)
OneFlow resnet50 time: 75.3ms (= 7533.1ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 86.8ms (= 8681.9ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.15 (= 86.8ms / 75.3ms)
OneFlow resnet50 time: 48.4ms (= 9670.8ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 62.5ms (= 12492.7ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.29 (= 62.5ms / 48.4ms)
OneFlow resnet50 time: 35.9ms (= 7184.3ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 42.0ms (= 8403.9ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.17 (= 42.0ms / 35.9ms)
OneFlow resnet50 time: 28.1ms (= 5618.8ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 40.7ms (= 8133.0ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.45 (= 40.7ms / 28.1ms)
OneFlow swin dataloader time: 0.266s (= 53.146s / 200, num_workers=1)
PyTorch swin dataloader time: 0.149s (= 29.796s / 200, num_workers=1)
Relative speed: 0.561 (= 0.149s / 0.266s)
OneFlow swin dataloader time: 0.070s (= 13.921s / 200, num_workers=4)
PyTorch swin dataloader time: 0.040s (= 8.020s / 200, num_workers=4)
Relative speed: 0.576 (= 0.040s / 0.070s)
OneFlow swin dataloader time: 0.063s (= 12.688s / 200, num_workers=8)
PyTorch swin dataloader time: 0.022s (= 4.490s / 200, num_workers=8)
Relative speed: 0.354 (= 0.022s / 0.063s)
❌ OneFlow resnet50 time: 136.6ms (= 13658.2ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 163.4ms (= 16342.4ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.20 (= 163.4ms / 136.6ms)
OneFlow resnet50 time: 84.6ms (= 8459.5ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 113.2ms (= 11319.2ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.34 (= 113.2ms / 84.6ms)
OneFlow resnet50 time: 57.8ms (= 11555.6ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 90.2ms (= 18036.0ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.56 (= 90.2ms / 57.8ms)
OneFlow resnet50 time: 45.3ms (= 9057.3ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 80.3ms (= 16066.7ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.77 (= 80.3ms / 45.3ms)
OneFlow resnet50 time: 39.0ms (= 7796.0ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 68.0ms (= 13598.0ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.74 (= 68.0ms / 39.0ms)
Speed stats:
GPU Name: GeForce GTX 1080
✔️ OneFlow resnet50 time: 128.5ms (= 12853.4ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 146.4ms (= 14636.6ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.14 (= 146.4ms / 128.5ms)
OneFlow resnet50 time: 75.5ms (= 7547.2ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 85.2ms (= 8518.3ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.13 (= 85.2ms / 75.5ms)
OneFlow resnet50 time: 49.1ms (= 9821.3ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 62.6ms (= 12520.2ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.27 (= 62.6ms / 49.1ms)
OneFlow resnet50 time: 36.5ms (= 7291.3ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 40.5ms (= 8099.1ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.11 (= 40.5ms / 36.5ms)
OneFlow resnet50 time: 28.5ms (= 5694.7ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 36.8ms (= 7362.9ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.29 (= 36.8ms / 28.5ms)
OneFlow swin dataloader time: 0.262s (= 52.433s / 200, num_workers=1)
PyTorch swin dataloader time: 0.150s (= 30.063s / 200, num_workers=1)
Relative speed: 0.573 (= 0.150s / 0.262s)
OneFlow swin dataloader time: 0.111s (= 22.256s / 200, num_workers=4)
PyTorch swin dataloader time: 0.042s (= 8.398s / 200, num_workers=4)
Relative speed: 0.377 (= 0.042s / 0.111s)
OneFlow swin dataloader time: 0.039s (= 7.849s / 200, num_workers=8)
PyTorch swin dataloader time: 0.023s (= 4.530s / 200, num_workers=8)
Relative speed: 0.577 (= 0.023s / 0.039s)
❌ OneFlow resnet50 time: 136.8ms (= 13680.1ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 161.6ms (= 16159.7ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.18 (= 161.6ms / 136.8ms)
OneFlow resnet50 time: 85.3ms (= 8525.8ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 104.0ms (= 10400.5ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.22 (= 104.0ms / 85.3ms)
OneFlow resnet50 time: 58.4ms (= 11683.6ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 80.0ms (= 16001.1ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.37 (= 80.0ms / 58.4ms)
OneFlow resnet50 time: 46.2ms (= 9233.8ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 71.7ms (= 14343.2ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.55 (= 71.7ms / 46.2ms)
OneFlow resnet50 time: 38.8ms (= 7759.2ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 71.6ms (= 14314.1ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.84 (= 71.6ms / 38.8ms)
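For anyone reading these numbers: each per-iteration time is the total divided by the iteration count, and the relative speed is PyTorch time over OneFlow time, so values above 1 mean OneFlow is faster. Below is a minimal sketch of that arithmetic. The function name `relative_speed` is hypothetical and is not taken from the CI script. It also assumes the ratio is computed from the unrounded totals, which is what the logged values suggest: for the `num_workers=4` dataloader run, the displayed `0.042s / 0.111s` would give 0.378, while the log reports 0.377.

```python
def relative_speed(pytorch_total: float, oneflow_total: float,
                   iters: int, digits: int = 2) -> float:
    """Ratio of PyTorch to OneFlow per-iteration time (hypothetical helper).

    Values above 1 mean OneFlow took less time per iteration.
    """
    pytorch_avg = pytorch_total / iters   # per-iteration PyTorch time
    oneflow_avg = oneflow_total / iters   # per-iteration OneFlow time
    return round(pytorch_avg / oneflow_avg, digits)


if __name__ == "__main__":
    # resnet50, input_shape=[16, 3, 224, 224], ddp, world size=2 (totals in ms)
    print(relative_speed(16159.7, 13680.1, iters=100))          # 1.18
    # swin dataloader, num_workers=4 (totals in s), three digits as in the log
    print(relative_speed(8.398, 22.256, iters=200, digits=3))   # 0.377
    # Same run, but starting from the rounded per-iteration values shown in the log
    print(round(0.042 / 0.111, 3))                               # 0.378
```

The last line only illustrates why small mismatches between the printed ratio and the printed operands are expected; it is not a claim about how the CI script formats its output.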
Speed stats:
GPU Name: GeForce GTX 1080
✔️ OneFlow resnet50 time: 128.6ms (= 12859.5ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 143.0ms (= 14304.8ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.11 (= 143.0ms / 128.6ms)
OneFlow resnet50 time: 75.5ms (= 7545.9ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 85.7ms (= 8567.8ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.14 (= 85.7ms / 75.5ms)
OneFlow resnet50 time: 49.0ms (= 9806.9ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 59.4ms (= 11876.9ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.21 (= 59.4ms / 49.0ms)
OneFlow resnet50 time: 36.6ms (= 7311.0ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 42.4ms (= 8476.3ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.16 (= 42.4ms / 36.6ms)
OneFlow resnet50 time: 28.5ms (= 5700.0ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 35.5ms (= 7108.8ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.25 (= 35.5ms / 28.5ms)
OneFlow swin dataloader time: 0.417s (= 83.321s / 200, num_workers=1)
PyTorch swin dataloader time: 0.150s (= 30.005s / 200, num_workers=1)
Relative speed: 0.360 (= 0.150s / 0.417s)
OneFlow swin dataloader time: 0.071s (= 14.274s / 200, num_workers=4)
PyTorch swin dataloader time: 0.041s (= 8.238s / 200, num_workers=4)
Relative speed: 0.577 (= 0.041s / 0.071s)
OneFlow swin dataloader time: 0.039s (= 7.857s / 200, num_workers=8)
PyTorch swin dataloader time: 0.022s (= 4.405s / 200, num_workers=8)
Relative speed: 0.561 (= 0.022s / 0.039s)
❌ OneFlow resnet50 time: 136.6ms (= 13657.8ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 161.8ms (= 16179.6ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.18 (= 161.8ms / 136.6ms)
OneFlow resnet50 time: 85.6ms (= 8558.1ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 112.5ms (= 11249.2ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.31 (= 112.5ms / 85.6ms)
OneFlow resnet50 time: 58.2ms (= 11638.3ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 78.1ms (= 15616.1ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.34 (= 78.1ms / 58.2ms)
OneFlow resnet50 time: 46.0ms (= 9205.0ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 73.9ms (= 14784.1ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.61 (= 73.9ms / 46.0ms)
OneFlow resnet50 time: 39.1ms (= 7819.8ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 67.9ms (= 13585.0ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.74 (= 67.9ms / 39.1ms)
Speed stats:
GPU Name: GeForce GTX 1080
✔️ OneFlow resnet50 time: 128.3ms (= 12829.7ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 143.0ms (= 14302.2ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.11 (= 143.0ms / 128.3ms)
OneFlow resnet50 time: 75.3ms (= 7534.2ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 83.1ms (= 8308.4ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.10 (= 83.1ms / 75.3ms)
OneFlow resnet50 time: 48.7ms (= 9743.0ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 54.0ms (= 10809.4ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.11 (= 54.0ms / 48.7ms)
OneFlow resnet50 time: 36.2ms (= 7235.4ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 38.3ms (= 7655.6ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.06 (= 38.3ms / 36.2ms)
OneFlow resnet50 time: 28.2ms (= 5647.5ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 41.6ms (= 8327.9ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.47 (= 41.6ms / 28.2ms)
OneFlow swin dataloader time: 0.270s (= 53.986s / 200, num_workers=1)
PyTorch swin dataloader time: 0.152s (= 30.319s / 200, num_workers=1)
Relative speed: 0.562 (= 0.152s / 0.270s)
OneFlow swin dataloader time: 0.071s (= 14.150s / 200, num_workers=4)
PyTorch swin dataloader time: 0.041s (= 8.162s / 200, num_workers=4)
Relative speed: 0.577 (= 0.041s / 0.071s)
OneFlow swin dataloader time: 0.040s (= 8.006s / 200, num_workers=8)
PyTorch swin dataloader time: 0.022s (= 4.376s / 200, num_workers=8)
Relative speed: 0.547 (= 0.022s / 0.040s)
❌ OneFlow resnet50 time: 136.9ms (= 13685.9ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 162.0ms (= 16198.4ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.18 (= 162.0ms / 136.9ms)
OneFlow resnet50 time: 85.0ms (= 8502.4ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 102.7ms (= 10272.6ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.21 (= 102.7ms / 85.0ms)
OneFlow resnet50 time: 58.3ms (= 11658.5ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 79.4ms (= 15884.1ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.36 (= 79.4ms / 58.3ms)
OneFlow resnet50 time: 45.0ms (= 9002.9ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 77.7ms (= 15530.6ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.73 (= 77.7ms / 45.0ms)
OneFlow resnet50 time: 39.3ms (= 7854.7ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 67.5ms (= 13501.0ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.72 (= 67.5ms / 39.3ms)