unified autograd engine

Open hjchen2 opened this issue 2 years ago • 15 comments

This PR aims to make nn.Graph use the unified autograd engine, as eager mode already does. Background and discussion: https://github.com/Oneflow-Inc/OneTeam/issues/1504

The following optimizations have been adapted:

  • [x] Gradient Accumulation
  • [x] AMP
  • [x] ZeRO
  • [ ] Quantization Aware Training (will be refactored later)
  • [x] Pipeline

hjchen2 avatar Jun 28 '22 09:06 hjchen2

Speed stats:
GPU Name: NVIDIA GeForce GTX 1080 

❌ OneFlow resnet50 time: 130.0ms (= 13002.6ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 144.5ms (= 14454.5ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.11 (= 144.5ms / 130.0ms)

OneFlow resnet50 time: 76.2ms (= 7620.6ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 87.5ms (= 8754.4ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.15 (= 87.5ms / 76.2ms)

OneFlow resnet50 time: 49.9ms (= 9979.1ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 65.4ms (= 13073.9ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.31 (= 65.4ms / 49.9ms)

OneFlow resnet50 time: 37.5ms (= 7505.8ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 46.1ms (= 9216.6ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.23 (= 46.1ms / 37.5ms)

OneFlow resnet50 time: 30.6ms (= 6126.3ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 40.2ms (= 8031.7ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.31 (= 40.2ms / 30.6ms)

OneFlow swin dataloader time: 0.254s (= 50.864s / 200, num_workers=1)
PyTorch swin dataloader time: 0.151s (= 30.215s / 200, num_workers=1)
Relative speed: 0.594 (= 0.151s / 0.254s)

OneFlow swin dataloader time: 0.070s (= 13.990s / 200, num_workers=4)
PyTorch swin dataloader time: 0.040s (= 7.953s / 200, num_workers=4)
Relative speed: 0.569 (= 0.040s / 0.070s)

OneFlow swin dataloader time: 0.039s (= 7.787s / 200, num_workers=8)
PyTorch swin dataloader time: 0.021s (= 4.293s / 200, num_workers=8)
Relative speed: 0.551 (= 0.021s / 0.039s)

❌ OneFlow resnet50 time: 145.1ms (= 14505.1ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 168.1ms (= 16805.2ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.16 (= 168.1ms / 145.1ms)

OneFlow resnet50 time: 95.0ms (= 9499.3ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 113.6ms (= 11364.4ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.20 (= 113.6ms / 95.0ms)

OneFlow resnet50 time: 67.2ms (= 13448.5ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 88.5ms (= 17697.1ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.32 (= 88.5ms / 67.2ms)

OneFlow resnet50 time: 55.6ms (= 11119.4ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 74.5ms (= 14899.6ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.34 (= 74.5ms / 55.6ms)

OneFlow resnet50 time: 49.9ms (= 9975.4ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 70.0ms (= 14008.8ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.40 (= 70.0ms / 49.9ms)

github-actions[bot] avatar Jul 30 '22 09:07 github-actions[bot]
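For readers checking the figures, the arithmetic in the report above can be reproduced with a short Python sketch. The function names `per_iteration` and `relative_speed` are hypothetical and are not taken from the CI scripts. The sketch assumes the ratio is computed as PyTorch time divided by OneFlow time, which matches the printed expressions, so a value above 1.0 means OneFlow was faster.

```python
def per_iteration(total, iterations):
    """Average time per iteration, in the same unit as `total`."""
    return total / iterations


def relative_speed(pytorch_time, oneflow_time):
    """PyTorch time over OneFlow time; above 1.0 means OneFlow was faster."""
    return pytorch_time / oneflow_time


# resnet50, input_shape=[16, 3, 224, 224], 100 iterations (totals in ms from the report)
oneflow_ms = per_iteration(13002.6, 100)
pytorch_ms = per_iteration(14454.5, 100)
resnet_ratio = relative_speed(pytorch_ms, oneflow_ms)

# swin dataloader, num_workers=1, 200 iterations (totals in seconds from the report)
oneflow_s = per_iteration(50.864, 200)
pytorch_s = per_iteration(30.215, 200)
loader_ratio = relative_speed(pytorch_s, oneflow_s)

print(f"resnet50: {oneflow_ms:.1f}ms vs {pytorch_ms:.1f}ms, relative speed {resnet_ratio:.2f}")
print(f"dataloader: {oneflow_s:.3f}s vs {pytorch_s:.3f}s, relative speed {loader_ratio:.3f}")
```

Read this way, the resnet50 rows show OneFlow ahead of PyTorch, while the swin dataloader rows, with ratios below 1.0, show the OneFlow dataloader taking longer per iteration.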

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/8510/

github-actions[bot] avatar Jul 30 '22 12:07 github-actions[bot]

Speed stats:
GPU Name: NVIDIA GeForce GTX 1080 

❌ OneFlow resnet50 time: 129.6ms (= 12961.3ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 143.4ms (= 14343.4ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.11 (= 143.4ms / 129.6ms)

OneFlow resnet50 time: 76.0ms (= 7601.3ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 85.3ms (= 8531.2ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.12 (= 85.3ms / 76.0ms)

OneFlow resnet50 time: 48.4ms (= 9670.6ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 63.9ms (= 12786.8ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.32 (= 63.9ms / 48.4ms)

OneFlow resnet50 time: 36.5ms (= 7295.8ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 41.6ms (= 8319.4ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.14 (= 41.6ms / 36.5ms)

OneFlow resnet50 time: 30.5ms (= 6096.8ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 40.0ms (= 7991.0ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.31 (= 40.0ms / 30.5ms)

OneFlow swin dataloader time: 0.265s (= 52.998s / 200, num_workers=1)
PyTorch swin dataloader time: 0.151s (= 30.228s / 200, num_workers=1)
Relative speed: 0.570 (= 0.151s / 0.265s)

OneFlow swin dataloader time: 0.070s (= 14.004s / 200, num_workers=4)
PyTorch swin dataloader time: 0.043s (= 8.588s / 200, num_workers=4)
Relative speed: 0.613 (= 0.043s / 0.070s)

OneFlow swin dataloader time: 0.040s (= 7.977s / 200, num_workers=8)
PyTorch swin dataloader time: 0.022s (= 4.430s / 200, num_workers=8)
Relative speed: 0.555 (= 0.022s / 0.040s)

❌ OneFlow resnet50 time: 144.9ms (= 14494.8ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 168.0ms (= 16795.3ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.16 (= 168.0ms / 144.9ms)

OneFlow resnet50 time: 95.6ms (= 9557.2ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 112.9ms (= 11287.9ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.18 (= 112.9ms / 95.6ms)

OneFlow resnet50 time: 67.0ms (= 13394.8ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 88.0ms (= 17590.7ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.31 (= 88.0ms / 67.0ms)

OneFlow resnet50 time: 55.3ms (= 11067.9ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 77.9ms (= 15588.4ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.41 (= 77.9ms / 55.3ms)

OneFlow resnet50 time: 48.0ms (= 9599.4ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 69.8ms (= 13960.3ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.45 (= 69.8ms / 48.0ms)

github-actions[bot] avatar Jul 30 '22 12:07 github-actions[bot]

Static analysis with clang failed. PR label automerge has been removed

github-actions[bot] avatar Jul 31 '22 05:07 github-actions[bot]

CI failed when running job: cuda-misc. PR label automerge has been removed

github-actions[bot] avatar Jul 31 '22 14:07 github-actions[bot]

Speed stats:
GPU Name: NVIDIA GeForce GTX 1080 

❌ OneFlow resnet50 time: 130.4ms (= 13043.8ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 143.6ms (= 14358.1ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.10 (= 143.6ms / 130.4ms)

OneFlow resnet50 time: 77.3ms (= 7733.5ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 86.7ms (= 8671.9ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.12 (= 86.7ms / 77.3ms)

OneFlow resnet50 time: 51.2ms (= 10240.7ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 64.1ms (= 12818.5ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.25 (= 64.1ms / 51.2ms)

OneFlow resnet50 time: 38.5ms (= 7700.2ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 40.8ms (= 8150.2ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.06 (= 40.8ms / 38.5ms)

OneFlow resnet50 time: 29.7ms (= 5944.6ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 40.1ms (= 8011.7ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.35 (= 40.1ms / 29.7ms)

OneFlow swin dataloader time: 0.407s (= 81.384s / 200, num_workers=1)
PyTorch swin dataloader time: 0.150s (= 30.035s / 200, num_workers=1)
Relative speed: 0.369 (= 0.150s / 0.407s)

OneFlow swin dataloader time: 0.109s (= 21.744s / 200, num_workers=4)
PyTorch swin dataloader time: 0.041s (= 8.147s / 200, num_workers=4)
Relative speed: 0.375 (= 0.041s / 0.109s)

OneFlow swin dataloader time: 0.041s (= 8.196s / 200, num_workers=8)
PyTorch swin dataloader time: 0.023s (= 4.556s / 200, num_workers=8)
Relative speed: 0.556 (= 0.023s / 0.041s)

❌ OneFlow resnet50 time: 144.7ms (= 14473.8ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 170.6ms (= 17055.5ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.18 (= 170.6ms / 144.7ms)

OneFlow resnet50 time: 95.6ms (= 9559.0ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 113.1ms (= 11308.1ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.18 (= 113.1ms / 95.6ms)

OneFlow resnet50 time: 66.8ms (= 13354.8ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 90.2ms (= 18047.9ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.35 (= 90.2ms / 66.8ms)

OneFlow resnet50 time: 55.9ms (= 11170.1ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 74.6ms (= 14919.5ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.34 (= 74.6ms / 55.9ms)

OneFlow resnet50 time: 50.1ms (= 10026.4ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 79.1ms (= 15823.6ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.58 (= 79.1ms / 50.1ms)

github-actions[bot] avatar Aug 03 '22 11:08 github-actions[bot]

Speed stats:
GPU Name: GeForce GTX 1080 

✔️ OneFlow resnet50 time: 128.2ms (= 12820.6ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 142.6ms (= 14256.3ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.11 (= 142.6ms / 128.2ms)

OneFlow resnet50 time: 75.2ms (= 7521.2ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 82.3ms (= 8231.1ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.09 (= 82.3ms / 75.2ms)

OneFlow resnet50 time: 48.3ms (= 9652.2ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 57.6ms (= 11511.4ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.19 (= 57.6ms / 48.3ms)

OneFlow resnet50 time: 35.8ms (= 7168.0ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 41.0ms (= 8206.1ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.14 (= 41.0ms / 35.8ms)

OneFlow resnet50 time: 28.3ms (= 5661.0ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 35.5ms (= 7092.9ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.25 (= 35.5ms / 28.3ms)

OneFlow swin dataloader time: 0.403s (= 80.507s / 200, num_workers=1)
PyTorch swin dataloader time: 0.154s (= 30.789s / 200, num_workers=1)
Relative speed: 0.382 (= 0.154s / 0.403s)

OneFlow swin dataloader time: 0.070s (= 14.096s / 200, num_workers=4)
PyTorch swin dataloader time: 0.042s (= 8.497s / 200, num_workers=4)
Relative speed: 0.603 (= 0.042s / 0.070s)

OneFlow swin dataloader time: 0.039s (= 7.704s / 200, num_workers=8)
PyTorch swin dataloader time: 0.022s (= 4.348s / 200, num_workers=8)
Relative speed: 0.564 (= 0.022s / 0.039s)

❌ OneFlow resnet50 time: 136.7ms (= 13671.1ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 160.8ms (= 16084.4ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.18 (= 160.8ms / 136.7ms)

OneFlow resnet50 time: 84.1ms (= 8405.3ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 102.0ms (= 10204.5ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.21 (= 102.0ms / 84.1ms)

OneFlow resnet50 time: 57.9ms (= 11570.3ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 78.3ms (= 15654.2ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.35 (= 78.3ms / 57.9ms)

OneFlow resnet50 time: 45.1ms (= 9023.7ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 70.4ms (= 14086.0ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.56 (= 70.4ms / 45.1ms)

OneFlow resnet50 time: 39.0ms (= 7807.5ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 67.2ms (= 13430.8ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.72 (= 67.2ms / 39.0ms)

github-actions[bot] avatar Aug 10 '22 07:08 github-actions[bot]

CI failed when running job: cuda-misc. PR label automerge has been removed

github-actions[bot] avatar Aug 10 '22 08:08 github-actions[bot]

Speed stats:
GPU Name: GeForce GTX 1080 

✔️ OneFlow resnet50 time: 128.4ms (= 12836.5ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 142.8ms (= 14279.1ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.11 (= 142.8ms / 128.4ms)

OneFlow resnet50 time: 75.3ms (= 7531.6ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 83.8ms (= 8377.8ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.11 (= 83.8ms / 75.3ms)

OneFlow resnet50 time: 48.7ms (= 9732.0ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 56.7ms (= 11339.9ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.17 (= 56.7ms / 48.7ms)

OneFlow resnet50 time: 36.5ms (= 7291.0ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 45.4ms (= 9084.8ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.25 (= 45.4ms / 36.5ms)

OneFlow resnet50 time: 28.3ms (= 5654.7ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 38.1ms (= 7619.8ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.35 (= 38.1ms / 28.3ms)

OneFlow swin dataloader time: 0.401s (= 80.280s / 200, num_workers=1)
PyTorch swin dataloader time: 0.151s (= 30.156s / 200, num_workers=1)
Relative speed: 0.376 (= 0.151s / 0.401s)

OneFlow swin dataloader time: 0.070s (= 14.014s / 200, num_workers=4)
PyTorch swin dataloader time: 0.042s (= 8.313s / 200, num_workers=4)
Relative speed: 0.593 (= 0.042s / 0.070s)

OneFlow swin dataloader time: 0.059s (= 11.867s / 200, num_workers=8)
PyTorch swin dataloader time: 0.022s (= 4.452s / 200, num_workers=8)
Relative speed: 0.375 (= 0.022s / 0.059s)

❌ OneFlow resnet50 time: 136.7ms (= 13673.6ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 167.6ms (= 16759.8ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.23 (= 167.6ms / 136.7ms)

OneFlow resnet50 time: 85.1ms (= 8513.5ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 104.7ms (= 10472.0ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.23 (= 104.7ms / 85.1ms)

OneFlow resnet50 time: 57.9ms (= 11576.7ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 77.2ms (= 15444.4ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.33 (= 77.2ms / 57.9ms)

OneFlow resnet50 time: 45.3ms (= 9066.3ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 77.9ms (= 15570.2ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.72 (= 77.9ms / 45.3ms)

OneFlow resnet50 time: 39.0ms (= 7799.8ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 70.8ms (= 14165.1ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.82 (= 70.8ms / 39.0ms)

github-actions[bot] avatar Aug 11 '22 18:08 github-actions[bot]

CI failed when running job: cuda-module. PR label automerge has been removed

github-actions[bot] avatar Aug 11 '22 18:08 github-actions[bot]

Speed stats:
GPU Name: GeForce GTX 1080 

✔️ OneFlow resnet50 time: 128.5ms (= 12849.8ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 142.3ms (= 14233.9ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.11 (= 142.3ms / 128.5ms)

OneFlow resnet50 time: 75.3ms (= 7534.3ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 83.4ms (= 8338.7ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.11 (= 83.4ms / 75.3ms)

OneFlow resnet50 time: 48.3ms (= 9669.4ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 58.4ms (= 11684.0ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.21 (= 58.4ms / 48.3ms)

OneFlow resnet50 time: 36.0ms (= 7201.6ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 40.3ms (= 8060.1ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.12 (= 40.3ms / 36.0ms)

OneFlow resnet50 time: 28.2ms (= 5643.6ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 41.3ms (= 8268.0ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.47 (= 41.3ms / 28.2ms)

OneFlow swin dataloader time: 0.418s (= 83.614s / 200, num_workers=1)
PyTorch swin dataloader time: 0.153s (= 30.641s / 200, num_workers=1)
Relative speed: 0.366 (= 0.153s / 0.418s)

OneFlow swin dataloader time: 0.070s (= 13.901s / 200, num_workers=4)
PyTorch swin dataloader time: 0.041s (= 8.269s / 200, num_workers=4)
Relative speed: 0.595 (= 0.041s / 0.070s)

OneFlow swin dataloader time: 0.060s (= 11.942s / 200, num_workers=8)
PyTorch swin dataloader time: 0.023s (= 4.587s / 200, num_workers=8)
Relative speed: 0.384 (= 0.023s / 0.060s)

❌ OneFlow resnet50 time: 136.7ms (= 13666.0ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 161.5ms (= 16147.6ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.18 (= 161.5ms / 136.7ms)

OneFlow resnet50 time: 84.3ms (= 8435.0ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 102.0ms (= 10202.7ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.21 (= 102.0ms / 84.3ms)

OneFlow resnet50 time: 58.0ms (= 11599.9ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 75.9ms (= 15186.8ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.31 (= 75.9ms / 58.0ms)

OneFlow resnet50 time: 45.5ms (= 9090.1ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 69.6ms (= 13910.1ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.53 (= 69.6ms / 45.5ms)

OneFlow resnet50 time: 39.0ms (= 7806.4ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 73.7ms (= 14745.8ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.89 (= 73.7ms / 39.0ms)

github-actions[bot] avatar Aug 12 '22 06:08 github-actions[bot]

CI failed when running job: cuda-misc. PR label automerge has been removed

github-actions[bot] avatar Aug 12 '22 12:08 github-actions[bot]

Speed stats:
GPU Name: GeForce GTX 1080 

✔️ OneFlow resnet50 time: 128.3ms (= 12826.1ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 141.1ms (= 14114.0ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.10 (= 141.1ms / 128.3ms)

OneFlow resnet50 time: 75.3ms (= 7528.3ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 86.5ms (= 8650.1ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.15 (= 86.5ms / 75.3ms)

OneFlow resnet50 time: 48.7ms (= 9745.0ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 59.9ms (= 11972.4ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.23 (= 59.9ms / 48.7ms)

OneFlow resnet50 time: 36.3ms (= 7256.6ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 45.0ms (= 8998.5ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.24 (= 45.0ms / 36.3ms)

OneFlow resnet50 time: 28.2ms (= 5635.8ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 39.0ms (= 7805.2ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.38 (= 39.0ms / 28.2ms)

OneFlow swin dataloader time: 0.267s (= 53.383s / 200, num_workers=1)
PyTorch swin dataloader time: 0.151s (= 30.291s / 200, num_workers=1)
Relative speed: 0.567 (= 0.151s / 0.267s)

OneFlow swin dataloader time: 0.111s (= 22.106s / 200, num_workers=4)
PyTorch swin dataloader time: 0.043s (= 8.547s / 200, num_workers=4)
Relative speed: 0.387 (= 0.043s / 0.111s)

OneFlow swin dataloader time: 0.061s (= 12.141s / 200, num_workers=8)
PyTorch swin dataloader time: 0.022s (= 4.373s / 200, num_workers=8)
Relative speed: 0.360 (= 0.022s / 0.061s)

❌ OneFlow resnet50 time: 136.7ms (= 13673.4ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 161.3ms (= 16127.4ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.18 (= 161.3ms / 136.7ms)

OneFlow resnet50 time: 85.1ms (= 8511.5ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 106.1ms (= 10612.5ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.25 (= 106.1ms / 85.1ms)

OneFlow resnet50 time: 58.4ms (= 11688.0ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 79.5ms (= 15891.6ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.36 (= 79.5ms / 58.4ms)

OneFlow resnet50 time: 45.4ms (= 9086.1ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 68.5ms (= 13702.7ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.51 (= 68.5ms / 45.4ms)

OneFlow resnet50 time: 39.0ms (= 7801.9ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 68.3ms (= 13658.1ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.75 (= 68.3ms / 39.0ms)

github-actions[bot] avatar Aug 13 '22 14:08 github-actions[bot]

CI failed when running job: cuda-misc. PR label automerge has been removed

github-actions[bot] avatar Aug 13 '22 15:08 github-actions[bot]

CI failed when running job: cuda-misc. PR label automerge has been removed

github-actions[bot] avatar Aug 14 '22 08:08 github-actions[bot]

CI failed when running job: Build cu102. PR label automerge has been removed

github-actions[bot] avatar Aug 14 '22 09:08 github-actions[bot]

Speed stats:
GPU Name: GeForce GTX 1080 

✔️ OneFlow resnet50 time: 128.4ms (= 12844.0ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 143.5ms (= 14354.4ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.12 (= 143.5ms / 128.4ms)

OneFlow resnet50 time: 75.7ms (= 7571.8ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 83.0ms (= 8296.7ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.10 (= 83.0ms / 75.7ms)

OneFlow resnet50 time: 49.2ms (= 9848.2ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 65.1ms (= 13021.5ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.32 (= 65.1ms / 49.2ms)

OneFlow resnet50 time: 36.7ms (= 7334.0ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 44.9ms (= 8970.3ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.22 (= 44.9ms / 36.7ms)

OneFlow resnet50 time: 28.6ms (= 5726.0ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 39.1ms (= 7826.3ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.37 (= 39.1ms / 28.6ms)

OneFlow swin dataloader time: 0.275s (= 54.970s / 200, num_workers=1)
PyTorch swin dataloader time: 0.149s (= 29.839s / 200, num_workers=1)
Relative speed: 0.543 (= 0.149s / 0.275s)

OneFlow swin dataloader time: 0.069s (= 13.782s / 200, num_workers=4)
PyTorch swin dataloader time: 0.041s (= 8.131s / 200, num_workers=4)
Relative speed: 0.590 (= 0.041s / 0.069s)

OneFlow swin dataloader time: 0.061s (= 12.291s / 200, num_workers=8)
PyTorch swin dataloader time: 0.022s (= 4.320s / 200, num_workers=8)
Relative speed: 0.352 (= 0.022s / 0.061s)

❌ OneFlow resnet50 time: 136.8ms (= 13683.1ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 162.4ms (= 16235.6ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.19 (= 162.4ms / 136.8ms)

OneFlow resnet50 time: 85.9ms (= 8591.8ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 104.0ms (= 10405.0ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.21 (= 104.0ms / 85.9ms)

OneFlow resnet50 time: 58.5ms (= 11699.6ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 78.4ms (= 15685.6ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.34 (= 78.4ms / 58.5ms)

OneFlow resnet50 time: 45.5ms (= 9091.6ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 80.8ms (= 16165.4ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.78 (= 80.8ms / 45.5ms)

OneFlow resnet50 time: 39.4ms (= 7885.1ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 77.0ms (= 15405.8ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.95 (= 77.0ms / 39.4ms)

github-actions[bot] avatar Aug 14 '22 11:08 github-actions[bot]

Speed stats:
GPU Name: GeForce GTX 1080 

✔️ OneFlow resnet50 time: 128.4ms (= 12835.9ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 142.9ms (= 14287.8ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.11 (= 142.9ms / 128.4ms)

OneFlow resnet50 time: 75.3ms (= 7534.3ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 87.9ms (= 8791.4ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.17 (= 87.9ms / 75.3ms)

OneFlow resnet50 time: 48.8ms (= 9765.2ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 59.1ms (= 11816.7ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.21 (= 59.1ms / 48.8ms)

OneFlow resnet50 time: 36.4ms (= 7273.2ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 40.3ms (= 8060.2ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.11 (= 40.3ms / 36.4ms)

OneFlow resnet50 time: 28.4ms (= 5679.2ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 38.4ms (= 7682.3ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.35 (= 38.4ms / 28.4ms)

OneFlow swin dataloader time: 0.270s (= 54.075s / 200, num_workers=1)
PyTorch swin dataloader time: 0.155s (= 31.068s / 200, num_workers=1)
Relative speed: 0.575 (= 0.155s / 0.270s)

OneFlow swin dataloader time: 0.082s (= 16.423s / 200, num_workers=4)
PyTorch swin dataloader time: 0.040s (= 8.093s / 200, num_workers=4)
Relative speed: 0.493 (= 0.040s / 0.082s)

OneFlow swin dataloader time: 0.043s (= 8.663s / 200, num_workers=8)
PyTorch swin dataloader time: 0.022s (= 4.401s / 200, num_workers=8)
Relative speed: 0.508 (= 0.022s / 0.043s)

❌ OneFlow resnet50 time: 136.7ms (= 13666.3ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 162.2ms (= 16218.0ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.19 (= 162.2ms / 136.7ms)

OneFlow resnet50 time: 85.4ms (= 8543.3ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 104.3ms (= 10430.7ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.22 (= 104.3ms / 85.4ms)

OneFlow resnet50 time: 58.3ms (= 11654.2ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 77.3ms (= 15459.2ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.33 (= 77.3ms / 58.3ms)

OneFlow resnet50 time: 45.8ms (= 9159.8ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 69.1ms (= 13829.8ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.51 (= 69.1ms / 45.8ms)

OneFlow resnet50 time: 38.9ms (= 7786.9ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 69.3ms (= 13868.4ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.78 (= 69.3ms / 38.9ms)

github-actions[bot] avatar Aug 14 '22 18:08 github-actions[bot]

CI failed when running job: cuda-misc. PR label automerge has been removed

github-actions[bot] avatar Aug 14 '22 18:08 github-actions[bot]