[bugfix] Fix bug where the oneflow backend gets stuck
Code got formatted by CI. Please request CI again if you still want to have this PR merged. If the PR is from a forked repo, please download the patch files from the GitHub Actions web page and apply them locally.
View latest API docs preview at: https://oneflow-staging.oss-cn-beijing.aliyuncs.com/docs/Oneflow-Inc/oneflow/pr/10435/
Speed stats:
GPU Name: NVIDIA GeForce RTX 3080 Ti
❌ OneFlow resnet50 time: 43.7ms (= 4369.3ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 57.5ms (= 5751.0ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.32 (= 57.5ms / 43.7ms)
OneFlow resnet50 time: 26.6ms (= 2657.5ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 37.3ms (= 3734.5ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.41 (= 37.3ms / 26.6ms)
OneFlow resnet50 time: 20.0ms (= 3996.6ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 34.8ms (= 6959.6ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.74 (= 34.8ms / 20.0ms)
OneFlow resnet50 time: 17.4ms (= 3477.8ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 31.1ms (= 6222.0ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.79 (= 31.1ms / 17.4ms)
OneFlow resnet50 time: 17.5ms (= 3495.4ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 29.4ms (= 5877.6ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.68 (= 29.4ms / 17.5ms)
OneFlow swin dataloader time: 0.200s (= 39.940s / 200, num_workers=1)
PyTorch swin dataloader time: 0.129s (= 25.731s / 200, num_workers=1)
Relative speed: 0.644 (= 0.129s / 0.200s)
OneFlow swin dataloader time: 0.055s (= 10.904s / 200, num_workers=4)
PyTorch swin dataloader time: 0.033s (= 6.523s / 200, num_workers=4)
Relative speed: 0.598 (= 0.033s / 0.055s)
OneFlow swin dataloader time: 0.030s (= 5.942s / 200, num_workers=8)
PyTorch swin dataloader time: 0.017s (= 3.348s / 200, num_workers=8)
Relative speed: 0.563 (= 0.017s / 0.030s)
❌ OneFlow resnet50 time: 49.2ms (= 4917.0ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 65.6ms (= 6561.4ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.33 (= 65.6ms / 49.2ms)
OneFlow resnet50 time: 35.8ms (= 3585.0ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 46.1ms (= 4612.1ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.29 (= 46.1ms / 35.8ms)
OneFlow resnet50 time: 28.0ms (= 5607.9ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 40.6ms (= 8117.2ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.45 (= 40.6ms / 28.0ms)
OneFlow resnet50 time: 25.0ms (= 4990.3ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 38.4ms (= 7686.0ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.54 (= 38.4ms / 25.0ms)
OneFlow resnet50 time: 24.0ms (= 4805.6ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 37.0ms (= 7395.2ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.54 (= 37.0ms / 24.0ms)
Speed stats:
GPU Name: NVIDIA GeForce RTX 3080 Ti
❌ OneFlow resnet50 time: 43.9ms (= 4388.3ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 57.0ms (= 5700.3ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.30 (= 57.0ms / 43.9ms)
OneFlow resnet50 time: 26.5ms (= 2650.2ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 38.9ms (= 3892.8ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.47 (= 38.9ms / 26.5ms)
OneFlow resnet50 time: 18.3ms (= 3656.8ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 34.5ms (= 6892.0ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.88 (= 34.5ms / 18.3ms)
OneFlow resnet50 time: 17.6ms (= 3522.9ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 29.5ms (= 5903.5ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.68 (= 29.5ms / 17.6ms)
OneFlow resnet50 time: 16.1ms (= 3226.2ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 31.4ms (= 6283.2ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.95 (= 31.4ms / 16.1ms)
OneFlow swin dataloader time: 0.200s (= 39.987s / 200, num_workers=1)
PyTorch swin dataloader time: 0.128s (= 25.508s / 200, num_workers=1)
Relative speed: 0.638 (= 0.128s / 0.200s)
OneFlow swin dataloader time: 0.054s (= 10.831s / 200, num_workers=4)
PyTorch swin dataloader time: 0.032s (= 6.395s / 200, num_workers=4)
Relative speed: 0.590 (= 0.032s / 0.054s)
OneFlow swin dataloader time: 0.030s (= 6.062s / 200, num_workers=8)
PyTorch swin dataloader time: 0.017s (= 3.382s / 200, num_workers=8)
Relative speed: 0.558 (= 0.017s / 0.030s)
❌ OneFlow resnet50 time: 49.2ms (= 4918.4ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 64.8ms (= 6477.1ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.32 (= 64.8ms / 49.2ms)
OneFlow resnet50 time: 36.2ms (= 3624.3ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 44.9ms (= 4492.5ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.24 (= 44.9ms / 36.2ms)
OneFlow resnet50 time: 28.5ms (= 5691.3ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 39.7ms (= 7940.7ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.40 (= 39.7ms / 28.5ms)
OneFlow resnet50 time: 25.0ms (= 4995.0ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 39.1ms (= 7815.8ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.56 (= 39.1ms / 25.0ms)
OneFlow resnet50 time: 24.0ms (= 4791.4ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 36.0ms (= 7200.7ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.50 (= 36.0ms / 24.0ms)
Speed stats:
GPU Name: NVIDIA GeForce RTX 3080 Ti
❌ OneFlow resnet50 time: 43.7ms (= 4367.7ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 58.3ms (= 5827.6ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.33 (= 58.3ms / 43.7ms)
OneFlow resnet50 time: 26.2ms (= 2621.1ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 37.5ms (= 3752.2ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.43 (= 37.5ms / 26.2ms)
OneFlow resnet50 time: 18.3ms (= 3666.1ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 36.1ms (= 7218.9ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.97 (= 36.1ms / 18.3ms)
OneFlow resnet50 time: 18.4ms (= 3683.4ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 31.2ms (= 6243.1ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.69 (= 31.2ms / 18.4ms)
OneFlow resnet50 time: 16.5ms (= 3308.9ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 29.5ms (= 5902.8ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.78 (= 29.5ms / 16.5ms)
OneFlow swin dataloader time: 0.199s (= 39.873s / 200, num_workers=1)
PyTorch swin dataloader time: 0.128s (= 25.656s / 200, num_workers=1)
Relative speed: 0.643 (= 0.128s / 0.199s)
OneFlow swin dataloader time: 0.056s (= 11.135s / 200, num_workers=4)
PyTorch swin dataloader time: 0.033s (= 6.636s / 200, num_workers=4)
Relative speed: 0.596 (= 0.033s / 0.056s)
OneFlow swin dataloader time: 0.033s (= 6.669s / 200, num_workers=8)
PyTorch swin dataloader time: 0.017s (= 3.356s / 200, num_workers=8)
Relative speed: 0.503 (= 0.017s / 0.033s)
❌ OneFlow resnet50 time: 49.0ms (= 4901.7ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 66.2ms (= 6618.1ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.35 (= 66.2ms / 49.0ms)
OneFlow resnet50 time: 36.6ms (= 3658.2ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 45.2ms (= 4517.4ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.23 (= 45.2ms / 36.6ms)
OneFlow resnet50 time: 28.1ms (= 5626.0ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 40.1ms (= 8021.5ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.43 (= 40.1ms / 28.1ms)
OneFlow resnet50 time: 24.8ms (= 4959.8ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 39.5ms (= 7900.1ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.59 (= 39.5ms / 24.8ms)
OneFlow resnet50 time: 24.1ms (= 4819.7ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 37.3ms (= 7454.0ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.55 (= 37.3ms / 24.1ms)
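For anyone reading the numbers above, here is a minimal sketch of how the reported figures appear to be derived. The CI script itself is not shown in this thread, so the function names `per_iteration` and `relative_speed` are hypothetical. The sketch assumes the ratio is taken from the unrounded totals, which matches lines such as 0.644 (= 0.129s / 0.200s), where the rounded values alone would give 0.645.

```python
def per_iteration(total: float, iterations: int) -> float:
    """Average time per iteration, in the same unit as `total`."""
    return total / iterations


def relative_speed(pytorch_total: float, oneflow_total: float) -> float:
    """PyTorch total over OneFlow total for the same iteration count.

    A value above 1 means OneFlow was faster; below 1 means PyTorch was faster.
    """
    return pytorch_total / oneflow_total


# resnet50, input_shape=[16, 3, 224, 224], 100 iterations (first run above)
print(f"OneFlow: {per_iteration(4369.3, 100):.1f}ms")
print(f"Relative speed: {relative_speed(5751.0, 4369.3):.2f}")

# swin dataloader, num_workers=1, 200 iterations (first run above)
print(f"Relative speed: {relative_speed(25.731, 39.940):.3f}")
```

Read this way, the resnet50 rows (ratios of roughly 1.2 to 2.0) favor OneFlow, while the swin dataloader rows (ratios of roughly 0.50 to 0.64) favor PyTorch in all three runs.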