Dev add bitwise shift op

Please post a screenshot of the API docs.
CI failed when running job: cuda-module. PR label automerge has been removed
Speed stats:
GPU Name: NVIDIA GeForce GTX 1080
❌ OneFlow resnet50 time: 141.6ms (= 14156.1ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 142.4ms (= 14241.9ms / 100, input_shape=[16, 3, 224, 224])
❌ Relative speed: 1.01 (= 142.4ms / 141.6ms)
OneFlow resnet50 time: 82.4ms (= 8240.4ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 85.7ms (= 8571.1ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.04 (= 85.7ms / 82.4ms)
OneFlow resnet50 time: 51.2ms (= 10235.8ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 55.3ms (= 11063.3ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.08 (= 55.3ms / 51.2ms)
OneFlow resnet50 time: 33.9ms (= 6776.3ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 44.7ms (= 8938.5ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.32 (= 44.7ms / 33.9ms)
OneFlow resnet50 time: 26.2ms (= 5244.5ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 40.1ms (= 8013.5ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.53 (= 40.1ms / 26.2ms)
OneFlow swin dataloader time: 0.242s (= 48.437s / 200, num_workers=1)
PyTorch swin dataloader time: 0.151s (= 30.213s / 200, num_workers=1)
Relative speed: 0.624 (= 0.151s / 0.242s)
OneFlow swin dataloader time: 0.065s (= 13.023s / 200, num_workers=4)
PyTorch swin dataloader time: 0.042s (= 8.439s / 200, num_workers=4)
Relative speed: 0.648 (= 0.042s / 0.065s)
OneFlow swin dataloader time: 0.036s (= 7.170s / 200, num_workers=8)
PyTorch swin dataloader time: 0.022s (= 4.392s / 200, num_workers=8)
Relative speed: 0.613 (= 0.022s / 0.036s)
❌ OneFlow resnet50 time: 164.7ms (= 16465.6ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 173.1ms (= 17309.4ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
❌ Relative speed: 1.05 (= 173.1ms / 164.7ms)
OneFlow resnet50 time: 103.0ms (= 10297.1ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 108.4ms (= 10835.7ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
❌ Relative speed: 1.05 (= 108.4ms / 103.0ms)
OneFlow resnet50 time: 70.9ms (= 14181.3ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 87.8ms (= 17569.7ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.24 (= 87.8ms / 70.9ms)
OneFlow resnet50 time: 57.0ms (= 11398.1ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 76.7ms (= 15335.8ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.35 (= 76.7ms / 57.0ms)
OneFlow resnet50 time: 50.9ms (= 10187.3ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 79.2ms (= 15833.9ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.55 (= 79.2ms / 50.9ms)
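For readers checking the numbers: each reported time appears to be the total wall time divided by the iteration count, and the relative speed appears to be the PyTorch time divided by the OneFlow time, so values above 1 mean OneFlow was faster. A minimal sketch under that assumption follows; the function names are hypothetical and are not taken from the CI script.

```python
def per_iteration_ms(total_ms: float, iterations: int) -> float:
    """Average time per iteration, in milliseconds."""
    return total_ms / iterations


def relative_speed(pytorch_ms: float, oneflow_ms: float) -> float:
    """Ratio of PyTorch time to OneFlow time; > 1 means OneFlow is faster."""
    return pytorch_ms / oneflow_ms


# Figures from the first row of the report (batch size 16, single GPU).
oneflow_ms = per_iteration_ms(14156.1, 100)
pytorch_ms = per_iteration_ms(14241.9, 100)

print(f"OneFlow: {oneflow_ms:.1f}ms, PyTorch: {pytorch_ms:.1f}ms")
print(f"Relative speed: {relative_speed(pytorch_ms, oneflow_ms):.2f}")
```

The same ratio applied to the dataloader rows gives values below 1, meaning the OneFlow dataloader was slower than PyTorch's in these runs.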
View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/9860/
CI failed when running job: cuda-misc. PR label automerge has been removed
CI failed when running job: cuda-module. PR label automerge has been removed
Please post a screenshot of the API docs.
Posted; it's in the first comment.
View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/9860/
CI failed when running job: cuda-module. PR label automerge has been removed
Speed stats:
GPU Name: GeForce GTX 1080
❌ OneFlow resnet50 time: 141.0ms (= 14096.2ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 143.2ms (= 14319.5ms / 100, input_shape=[16, 3, 224, 224])
❌ Relative speed: 1.02 (= 143.2ms / 141.0ms)
OneFlow resnet50 time: 80.7ms (= 8072.3ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 86.9ms (= 8685.7ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.08 (= 86.9ms / 80.7ms)
OneFlow resnet50 time: 50.0ms (= 9993.0ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 65.2ms (= 13047.1ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.31 (= 65.2ms / 50.0ms)
OneFlow resnet50 time: 33.3ms (= 6658.2ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 45.3ms (= 9050.0ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.36 (= 45.3ms / 33.3ms)
OneFlow resnet50 time: 25.0ms (= 5005.2ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 50.5ms (= 10102.3ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 2.02 (= 50.5ms / 25.0ms)
OneFlow swin dataloader time: 0.244s (= 48.810s / 200, num_workers=1)
PyTorch swin dataloader time: 0.149s (= 29.793s / 200, num_workers=1)
Relative speed: 0.610 (= 0.149s / 0.244s)
OneFlow swin dataloader time: 0.064s (= 12.785s / 200, num_workers=4)
PyTorch swin dataloader time: 0.042s (= 8.367s / 200, num_workers=4)
Relative speed: 0.654 (= 0.042s / 0.064s)
OneFlow swin dataloader time: 0.037s (= 7.375s / 200, num_workers=8)
PyTorch swin dataloader time: 0.022s (= 4.384s / 200, num_workers=8)
Relative speed: 0.594 (= 0.022s / 0.037s)
❌ OneFlow resnet50 time: 152.7ms (= 15265.7ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 162.5ms (= 16245.8ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
❌ Relative speed: 1.06 (= 162.5ms / 152.7ms)
OneFlow resnet50 time: 91.8ms (= 9175.3ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 103.9ms (= 10389.5ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.13 (= 103.9ms / 91.8ms)
OneFlow resnet50 time: 59.6ms (= 11915.6ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 79.1ms (= 15810.4ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.33 (= 79.1ms / 59.6ms)
OneFlow resnet50 time: 42.3ms (= 8468.2ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 75.7ms (= 15133.0ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.79 (= 75.7ms / 42.3ms)
OneFlow resnet50 time: 36.0ms (= 7208.9ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 68.7ms (= 13746.9ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.91 (= 68.7ms / 36.0ms)
View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/9860/
CI failed when running job: cuda-module. PR label automerge has been removed
CI failed when running job: cuda-module. PR label automerge has been removed
CI failed when running job: cuda-module. PR label automerge has been removed
CI failed when running job: cpu-module. PR label automerge has been removed
Speed stats:
GPU Name: GeForce GTX 1080
❌ OneFlow resnet50 time: 141.1ms (= 14111.7ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 142.8ms (= 14279.1ms / 100, input_shape=[16, 3, 224, 224])
❌ Relative speed: 1.01 (= 142.8ms / 141.1ms)
OneFlow resnet50 time: 81.3ms (= 8133.6ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 85.1ms (= 8513.9ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.05 (= 85.1ms / 81.3ms)
OneFlow resnet50 time: 50.5ms (= 10094.1ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 57.3ms (= 11458.9ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.14 (= 57.3ms / 50.5ms)
OneFlow resnet50 time: 33.7ms (= 6738.4ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 44.6ms (= 8915.4ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.32 (= 44.6ms / 33.7ms)
OneFlow resnet50 time: 25.7ms (= 5141.2ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 40.7ms (= 8135.8ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.58 (= 40.7ms / 25.7ms)
OneFlow swin dataloader time: 0.243s (= 48.590s / 200, num_workers=1)
PyTorch swin dataloader time: 0.149s (= 29.837s / 200, num_workers=1)
Relative speed: 0.614 (= 0.149s / 0.243s)
OneFlow swin dataloader time: 0.069s (= 13.754s / 200, num_workers=4)
PyTorch swin dataloader time: 0.044s (= 8.832s / 200, num_workers=4)
Relative speed: 0.642 (= 0.044s / 0.069s)
OneFlow swin dataloader time: 0.041s (= 8.194s / 200, num_workers=8)
PyTorch swin dataloader time: 0.023s (= 4.522s / 200, num_workers=8)
Relative speed: 0.552 (= 0.023s / 0.041s)
❌ OneFlow resnet50 time: 153.1ms (= 15305.4ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 164.4ms (= 16443.5ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
❌ Relative speed: 1.07 (= 164.4ms / 153.1ms)
OneFlow resnet50 time: 92.1ms (= 9209.8ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 103.6ms (= 10363.0ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.13 (= 103.6ms / 92.1ms)
OneFlow resnet50 time: 60.8ms (= 12156.9ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 80.6ms (= 16123.5ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.33 (= 80.6ms / 60.8ms)
OneFlow resnet50 time: 42.4ms (= 8485.4ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 71.0ms (= 14200.8ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.67 (= 71.0ms / 42.4ms)
OneFlow resnet50 time: 35.7ms (= 7148.9ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 68.1ms (= 13629.3ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.91 (= 68.1ms / 35.7ms)
Speed stats:
GPU Name: NVIDIA GeForce GTX 1080
❌ OneFlow resnet50 time: 141.5ms (= 14150.6ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 144.5ms (= 14447.7ms / 100, input_shape=[16, 3, 224, 224])
❌ Relative speed: 1.02 (= 144.5ms / 141.5ms)
OneFlow resnet50 time: 82.1ms (= 8213.6ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 87.8ms (= 8783.4ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.07 (= 87.8ms / 82.1ms)
OneFlow resnet50 time: 51.3ms (= 10261.9ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 60.0ms (= 12004.6ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.17 (= 60.0ms / 51.3ms)
OneFlow resnet50 time: 34.1ms (= 6819.9ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 43.3ms (= 8668.2ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.27 (= 43.3ms / 34.1ms)
OneFlow resnet50 time: 25.9ms (= 5170.8ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 36.1ms (= 7220.1ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.40 (= 36.1ms / 25.9ms)
OneFlow swin dataloader time: 0.233s (= 46.626s / 200, num_workers=1)
PyTorch swin dataloader time: 0.151s (= 30.177s / 200, num_workers=1)
Relative speed: 0.647 (= 0.151s / 0.233s)
OneFlow swin dataloader time: 0.067s (= 13.453s / 200, num_workers=4)
PyTorch swin dataloader time: 0.041s (= 8.117s / 200, num_workers=4)
Relative speed: 0.603 (= 0.041s / 0.067s)
OneFlow swin dataloader time: 0.041s (= 8.284s / 200, num_workers=8)
PyTorch swin dataloader time: 0.022s (= 4.427s / 200, num_workers=8)
Relative speed: 0.534 (= 0.022s / 0.041s)
❌ OneFlow resnet50 time: 164.1ms (= 16413.6ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 172.9ms (= 17289.4ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
❌ Relative speed: 1.05 (= 172.9ms / 164.1ms)
OneFlow resnet50 time: 103.1ms (= 10305.9ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 113.7ms (= 11368.1ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.10 (= 113.7ms / 103.1ms)
OneFlow resnet50 time: 70.7ms (= 14134.2ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 87.9ms (= 17579.5ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.24 (= 87.9ms / 70.7ms)
OneFlow resnet50 time: 57.1ms (= 11413.8ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 74.7ms (= 14930.5ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.31 (= 74.7ms / 57.1ms)
OneFlow resnet50 time: 50.8ms (= 10167.2ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 69.6ms (= 13928.8ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.37 (= 69.6ms / 50.8ms)
View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/9860/
Speed stats:
GPU Name: GeForce GTX 1080
❌ OneFlow resnet50 time: 141.4ms (= 14137.8ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 145.7ms (= 14570.6ms / 100, input_shape=[16, 3, 224, 224])
❌ Relative speed: 1.03 (= 145.7ms / 141.4ms)
OneFlow resnet50 time: 83.5ms (= 8351.2ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 87.6ms (= 8755.8ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.05 (= 87.6ms / 83.5ms)
OneFlow resnet50 time: 51.5ms (= 10297.5ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 57.5ms (= 11499.9ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.12 (= 57.5ms / 51.5ms)
OneFlow resnet50 time: 34.7ms (= 6938.3ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 41.1ms (= 8212.3ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.18 (= 41.1ms / 34.7ms)
OneFlow resnet50 time: 27.0ms (= 5393.8ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 45.6ms (= 9120.8ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.69 (= 45.6ms / 27.0ms)
OneFlow swin dataloader time: 0.234s (= 46.821s / 200, num_workers=1)
PyTorch swin dataloader time: 0.151s (= 30.108s / 200, num_workers=1)
Relative speed: 0.643 (= 0.151s / 0.234s)
OneFlow swin dataloader time: 0.068s (= 13.580s / 200, num_workers=4)
PyTorch swin dataloader time: 0.043s (= 8.616s / 200, num_workers=4)
Relative speed: 0.634 (= 0.043s / 0.068s)
OneFlow swin dataloader time: 0.040s (= 8.095s / 200, num_workers=8)
PyTorch swin dataloader time: 0.023s (= 4.519s / 200, num_workers=8)
Relative speed: 0.558 (= 0.023s / 0.040s)
❌ OneFlow resnet50 time: 154.2ms (= 15423.7ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 165.6ms (= 16559.5ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
❌ Relative speed: 1.07 (= 165.6ms / 154.2ms)
OneFlow resnet50 time: 94.0ms (= 9396.9ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 104.0ms (= 10403.1ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.11 (= 104.0ms / 94.0ms)
OneFlow resnet50 time: 60.7ms (= 12130.6ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 79.9ms (= 15989.5ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.32 (= 79.9ms / 60.7ms)
OneFlow resnet50 time: 42.9ms (= 8587.1ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 71.0ms (= 14200.9ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.65 (= 71.0ms / 42.9ms)
OneFlow resnet50 time: 38.1ms (= 7616.8ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 72.9ms (= 14588.9ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.92 (= 72.9ms / 38.1ms)
CI failed when running job: cuda-module. PR label automerge has been removed
Speed stats:
GPU Name: GeForce GTX 1080
❌ OneFlow resnet50 time: 141.0ms (= 14098.7ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 144.3ms (= 14428.9ms / 100, input_shape=[16, 3, 224, 224])
❌ Relative speed: 1.02 (= 144.3ms / 141.0ms)
OneFlow resnet50 time: 81.1ms (= 8114.3ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 86.0ms (= 8599.8ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.06 (= 86.0ms / 81.1ms)
OneFlow resnet50 time: 50.5ms (= 10090.1ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 58.4ms (= 11683.1ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.16 (= 58.4ms / 50.5ms)
OneFlow resnet50 time: 33.7ms (= 6739.8ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 43.0ms (= 8603.1ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.28 (= 43.0ms / 33.7ms)
OneFlow resnet50 time: 25.2ms (= 5046.8ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 41.8ms (= 8363.1ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.66 (= 41.8ms / 25.2ms)
OneFlow swin dataloader time: 0.235s (= 47.096s / 200, num_workers=1)
PyTorch swin dataloader time: 0.149s (= 29.773s / 200, num_workers=1)
Relative speed: 0.632 (= 0.149s / 0.235s)
OneFlow swin dataloader time: 0.068s (= 13.508s / 200, num_workers=4)
PyTorch swin dataloader time: 0.043s (= 8.514s / 200, num_workers=4)
Relative speed: 0.630 (= 0.043s / 0.068s)
OneFlow swin dataloader time: 0.041s (= 8.102s / 200, num_workers=8)
PyTorch swin dataloader time: 0.023s (= 4.672s / 200, num_workers=8)
Relative speed: 0.577 (= 0.023s / 0.041s)
❌ OneFlow resnet50 time: 152.9ms (= 15292.6ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 163.2ms (= 16325.0ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
❌ Relative speed: 1.07 (= 163.2ms / 152.9ms)
OneFlow resnet50 time: 92.7ms (= 9268.9ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 103.3ms (= 10325.7ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.11 (= 103.3ms / 92.7ms)
OneFlow resnet50 time: 60.4ms (= 12081.1ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 78.7ms (= 15735.5ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.30 (= 78.7ms / 60.4ms)
OneFlow resnet50 time: 42.5ms (= 8502.6ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 69.4ms (= 13887.2ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.63 (= 69.4ms / 42.5ms)
OneFlow resnet50 time: 36.8ms (= 7352.2ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 69.0ms (= 13802.7ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.88 (= 69.0ms / 36.8ms)
View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/9860/
CI failed when running job: cuda-module. PR label automerge has been removed
Speed stats:
GPU Name: GeForce GTX 1080
❌ OneFlow resnet50 time: 141.1ms (= 14110.4ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 145.6ms (= 14562.9ms / 100, input_shape=[16, 3, 224, 224])
❌ Relative speed: 1.03 (= 145.6ms / 141.1ms)
OneFlow resnet50 time: 81.7ms (= 8165.8ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 86.5ms (= 8651.7ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.06 (= 86.5ms / 81.7ms)
OneFlow resnet50 time: 50.5ms (= 10106.1ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 58.6ms (= 11711.5ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.16 (= 58.6ms / 50.5ms)
OneFlow resnet50 time: 33.4ms (= 6681.6ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 46.5ms (= 9304.6ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.39 (= 46.5ms / 33.4ms)
OneFlow resnet50 time: 25.7ms (= 5146.1ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 37.4ms (= 7476.2ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.45 (= 37.4ms / 25.7ms)
OneFlow swin dataloader time: 0.236s (= 47.212s / 200, num_workers=1)
PyTorch swin dataloader time: 0.150s (= 30.088s / 200, num_workers=1)
Relative speed: 0.637 (= 0.150s / 0.236s)
OneFlow swin dataloader time: 0.067s (= 13.317s / 200, num_workers=4)
PyTorch swin dataloader time: 0.041s (= 8.284s / 200, num_workers=4)
Relative speed: 0.622 (= 0.041s / 0.067s)
OneFlow swin dataloader time: 0.039s (= 7.886s / 200, num_workers=8)
PyTorch swin dataloader time: 0.023s (= 4.533s / 200, num_workers=8)
Relative speed: 0.575 (= 0.023s / 0.039s)
❌ OneFlow resnet50 time: 152.5ms (= 15254.2ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 164.2ms (= 16417.6ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
❌ Relative speed: 1.08 (= 164.2ms / 152.5ms)
OneFlow resnet50 time: 93.2ms (= 9323.4ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 104.1ms (= 10413.5ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.12 (= 104.1ms / 93.2ms)
OneFlow resnet50 time: 61.0ms (= 12202.5ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 87.5ms (= 17497.5ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.43 (= 87.5ms / 61.0ms)
OneFlow resnet50 time: 42.9ms (= 8575.5ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 70.9ms (= 14172.8ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.65 (= 70.9ms / 42.9ms)
OneFlow resnet50 time: 35.6ms (= 7114.0ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 68.9ms (= 13775.8ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.94 (= 68.9ms / 35.6ms)
View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/9860/
Speed stats:
GPU Name: GeForce GTX 1080
❌ OneFlow resnet50 time: 141.3ms (= 14132.2ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 144.6ms (= 14457.9ms / 100, input_shape=[16, 3, 224, 224])
❌ Relative speed: 1.02 (= 144.6ms / 141.3ms)
OneFlow resnet50 time: 82.8ms (= 8278.9ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 87.8ms (= 8783.9ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.06 (= 87.8ms / 82.8ms)
OneFlow resnet50 time: 51.7ms (= 10330.3ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 61.3ms (= 12267.3ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.19 (= 61.3ms / 51.7ms)
OneFlow resnet50 time: 33.9ms (= 6785.7ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 42.4ms (= 8481.8ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.25 (= 42.4ms / 33.9ms)
OneFlow resnet50 time: 26.6ms (= 5323.2ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 38.1ms (= 7627.7ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.43 (= 38.1ms / 26.6ms)
OneFlow swin dataloader time: 0.238s (= 47.665s / 200, num_workers=1)
PyTorch swin dataloader time: 0.148s (= 29.556s / 200, num_workers=1)
Relative speed: 0.620 (= 0.148s / 0.238s)
OneFlow swin dataloader time: 0.068s (= 13.687s / 200, num_workers=4)
PyTorch swin dataloader time: 0.042s (= 8.490s / 200, num_workers=4)
Relative speed: 0.620 (= 0.042s / 0.068s)
OneFlow swin dataloader time: 0.043s (= 8.649s / 200, num_workers=8)
PyTorch swin dataloader time: 0.022s (= 4.414s / 200, num_workers=8)
Relative speed: 0.510 (= 0.022s / 0.043s)
❌ OneFlow resnet50 time: 153.1ms (= 15314.8ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 164.5ms (= 16450.3ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
❌ Relative speed: 1.07 (= 164.5ms / 153.1ms)
OneFlow resnet50 time: 93.3ms (= 9331.8ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 103.8ms (= 10381.0ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.11 (= 103.8ms / 93.3ms)
OneFlow resnet50 time: 61.3ms (= 12251.1ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 77.3ms (= 15465.2ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.26 (= 77.3ms / 61.3ms)
OneFlow resnet50 time: 43.1ms (= 8619.9ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 70.6ms (= 14113.1ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.64 (= 70.6ms / 43.1ms)
OneFlow resnet50 time: 37.1ms (= 7411.2ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 69.0ms (= 13808.6ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.86 (= 69.0ms / 37.1ms)
View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/9860/
CI failed when running job: cuda-module. PR label automerge has been removed