oneflow
Add cdist op
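For two inputs x1 (shape=[B, R1, C]) and x2 (shape=[B, R2, C]), cdist computes, within each batch, the p-norm distance between every row vector of x1 and every row vector of x2, producing result (shape=[B, R1, R2]).
See the torch documentation at https://pytorch.org/docs/stable/generated/torch.cdist.html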
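To make the shape contract above concrete, here is a minimal pure-Python reference sketch of the semantics. It is not the kernel added in this PR: the function name `cdist_reference` and the nested-list inputs are hypothetical and chosen only for illustration, and the handling of `p=0` and `p=inf` is assumed to follow the torch.cdist documentation.

```python
import math


def cdist_reference(x1, x2, p=2.0):
    """Naive batched pairwise p-norm distance (illustrative only).

    x1: nested list of shape [B, R1, C]
    x2: nested list of shape [B, R2, C]
    returns: nested list of shape [B, R1, R2]
    """

    def dist(u, v):
        # Element-wise absolute differences between two row vectors of length C.
        diffs = [abs(a - b) for a, b in zip(u, v)]
        if p == 0:
            # Assumed to match torch.cdist: count of differing coordinates.
            return float(sum(d != 0 for d in diffs))
        if math.isinf(p):
            # Chebyshev distance: largest coordinate difference.
            return max(diffs)
        return sum(d ** p for d in diffs) ** (1.0 / p)

    # For each batch, pair every row of x1 with every row of x2.
    return [[[dist(u, v) for v in b2] for u in b1] for b1, b2 in zip(x1, x2)]


x1 = [[[0.0, 0.0], [3.0, 4.0]]]              # shape [1, 2, 2]
x2 = [[[0.0, 0.0], [6.0, 8.0], [3.0, 0.0]]]  # shape [1, 3, 2]
result = cdist_reference(x1, x2, p=2.0)      # shape [1, 2, 3]
# Euclidean distances: row 0 -> 0, 10, 3; row 1 -> 5, 5, 4
print(result)
```

The loop costs O(B * R1 * R2 * C) and is only meant as a readable oracle for checking an optimized implementation on small inputs.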
Code got formatted by CI. Please request CI again if you still want to have this PR merged. If the PR is from a forked repo, please download the patch files from the GitHub Actions web page and apply them locally.
Speed stats:
GPU Name: NVIDIA GeForce GTX 1080
❌ OneFlow resnet50 time: 152.2ms (= 15221.7ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 171.8ms (= 17176.6ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.13 (= 171.8ms / 152.2ms)
OneFlow resnet50 time: 96.7ms (= 9672.7ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 112.1ms (= 11211.7ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.16 (= 112.1ms / 96.7ms)
OneFlow resnet50 time: 68.8ms (= 13767.6ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 87.7ms (= 17549.4ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.27 (= 87.7ms / 68.8ms)
OneFlow resnet50 time: 60.1ms (= 12026.1ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 74.7ms (= 14932.9ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.24 (= 74.7ms / 60.1ms)
OneFlow resnet50 time: 54.9ms (= 10988.4ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 69.3ms (= 13863.6ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.26 (= 69.3ms / 54.9ms)
View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/9391/
Speed stats:
GPU Name: NVIDIA GeForce GTX 1080
❌ OneFlow resnet50 time: 153.9ms (= 15389.0ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 172.1ms (= 17205.6ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.12 (= 172.1ms / 153.9ms)
OneFlow resnet50 time: 96.7ms (= 9669.3ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 112.7ms (= 11267.4ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.17 (= 112.7ms / 96.7ms)
OneFlow resnet50 time: 69.1ms (= 13814.9ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 90.4ms (= 18085.5ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.31 (= 90.4ms / 69.1ms)
OneFlow resnet50 time: 60.9ms (= 12173.4ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 74.4ms (= 14876.1ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.22 (= 74.4ms / 60.9ms)
OneFlow resnet50 time: 55.1ms (= 11027.5ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 72.7ms (= 14540.6ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.32 (= 72.7ms / 55.1ms)
Speed stats:
GPU Name: GeForce GTX 1080
❌ OneFlow resnet50 time: 141.7ms (= 14168.0ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 163.2ms (= 16320.4ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.15 (= 163.2ms / 141.7ms)
OneFlow resnet50 time: 86.0ms (= 8600.7ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 102.3ms (= 10228.3ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.19 (= 102.3ms / 86.0ms)
OneFlow resnet50 time: 57.8ms (= 11553.8ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 77.7ms (= 15546.7ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.35 (= 77.7ms / 57.8ms)
OneFlow resnet50 time: 45.7ms (= 9148.3ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 69.8ms (= 13953.6ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.53 (= 69.8ms / 45.7ms)
OneFlow resnet50 time: 40.0ms (= 8008.7ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 70.6ms (= 14121.8ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.76 (= 70.6ms / 40.0ms)
Speed stats:
GPU Name: GeForce GTX 1080
❌ OneFlow resnet50 time: 140.1ms (= 14007.5ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 162.8ms (= 16280.0ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.16 (= 162.8ms / 140.1ms)
OneFlow resnet50 time: 85.4ms (= 8542.2ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 101.3ms (= 10134.9ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.19 (= 101.3ms / 85.4ms)
OneFlow resnet50 time: 57.9ms (= 11576.9ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 87.8ms (= 17563.9ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.52 (= 87.8ms / 57.9ms)
OneFlow resnet50 time: 44.4ms (= 8875.6ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 71.1ms (= 14217.6ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.60 (= 71.1ms / 44.4ms)
OneFlow resnet50 time: 39.5ms (= 7900.0ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 67.9ms (= 13573.6ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.72 (= 67.9ms / 39.5ms)
Speed stats:
GPU Name: GeForce GTX 1080
❌ OneFlow resnet50 time: 141.2ms (= 14121.2ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 142.9ms (= 14286.9ms / 100, input_shape=[16, 3, 224, 224])
❌ Relative speed: 1.01 (= 142.9ms / 141.2ms)
OneFlow resnet50 time: 81.4ms (= 8144.5ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 86.5ms (= 8652.3ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.06 (= 86.5ms / 81.4ms)
OneFlow resnet50 time: 51.0ms (= 10201.0ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 62.2ms (= 12442.2ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.22 (= 62.2ms / 51.0ms)
OneFlow resnet50 time: 33.6ms (= 6727.4ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 45.5ms (= 9104.1ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.35 (= 45.5ms / 33.6ms)
OneFlow resnet50 time: 26.5ms (= 5299.3ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 41.8ms (= 8354.9ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.58 (= 41.8ms / 26.5ms)
OneFlow swin dataloader time: 0.245s (= 49.045s / 200, num_workers=1)
PyTorch swin dataloader time: 0.151s (= 30.112s / 200, num_workers=1)
Relative speed: 0.614 (= 0.151s / 0.245s)
OneFlow swin dataloader time: 0.067s (= 13.400s / 200, num_workers=4)
PyTorch swin dataloader time: 0.039s (= 7.885s / 200, num_workers=4)
Relative speed: 0.588 (= 0.039s / 0.067s)
OneFlow swin dataloader time: 0.040s (= 7.985s / 200, num_workers=8)
PyTorch swin dataloader time: 0.022s (= 4.466s / 200, num_workers=8)
Relative speed: 0.559 (= 0.022s / 0.040s)
❌ OneFlow resnet50 time: 152.8ms (= 15280.0ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 166.9ms (= 16694.5ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
❌ Relative speed: 1.09 (= 166.9ms / 152.8ms)
OneFlow resnet50 time: 92.3ms (= 9228.0ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 103.9ms (= 10390.3ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.13 (= 103.9ms / 92.3ms)
OneFlow resnet50 time: 60.2ms (= 12033.1ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 88.5ms (= 17696.1ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.47 (= 88.5ms / 60.2ms)
OneFlow resnet50 time: 42.2ms (= 8442.0ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 71.0ms (= 14207.7ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.68 (= 71.0ms / 42.2ms)
OneFlow resnet50 time: 37.4ms (= 7484.4ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 73.8ms (= 14767.3ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.97 (= 73.8ms / 37.4ms)
Speed stats: