Add cdist op

Open marigoold opened this issue 2 years ago • 19 comments

For two inputs x1 (shape=[B, R1, C]) and x2 (shape=[B, R2, C]), cdist computes, within each batch, the p-norm distance between every row vector of x1 and every row vector of x2, producing result (shape=[B, R1, R2]).

For the torch documentation, see https://pytorch.org/docs/stable/generated/torch.cdist.html
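
To make the shape contract concrete, here is a minimal pure-Python reference sketch of the computation described above. It is not the kernel added in this PR; the function name `cdist_reference` and the nested-list representation are assumptions made for illustration only, and it handles only finite p > 0.

```python
def cdist_reference(x1, x2, p=2.0):
    """Naive batched p-norm distance (illustrative only).

    x1: nested lists shaped [B][R1][C]
    x2: nested lists shaped [B][R2][C]
    returns nested lists shaped [B][R1][R2]
    """
    if p <= 0 or p == float("inf"):
        raise ValueError("this sketch only handles finite p > 0")
    if len(x1) != len(x2):
        raise ValueError("x1 and x2 must have the same batch size B")

    result = []
    for batch1, batch2 in zip(x1, x2):      # loop over the batch dimension B
        rows = []
        for u in batch1:                    # each of the R1 rows of x1
            row = []
            for v in batch2:                # each of the R2 rows of x2
                if len(u) != len(v):
                    raise ValueError("row vectors must share the same length C")
                # p-norm of the element-wise difference u - v
                total = sum(abs(a - b) ** p for a, b in zip(u, v))
                row.append(total ** (1.0 / p))
            rows.append(row)
        result.append(rows)
    return result


# B=1, R1=2, R2=3, C=2
x1 = [[[0.0, 0.0], [3.0, 4.0]]]
x2 = [[[0.0, 0.0], [6.0, 8.0], [3.0, 0.0]]]
out = cdist_reference(x1, x2, p=2.0)
print(len(out), len(out[0]), len(out[0][0]))  # batch, R1, R2
```

With p=2 each entry is the Euclidean distance between one row of x1 and one row of x2, so the output for the sample inputs has shape [1, 2, 3].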

marigoold avatar Nov 08 '22 04:11 marigoold

Code got formatted by CI. Please request CI again if you still want to have this PR merged. If the PR is from a forked repo, please download the patch files from the GitHub Actions web page and apply them locally.

github-actions[bot] avatar Nov 18 '22 09:11 github-actions[bot]

Speed stats:

github-actions[bot] avatar Nov 18 '22 11:11 github-actions[bot]

Speed stats:

github-actions[bot] avatar Nov 19 '22 05:11 github-actions[bot]

Speed stats:

github-actions[bot] avatar Nov 20 '22 11:11 github-actions[bot]

Speed stats:

github-actions[bot] avatar Nov 20 '22 14:11 github-actions[bot]

Speed stats:

github-actions[bot] avatar Nov 21 '22 02:11 github-actions[bot]

Speed stats:
GPU Name: NVIDIA GeForce GTX 1080

❌ OneFlow resnet50 time: 152.2ms (= 15221.7ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 171.8ms (= 17176.6ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.13 (= 171.8ms / 152.2ms)

OneFlow resnet50 time: 96.7ms (= 9672.7ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 112.1ms (= 11211.7ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.16 (= 112.1ms / 96.7ms)

OneFlow resnet50 time: 68.8ms (= 13767.6ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 87.7ms (= 17549.4ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.27 (= 87.7ms / 68.8ms)

OneFlow resnet50 time: 60.1ms (= 12026.1ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 74.7ms (= 14932.9ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.24 (= 74.7ms / 60.1ms)

OneFlow resnet50 time: 54.9ms (= 10988.4ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 69.3ms (= 13863.6ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.26 (= 69.3ms / 54.9ms)

github-actions[bot] avatar Nov 21 '22 03:11 github-actions[bot]

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/9391/

github-actions[bot] avatar Nov 21 '22 04:11 github-actions[bot]

Speed stats:

github-actions[bot] avatar Nov 23 '22 02:11 github-actions[bot]

Speed stats:

github-actions[bot] avatar Nov 23 '22 12:11 github-actions[bot]

Speed stats:
GPU Name: NVIDIA GeForce GTX 1080

❌ OneFlow resnet50 time: 153.9ms (= 15389.0ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 172.1ms (= 17205.6ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.12 (= 172.1ms / 153.9ms)

OneFlow resnet50 time: 96.7ms (= 9669.3ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 112.7ms (= 11267.4ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.17 (= 112.7ms / 96.7ms)

OneFlow resnet50 time: 69.1ms (= 13814.9ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 90.4ms (= 18085.5ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.31 (= 90.4ms / 69.1ms)

OneFlow resnet50 time: 60.9ms (= 12173.4ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 74.4ms (= 14876.1ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.22 (= 74.4ms / 60.9ms)

OneFlow resnet50 time: 55.1ms (= 11027.5ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 72.7ms (= 14540.6ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.32 (= 72.7ms / 55.1ms)

github-actions[bot] avatar Nov 28 '22 06:11 github-actions[bot]

Speed stats:
GPU Name: GeForce GTX 1080

❌ OneFlow resnet50 time: 141.7ms (= 14168.0ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 163.2ms (= 16320.4ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.15 (= 163.2ms / 141.7ms)

OneFlow resnet50 time: 86.0ms (= 8600.7ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 102.3ms (= 10228.3ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.19 (= 102.3ms / 86.0ms)

OneFlow resnet50 time: 57.8ms (= 11553.8ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 77.7ms (= 15546.7ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.35 (= 77.7ms / 57.8ms)

OneFlow resnet50 time: 45.7ms (= 9148.3ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 69.8ms (= 13953.6ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.53 (= 69.8ms / 45.7ms)

OneFlow resnet50 time: 40.0ms (= 8008.7ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 70.6ms (= 14121.8ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.76 (= 70.6ms / 40.0ms)

github-actions[bot] avatar Nov 28 '22 14:11 github-actions[bot]

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/9391/

github-actions[bot] avatar Nov 28 '22 14:11 github-actions[bot]

Speed stats:
GPU Name: GeForce GTX 1080

❌ OneFlow resnet50 time: 140.1ms (= 14007.5ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 162.8ms (= 16280.0ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.16 (= 162.8ms / 140.1ms)

OneFlow resnet50 time: 85.4ms (= 8542.2ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 101.3ms (= 10134.9ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.19 (= 101.3ms / 85.4ms)

OneFlow resnet50 time: 57.9ms (= 11576.9ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 87.8ms (= 17563.9ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.52 (= 87.8ms / 57.9ms)

OneFlow resnet50 time: 44.4ms (= 8875.6ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 71.1ms (= 14217.6ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.60 (= 71.1ms / 44.4ms)

OneFlow resnet50 time: 39.5ms (= 7900.0ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 67.9ms (= 13573.6ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.72 (= 67.9ms / 39.5ms)

github-actions[bot] avatar Dec 15 '22 07:12 github-actions[bot]

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/9391/

github-actions[bot] avatar Dec 15 '22 08:12 github-actions[bot]

Code got formatted by CI. Please request CI again if you still want to have this PR merged. If the PR is from a forked repo, please download the patch files from the GitHub Actions web page and apply them locally.

github-actions[bot] avatar Jan 11 '23 11:01 github-actions[bot]

Speed stats:
GPU Name: GeForce GTX 1080 

❌ OneFlow resnet50 time: 141.2ms (= 14121.2ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 142.9ms (= 14286.9ms / 100, input_shape=[16, 3, 224, 224])
❌ Relative speed: 1.01 (= 142.9ms / 141.2ms)

OneFlow resnet50 time: 81.4ms (= 8144.5ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 86.5ms (= 8652.3ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.06 (= 86.5ms / 81.4ms)

OneFlow resnet50 time: 51.0ms (= 10201.0ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 62.2ms (= 12442.2ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.22 (= 62.2ms / 51.0ms)

OneFlow resnet50 time: 33.6ms (= 6727.4ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 45.5ms (= 9104.1ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.35 (= 45.5ms / 33.6ms)

OneFlow resnet50 time: 26.5ms (= 5299.3ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 41.8ms (= 8354.9ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.58 (= 41.8ms / 26.5ms)

OneFlow swin dataloader time: 0.245s (= 49.045s / 200, num_workers=1)
PyTorch swin dataloader time: 0.151s (= 30.112s / 200, num_workers=1)
Relative speed: 0.614 (= 0.151s / 0.245s)

OneFlow swin dataloader time: 0.067s (= 13.400s / 200, num_workers=4)
PyTorch swin dataloader time: 0.039s (= 7.885s / 200, num_workers=4)
Relative speed: 0.588 (= 0.039s / 0.067s)

OneFlow swin dataloader time: 0.040s (= 7.985s / 200, num_workers=8)
PyTorch swin dataloader time: 0.022s (= 4.466s / 200, num_workers=8)
Relative speed: 0.559 (= 0.022s / 0.040s)

❌ OneFlow resnet50 time: 152.8ms (= 15280.0ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 166.9ms (= 16694.5ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
❌ Relative speed: 1.09 (= 166.9ms / 152.8ms)

OneFlow resnet50 time: 92.3ms (= 9228.0ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 103.9ms (= 10390.3ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.13 (= 103.9ms / 92.3ms)

OneFlow resnet50 time: 60.2ms (= 12033.1ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 88.5ms (= 17696.1ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.47 (= 88.5ms / 60.2ms)

OneFlow resnet50 time: 42.2ms (= 8442.0ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 71.0ms (= 14207.7ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.68 (= 71.0ms / 42.2ms)

OneFlow resnet50 time: 37.4ms (= 7484.4ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 73.8ms (= 14767.3ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.97 (= 73.8ms / 37.4ms)

github-actions[bot] avatar Feb 17 '23 13:02 github-actions[bot]

Speed stats:

github-actions[bot] avatar Apr 11 '23 05:04 github-actions[bot]