
Npu test

Open zkyseu opened this issue 1 year ago • 9 comments

Modifications to the OneFlow source code for the Huawei Ascend 910, so that OneFlow is compatible with the machine.

zkyseu avatar Nov 26 '23 15:11 zkyseu

CLA assistant check
All committers have signed the CLA.

CLAassistant avatar Nov 26 '23 15:11 CLAassistant

View latest API docs preview at: https://oneflow-staging.oss-cn-beijing.aliyuncs.com/docs/Oneflow-Inc/oneflow/pr/10358/

github-actions[bot] avatar Nov 26 '23 15:11 github-actions[bot]

Speed stats:
GPU Name: NVIDIA GeForce RTX 3080 Ti 

❌ OneFlow resnet50 time: 43.4ms (= 4340.4ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 57.5ms (= 5747.9ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.32 (= 57.5ms / 43.4ms)

OneFlow resnet50 time: 26.0ms (= 2604.3ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 38.1ms (= 3807.0ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.46 (= 38.1ms / 26.0ms)

OneFlow resnet50 time: 18.6ms (= 3718.8ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 34.8ms (= 6967.2ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.87 (= 34.8ms / 18.6ms)

OneFlow resnet50 time: 17.9ms (= 3580.8ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 31.2ms (= 6236.1ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.74 (= 31.2ms / 17.9ms)

OneFlow resnet50 time: 17.4ms (= 3480.0ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 29.7ms (= 5932.8ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.70 (= 29.7ms / 17.4ms)

OneFlow swin dataloader time: 0.200s (= 40.026s / 200, num_workers=1)
PyTorch swin dataloader time: 0.127s (= 25.421s / 200, num_workers=1)
Relative speed: 0.635 (= 0.127s / 0.200s)

OneFlow swin dataloader time: 0.056s (= 11.150s / 200, num_workers=4)
PyTorch swin dataloader time: 0.033s (= 6.559s / 200, num_workers=4)
Relative speed: 0.588 (= 0.033s / 0.056s)

OneFlow swin dataloader time: 0.031s (= 6.135s / 200, num_workers=8)
PyTorch swin dataloader time: 0.017s (= 3.308s / 200, num_workers=8)
Relative speed: 0.539 (= 0.017s / 0.031s)

❌ OneFlow resnet50 time: 47.7ms (= 4767.6ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 62.9ms (= 6285.6ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.32 (= 62.9ms / 47.7ms)

OneFlow resnet50 time: 31.3ms (= 3131.0ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 46.2ms (= 4621.1ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.48 (= 46.2ms / 31.3ms)

OneFlow resnet50 time: 23.8ms (= 4753.6ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 43.0ms (= 8592.8ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.81 (= 43.0ms / 23.8ms)

OneFlow resnet50 time: 21.5ms (= 4300.1ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 37.6ms (= 7525.8ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.75 (= 37.6ms / 21.5ms)

OneFlow resnet50 time: 20.8ms (= 4166.2ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 35.1ms (= 7019.4ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.68 (= 35.1ms / 20.8ms)

github-actions[bot] avatar Nov 26 '23 15:11 github-actions[bot]
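
For readers checking the figures in the speed stats above, here is a minimal sketch of the arithmetic the log lines show: a per-iteration time is the total divided by the iteration count, and the relative speed is the PyTorch time divided by the OneFlow time, so a value above 1 means OneFlow was faster. The function names `per_iteration` and `relative_speed` are hypothetical and are not taken from the CI script. The bot appears to divide the already rounded per-iteration values, so the last digit can differ slightly from this unrounded calculation.

```python
def per_iteration(total, iterations):
    """Average time per iteration, in the same unit as `total`."""
    return total / iterations


def relative_speed(pytorch_time, oneflow_time):
    """PyTorch time divided by OneFlow time; above 1.0 means OneFlow was faster."""
    return pytorch_time / oneflow_time


# resnet50, input_shape=[16, 3, 224, 224]: totals in ms over 100 iterations
oneflow_ms = per_iteration(4340.4, 100)
pytorch_ms = per_iteration(5747.9, 100)
print(f"resnet50 relative speed: {relative_speed(pytorch_ms, oneflow_ms):.2f}")

# swin dataloader, num_workers=1: totals in seconds over 200 iterations
oneflow_dl = per_iteration(40.026, 200)
pytorch_dl = per_iteration(25.421, 200)
print(f"dataloader relative speed: {relative_speed(pytorch_dl, oneflow_dl):.3f}")
```

Read this way, the resnet50 rows report OneFlow as faster than PyTorch (ratios above 1), while the swin dataloader rows report it as slower (ratios below 1).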

Does this mean it can be compiled on Ascend machines, but cannot run yet?

yuanms2 avatar Nov 27 '23 00:11 yuanms2

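Does this mean it can be compiled on Ascend machines, but cannot run yet?

@yuanms2 In testing so far, oneflow lite runs inference correctly, but model training with oneflow has not been tested.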

zkyseu avatar Nov 27 '23 01:11 zkyseu

View latest API docs preview at: https://oneflow-staging.oss-cn-beijing.aliyuncs.com/docs/Oneflow-Inc/oneflow/pr/10358/

github-actions[bot] avatar Nov 27 '23 02:11 github-actions[bot]

Speed stats:
GPU Name: NVIDIA GeForce RTX 3080 Ti 

❌ OneFlow resnet50 time: 44.0ms (= 4399.8ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 61.5ms (= 6149.0ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.40 (= 61.5ms / 44.0ms)

OneFlow resnet50 time: 26.5ms (= 2652.0ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 37.4ms (= 3738.6ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.41 (= 37.4ms / 26.5ms)

OneFlow resnet50 time: 18.5ms (= 3696.8ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 36.5ms (= 7292.2ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.97 (= 36.5ms / 18.5ms)

OneFlow resnet50 time: 17.6ms (= 3514.1ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 30.8ms (= 6164.5ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.75 (= 30.8ms / 17.6ms)

OneFlow resnet50 time: 17.0ms (= 3405.7ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 28.4ms (= 5683.6ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.67 (= 28.4ms / 17.0ms)

OneFlow swin dataloader time: 0.200s (= 40.057s / 200, num_workers=1)
PyTorch swin dataloader time: 0.127s (= 25.423s / 200, num_workers=1)
Relative speed: 0.635 (= 0.127s / 0.200s)

OneFlow swin dataloader time: 0.054s (= 10.824s / 200, num_workers=4)
PyTorch swin dataloader time: 0.032s (= 6.495s / 200, num_workers=4)
Relative speed: 0.600 (= 0.032s / 0.054s)

OneFlow swin dataloader time: 0.030s (= 6.050s / 200, num_workers=8)
PyTorch swin dataloader time: 0.017s (= 3.429s / 200, num_workers=8)
Relative speed: 0.567 (= 0.017s / 0.030s)

❌ OneFlow resnet50 time: 47.7ms (= 4766.3ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 65.0ms (= 6499.4ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.36 (= 65.0ms / 47.7ms)

OneFlow resnet50 time: 32.0ms (= 3204.4ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 44.0ms (= 4404.3ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.37 (= 44.0ms / 32.0ms)

OneFlow resnet50 time: 23.7ms (= 4738.8ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 41.3ms (= 8265.8ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.74 (= 41.3ms / 23.7ms)

OneFlow resnet50 time: 21.1ms (= 4224.0ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 36.1ms (= 7228.8ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.71 (= 36.1ms / 21.1ms)

OneFlow resnet50 time: 20.7ms (= 4134.2ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 34.2ms (= 6830.1ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.65 (= 34.2ms / 20.7ms)

github-actions[bot] avatar Nov 27 '23 02:11 github-actions[bot]

CI failed when running job: cuda-speed-test. PR label automerge has been removed

github-actions[bot] avatar Nov 27 '23 02:11 github-actions[bot]

Speed stats:
GPU Name: NVIDIA GeForce RTX 3080 Ti 

❌ OneFlow resnet50 time: 43.5ms (= 4348.6ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 57.0ms (= 5700.6ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.31 (= 57.0ms / 43.5ms)

OneFlow resnet50 time: 26.5ms (= 2651.3ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 38.2ms (= 3820.5ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.44 (= 38.2ms / 26.5ms)

OneFlow resnet50 time: 19.1ms (= 3824.9ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 35.7ms (= 7144.2ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.87 (= 35.7ms / 19.1ms)

OneFlow resnet50 time: 17.6ms (= 3524.1ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 30.9ms (= 6177.2ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.75 (= 30.9ms / 17.6ms)

OneFlow resnet50 time: 17.8ms (= 3550.3ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 28.3ms (= 5659.2ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.59 (= 28.3ms / 17.8ms)

OneFlow swin dataloader time: 0.200s (= 40.090s / 200, num_workers=1)
PyTorch swin dataloader time: 0.128s (= 25.618s / 200, num_workers=1)
Relative speed: 0.639 (= 0.128s / 0.200s)

OneFlow swin dataloader time: 0.055s (= 10.909s / 200, num_workers=4)
PyTorch swin dataloader time: 0.033s (= 6.526s / 200, num_workers=4)
Relative speed: 0.598 (= 0.033s / 0.055s)

OneFlow swin dataloader time: 0.030s (= 6.023s / 200, num_workers=8)
PyTorch swin dataloader time: 0.017s (= 3.375s / 200, num_workers=8)
Relative speed: 0.560 (= 0.017s / 0.030s)

❌ OneFlow resnet50 time: 47.8ms (= 4778.3ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 64.1ms (= 6410.2ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.34 (= 64.1ms / 47.8ms)

OneFlow resnet50 time: 33.2ms (= 3322.8ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 48.1ms (= 4810.1ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.45 (= 48.1ms / 33.2ms)

OneFlow resnet50 time: 23.5ms (= 4699.1ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 41.3ms (= 8259.4ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.76 (= 41.3ms / 23.5ms)

OneFlow resnet50 time: 20.7ms (= 4141.1ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 36.1ms (= 7221.3ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.74 (= 36.1ms / 20.7ms)

OneFlow resnet50 time: 20.3ms (= 4050.8ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 34.0ms (= 6797.9ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.68 (= 34.0ms / 20.3ms)

github-actions[bot] avatar Nov 27 '23 11:11 github-actions[bot]