
flash attention use mirror

Open mosout opened this issue 1 year ago • 3 comments

mosout, May 22 '24 03:05

Code got formatted by CI. Please request CI again if you still want to have this PR merged. If the PR is from a forked repo, please download the patch files from the GitHub Actions web page and apply them locally.

github-actions[bot], May 22 '24 03:05

View latest API docs preview at: https://oneflow-staging.oss-cn-beijing.aliyuncs.com/docs/Oneflow-Inc/oneflow/pr/10530/

github-actions[bot], May 22 '24 03:05

Speed stats:
GPU Name: NVIDIA GeForce RTX 3080 Ti 

❌ OneFlow resnet50 time: 43.5ms (= 4345.7ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 57.9ms (= 5792.6ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.33 (= 57.9ms / 43.5ms)

OneFlow resnet50 time: 26.4ms (= 2640.1ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 37.7ms (= 3767.7ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.43 (= 37.7ms / 26.4ms)

OneFlow resnet50 time: 18.3ms (= 3657.7ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 36.0ms (= 7192.4ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.97 (= 36.0ms / 18.3ms)

OneFlow resnet50 time: 17.2ms (= 3443.6ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 33.8ms (= 6764.0ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.96 (= 33.8ms / 17.2ms)

OneFlow resnet50 time: 16.9ms (= 3380.3ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 30.3ms (= 6065.2ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.79 (= 30.3ms / 16.9ms)

OneFlow swin dataloader time: 0.202s (= 40.387s / 200, num_workers=1)
PyTorch swin dataloader time: 0.128s (= 25.618s / 200, num_workers=1)
Relative speed: 0.634 (= 0.128s / 0.202s)

OneFlow swin dataloader time: 0.063s (= 12.527s / 200, num_workers=4)
PyTorch swin dataloader time: 0.033s (= 6.515s / 200, num_workers=4)
Relative speed: 0.520 (= 0.033s / 0.063s)

OneFlow swin dataloader time: 0.032s (= 6.326s / 200, num_workers=8)
PyTorch swin dataloader time: 0.017s (= 3.325s / 200, num_workers=8)
Relative speed: 0.526 (= 0.017s / 0.032s)

❌ OneFlow resnet50 time: 49.3ms (= 4933.0ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 65.6ms (= 6557.6ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.33 (= 65.6ms / 49.3ms)

OneFlow resnet50 time: 36.4ms (= 3638.0ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 45.7ms (= 4570.1ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.26 (= 45.7ms / 36.4ms)

OneFlow resnet50 time: 27.9ms (= 5582.0ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 42.5ms (= 8506.2ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.52 (= 42.5ms / 27.9ms)

OneFlow resnet50 time: 25.1ms (= 5024.9ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 38.4ms (= 7672.2ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.53 (= 38.4ms / 25.1ms)

OneFlow resnet50 time: 24.8ms (= 4957.8ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 36.2ms (= 7236.6ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.46 (= 36.2ms / 24.8ms)
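
For readers who want to check the figures, the arithmetic in each group above can be reproduced with a short sketch. It assumes, based only on the printed formulas, that the per-iteration time is the total divided by the iteration count and that relative speed is the PyTorch time divided by the OneFlow time (so values above 1 mean OneFlow is faster). The function names are hypothetical and are not taken from the CI script.

```python
def per_iteration(total: float, iterations: int) -> float:
    """Average time per iteration, in the same unit as `total`."""
    return total / iterations


def relative_speed(pytorch_total: float, oneflow_total: float, iterations: int) -> float:
    """PyTorch time over OneFlow time; > 1 means OneFlow is faster."""
    return per_iteration(pytorch_total, iterations) / per_iteration(oneflow_total, iterations)


# resnet50, input_shape=[16, 3, 224, 224], totals in ms over 100 iterations
print(f"{per_iteration(4345.7, 100):.1f}ms")           # 43.5ms
print(f"{relative_speed(5792.6, 4345.7, 100):.2f}")    # 1.33

# swin dataloader, num_workers=1, totals in s over 200 iterations
print(f"{relative_speed(25.618, 40.387, 200):.3f}")    # 0.634
```

The printed ratios appear to come from the unrounded totals rather than the rounded per-iteration values: for input_shape=[2, 3, 224, 224], 6764.0 / 3443.6 rounds to 1.96, whereas 33.8 / 17.2 would round to 1.97.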

github-actions[bot], May 22 '24 04:05