oneflow
oneflow copied to clipboard
Refactor uniform initializer
重构 uniform 相关的 initializer,使用 oneflow 自己的 kernel,加速初始化
可以把 uniform_ 这个方法也在 Python C api 里面绑定到 tensor.uniform_ 上(不过这样的话,那个 assert 就得放在 functor 里面了)
Code got formatted by CI. Please request CI again if you still want to have this PR merged. If the PR is from a forked repo, please download the patch files from the GitHub Actions web page and apply them locally.
CI failed when running job: Build cpu. PR label automerge has been removed
Speed stats:
GPU Name: GeForce GTX 1080
❌ OneFlow resnet50 time: 139.5ms (= 13949.2ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 160.6ms (= 16060.0ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.15 (= 160.6ms / 139.5ms)
OneFlow resnet50 time: 84.8ms (= 8479.9ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 100.5ms (= 10049.0ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.19 (= 100.5ms / 84.8ms)
OneFlow resnet50 time: 57.6ms (= 11525.5ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 78.2ms (= 15633.0ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.36 (= 78.2ms / 57.6ms)
OneFlow resnet50 time: 44.7ms (= 8937.6ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 79.9ms (= 15977.9ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.79 (= 79.9ms / 44.7ms)
OneFlow resnet50 time: 40.5ms (= 8106.7ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 67.6ms (= 13521.4ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.67 (= 67.6ms / 40.5ms)
Speed stats:
Speed stats:
GPU Name: GeForce GTX 1080
❌ OneFlow resnet50 time: 139.6ms (= 13963.6ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 167.8ms (= 16775.3ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.20 (= 167.8ms / 139.6ms)
OneFlow resnet50 time: 85.3ms (= 8527.4ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 101.6ms (= 10155.5ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.19 (= 101.6ms / 85.3ms)
OneFlow resnet50 time: 58.3ms (= 11654.5ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 88.6ms (= 17721.8ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.52 (= 88.6ms / 58.3ms)
OneFlow resnet50 time: 44.7ms (= 8932.0ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 69.4ms (= 13877.9ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.55 (= 69.4ms / 44.7ms)
OneFlow resnet50 time: 39.7ms (= 7940.8ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 75.3ms (= 15064.2ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.90 (= 75.3ms / 39.7ms)
View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/9384/
CI failed when running job: cuda-misc. PR label automerge has been removed
Speed stats:
GPU Name: GeForce GTX 1080
❌ OneFlow resnet50 time: 139.7ms (= 13965.7ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 167.1ms (= 16709.7ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.20 (= 167.1ms / 139.7ms)
OneFlow resnet50 time: 84.7ms (= 8468.1ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 101.3ms (= 10133.3ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.20 (= 101.3ms / 84.7ms)
OneFlow resnet50 time: 57.8ms (= 11562.2ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 86.0ms (= 17197.3ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.49 (= 86.0ms / 57.8ms)
OneFlow resnet50 time: 44.5ms (= 8893.9ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 70.1ms (= 14012.1ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.58 (= 70.1ms / 44.5ms)
OneFlow resnet50 time: 41.3ms (= 8267.9ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 77.8ms (= 15550.0ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.88 (= 77.8ms / 41.3ms)
View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/9384/