oneflow
Inplace masked fill

Code got formatted by CI. Please request CI again if you still want to have this PR merged. If the PR is from a forked repo, please download the patch files from the GitHub Actions web page and apply them locally.
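The flow.masked_fill() interface needs to be wrapped.
OK. As far as I can see, torch only has torch.Tensor.masked_fill_, so I'll just export this interface under flow and not add documentation for it.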
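For readers following the naming discussion, here is a minimal plain-Python sketch of the difference between an out-of-place masked fill and an in-place one (the trailing underscore convention used by torch.Tensor.masked_fill_). The two helper functions below are hypothetical stand-ins operating on Python lists. They are not OneFlow's or PyTorch's implementation and only illustrate the semantics.

```python
from typing import List, Sequence


def masked_fill(values: Sequence[float], mask: Sequence[bool], fill: float) -> List[float]:
    """Out-of-place variant (hypothetical helper): returns a new list."""
    if len(values) != len(mask):
        raise ValueError("values and mask must have the same length")
    # Take `fill` wherever the mask is True, otherwise keep the original value.
    return [fill if m else v for v, m in zip(values, mask)]


def masked_fill_(values: List[float], mask: Sequence[bool], fill: float) -> List[float]:
    """In-place variant (hypothetical helper): mutates `values` and returns it."""
    if len(values) != len(mask):
        raise ValueError("values and mask must have the same length")
    for i, m in enumerate(mask):
        if m:
            values[i] = fill  # overwrite the caller's storage directly
    return values


x = [1.0, 2.0, 3.0, 4.0]
mask = [True, False, True, False]

y = masked_fill(x, mask, 0.0)   # x is left unchanged here
z = masked_fill_(x, mask, 0.0)  # x itself is modified; z is the same object as x
```

The in-place form avoids allocating a second buffer, which is presumably the motivation for this PR, at the cost of overwriting the input.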
CI failed when running job: cpu-module. PR label automerge has been removed
View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/9133/
Speed stats:
GPU Name: GeForce GTX 1080
❌ OneFlow resnet50 time: 139.6ms (= 13964.2ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 160.7ms (= 16069.1ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.15 (= 160.7ms / 139.6ms)
OneFlow resnet50 time: 85.4ms (= 8543.9ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 102.7ms (= 10274.5ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.20 (= 102.7ms / 85.4ms)
OneFlow resnet50 time: 58.2ms (= 11646.0ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 78.0ms (= 15596.0ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.34 (= 78.0ms / 58.2ms)
OneFlow resnet50 time: 44.7ms (= 8943.5ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 75.0ms (= 15007.5ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.68 (= 75.0ms / 44.7ms)
OneFlow resnet50 time: 40.5ms (= 8094.3ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 67.8ms (= 13552.9ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.67 (= 67.8ms / 40.5ms)
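The figures in the stats above appear to be derived as total time divided by iteration count, and relative speed as PyTorch time divided by OneFlow time. The sketch below reproduces the first row under that assumption. The function names are hypothetical and not part of the CI script, and rounding the per-iteration times before dividing is a guess that happens to match the reported values.

```python
def per_iter_ms(total_ms: float, iters: int) -> float:
    """Average time per iteration in milliseconds."""
    return total_ms / iters


def relative_speed(pytorch_ms: float, oneflow_ms: float) -> float:
    """Values above 1.0 mean OneFlow was faster than PyTorch."""
    return pytorch_ms / oneflow_ms


# First row of the stats: input_shape=[16, 3, 224, 224], 100 iterations.
oneflow_ms = round(per_iter_ms(13964.2, 100), 1)
pytorch_ms = round(per_iter_ms(16069.1, 100), 1)
speed = round(relative_speed(pytorch_ms, oneflow_ms), 2)

print(f"OneFlow {oneflow_ms}ms, PyTorch {pytorch_ms}ms, relative speed {speed}")
```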
Speed stats:
GPU Name: GeForce GTX 1080
❌ OneFlow resnet50 time: 140.6ms (= 14055.3ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 163.5ms (= 16352.6ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.16 (= 163.5ms / 140.6ms)
OneFlow resnet50 time: 85.9ms (= 8590.6ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 101.2ms (= 10120.5ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.18 (= 101.2ms / 85.9ms)
OneFlow resnet50 time: 58.3ms (= 11659.3ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 78.3ms (= 15653.4ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.34 (= 78.3ms / 58.3ms)
OneFlow resnet50 time: 45.3ms (= 9061.4ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 70.4ms (= 14078.0ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.55 (= 70.4ms / 45.3ms)
OneFlow resnet50 time: 40.2ms (= 8041.2ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 77.7ms (= 15542.8ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.93 (= 77.7ms / 40.2ms)
CI failed when running job: Build cu102. PR label automerge has been removed
Speed stats:
GPU Name: GeForce GTX 1080
❌ OneFlow resnet50 time: 139.6ms (= 13960.8ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 160.5ms (= 16049.8ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.15 (= 160.5ms / 139.6ms)
OneFlow resnet50 time: 85.7ms (= 8566.7ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 104.4ms (= 10437.0ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.22 (= 104.4ms / 85.7ms)
OneFlow resnet50 time: 58.1ms (= 11614.8ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 87.8ms (= 17561.1ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.51 (= 87.8ms / 58.1ms)
OneFlow resnet50 time: 45.4ms (= 9087.4ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 70.8ms (= 14151.2ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.56 (= 70.8ms / 45.4ms)
OneFlow resnet50 time: 40.3ms (= 8050.4ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 68.8ms (= 13752.1ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.71 (= 68.8ms / 40.3ms)