support named tuple for return type
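Background: https://github.com/Oneflow-Inc/OneCloud/issues/147#issuecomment-1195307571
Overview: functional_api.yaml now supports a new signature syntax:
"TensorTuple<values, indices> (Tensor input, Int32 dim, Bool keepdim=False) => Max"
This declares that the return value is a named tuple whose first element is named values and whose second element is named indices.
Implementation:
When the pybind interface is generated, the generator checks whether the TensorTuple declares names. If it does, it generates the corresponding named tuple class, and the signature calls WrapTensorTuple to convert the result into that named tuple.
Generated code: https://gist.github.com/marigoold/9ef16485360ef9b2eb86c43ee89df1e8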
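To illustrate the idea, here is a minimal Python sketch of the same flow: read the optional names out of a TensorTuple<...> return type, build a named tuple class from them, and wrap the raw outputs. It is an analogue for explanation only, not the generator code in this PR. The helpers parse_return_names and wrap_tensor_tuple, the regular expression, and the "max_result" class name are all assumptions made for this example, and plain lists stand in for tensors.

```python
import re
from collections import namedtuple

# Hypothetical pattern (an assumption for this sketch): matches a return type
# of the form "TensorTuple" or "TensorTuple<name1, name2>" at the start of a
# functional_api.yaml-style signature.
_RETURN_RE = re.compile(r"^\s*TensorTuple\s*(?:<([^>]*)>)?\s*\(")


def parse_return_names(signature):
    """Return the field names declared in TensorTuple<...>, or [] if none."""
    match = _RETURN_RE.match(signature)
    if match is None or match.group(1) is None:
        return []
    return [name.strip() for name in match.group(1).split(",") if name.strip()]


def wrap_tensor_tuple(signature, outputs):
    """Wrap outputs in a named tuple when the signature declares names."""
    names = parse_return_names(signature)
    if not names:
        # No names declared: fall back to an ordinary tuple.
        return tuple(outputs)
    # Derive a class name from the functor name after "=>" (illustrative only).
    functor = signature.split("=>")[-1].strip()
    result_type = namedtuple(functor.lower() + "_result", names)
    return result_type(*outputs)


SIGNATURE = (
    "TensorTuple<values, indices> "
    "(Tensor input, Int32 dim, Bool keepdim=False) => Max"
)

# Plain lists stand in for the two output tensors.
result = wrap_tensor_tuple(SIGNATURE, [[3, 9], [1, 0]])
print(result.values, result.indices)
```

With a named tuple, callers can write result.values and result.indices while positional unpacking (values, indices = result) keeps working, so code that already treats the return value as a plain tuple is unaffected.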
Code got formatted by CI. Please request CI again if you still want to have this PR merged. If the PR is from a forked repo, please download the patch files from the GitHub Actions web page and apply them locally.
Speed stats:
GPU Name: GeForce GTX 1080
❌ OneFlow resnet50 time: 139.4ms (= 13940.0ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 161.6ms (= 16163.8ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.16 (= 161.6ms / 139.4ms)
OneFlow resnet50 time: 85.1ms (= 8505.6ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 101.0ms (= 10101.6ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.19 (= 101.0ms / 85.1ms)
OneFlow resnet50 time: 57.6ms (= 11512.6ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 77.5ms (= 15505.4ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.35 (= 77.5ms / 57.6ms)
OneFlow resnet50 time: 43.8ms (= 8752.4ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 71.5ms (= 14296.5ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.63 (= 71.5ms / 43.8ms)
OneFlow resnet50 time: 41.1ms (= 8226.7ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 66.2ms (= 13231.5ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.61 (= 66.2ms / 41.1ms)
Speed stats:
GPU Name: GeForce GTX 1080
❌ OneFlow resnet50 time: 139.6ms (= 13961.4ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 160.4ms (= 16038.9ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.15 (= 160.4ms / 139.6ms)
OneFlow resnet50 time: 84.7ms (= 8467.2ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 111.2ms (= 11119.2ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.31 (= 111.2ms / 84.7ms)
OneFlow resnet50 time: 57.7ms (= 11545.0ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 77.5ms (= 15497.8ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.34 (= 77.5ms / 57.7ms)
OneFlow resnet50 time: 45.2ms (= 9032.6ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 69.4ms (= 13885.8ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.54 (= 69.4ms / 45.2ms)
OneFlow resnet50 time: 39.0ms (= 7799.3ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 68.0ms (= 13595.2ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.74 (= 68.0ms / 39.0ms)
View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/9363/
Speed stats:
GPU Name: GeForce GTX 1080
❌ OneFlow resnet50 time: 139.7ms (= 13973.2ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 160.2ms (= 16017.6ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.15 (= 160.2ms / 139.7ms)
OneFlow resnet50 time: 85.1ms (= 8510.5ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 102.6ms (= 10260.8ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.21 (= 102.6ms / 85.1ms)
OneFlow resnet50 time: 57.8ms (= 11552.6ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 80.8ms (= 16167.2ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.40 (= 80.8ms / 57.8ms)
OneFlow resnet50 time: 44.5ms (= 8899.5ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 70.6ms (= 14126.1ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.59 (= 70.6ms / 44.5ms)
OneFlow resnet50 time: 42.5ms (= 8491.0ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 68.7ms (= 13741.9ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.62 (= 68.7ms / 42.5ms)
Speed stats:
GPU Name: GeForce GTX 1080
❌ OneFlow resnet50 time: 141.9ms (= 14190.8ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 165.3ms (= 16534.1ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.17 (= 165.3ms / 141.9ms)
OneFlow resnet50 time: 85.7ms (= 8566.7ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 102.3ms (= 10232.7ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.19 (= 102.3ms / 85.7ms)
OneFlow resnet50 time: 58.0ms (= 11594.9ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 78.0ms (= 15601.3ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.35 (= 78.0ms / 58.0ms)
OneFlow resnet50 time: 43.9ms (= 8778.6ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 81.4ms (= 16285.7ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.86 (= 81.4ms / 43.9ms)
OneFlow resnet50 time: 42.1ms (= 8413.2ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 68.0ms (= 13605.3ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.62 (= 68.0ms / 42.1ms)
Speed stats:
GPU Name: GeForce GTX 1080
❌ OneFlow resnet50 time: 140.8ms (= 14078.4ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 165.2ms (= 16519.5ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.17 (= 165.2ms / 140.8ms)
OneFlow resnet50 time: 85.3ms (= 8533.4ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 101.7ms (= 10174.9ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.19 (= 101.7ms / 85.3ms)
OneFlow resnet50 time: 57.9ms (= 11587.7ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 84.0ms (= 16797.2ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.45 (= 84.0ms / 57.9ms)
OneFlow resnet50 time: 44.6ms (= 8920.4ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 82.1ms (= 16416.0ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.84 (= 82.1ms / 44.6ms)
OneFlow resnet50 time: 40.5ms (= 8103.5ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 67.9ms (= 13585.8ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.68 (= 67.9ms / 40.5ms)
Speed stats:
GPU Name: GeForce GTX 1080
❌ OneFlow resnet50 time: 141.5ms (= 14152.5ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 166.9ms (= 16691.5ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.18 (= 166.9ms / 141.5ms)
OneFlow resnet50 time: 86.7ms (= 8666.0ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 103.3ms (= 10329.1ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.19 (= 103.3ms / 86.7ms)
OneFlow resnet50 time: 58.2ms (= 11642.3ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 81.7ms (= 16346.7ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.40 (= 81.7ms / 58.2ms)
OneFlow resnet50 time: 45.1ms (= 9021.1ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 72.2ms (= 14448.0ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.60 (= 72.2ms / 45.1ms)
OneFlow resnet50 time: 39.8ms (= 7967.7ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 67.4ms (= 13481.5ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.69 (= 67.4ms / 39.8ms)
Speed stats:
GPU Name: GeForce GTX 1080
❌ OneFlow resnet50 time: 140.4ms (= 14036.3ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 163.2ms (= 16317.1ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.16 (= 163.2ms / 140.4ms)
OneFlow resnet50 time: 85.3ms (= 8532.4ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 101.8ms (= 10175.3ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.19 (= 101.8ms / 85.3ms)
OneFlow resnet50 time: 57.9ms (= 11588.4ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 78.3ms (= 15653.2ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.35 (= 78.3ms / 57.9ms)
OneFlow resnet50 time: 45.3ms (= 9068.0ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 71.9ms (= 14378.9ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.59 (= 71.9ms / 45.3ms)
OneFlow resnet50 time: 40.1ms (= 8022.1ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 68.4ms (= 13675.7ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.70 (= 68.4ms / 40.1ms)
Speed stats:
GPU Name: GeForce GTX 1080
❌ OneFlow resnet50 time: 140.0ms (= 14000.5ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 161.4ms (= 16136.5ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.15 (= 161.4ms / 140.0ms)
OneFlow resnet50 time: 86.4ms (= 8641.9ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 101.9ms (= 10191.9ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.18 (= 101.9ms / 86.4ms)
OneFlow resnet50 time: 58.0ms (= 11591.1ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 78.3ms (= 15661.9ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.35 (= 78.3ms / 58.0ms)
OneFlow resnet50 time: 44.0ms (= 8792.9ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 72.6ms (= 14516.5ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.65 (= 72.6ms / 44.0ms)
OneFlow resnet50 time: 41.5ms (= 8305.4ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 67.5ms (= 13495.5ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.62 (= 67.5ms / 41.5ms)