
Move tensor api to cpython part4

Open marigoold opened this issue 2 years ago • 37 comments

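This PR:

  • Moves the tensor API implemented in Python over to the Python C API
  • Settles the rule for rewriting tensor APIs with the CPython API: "hand-write only the Python-level logic, never the argument-parsing logic". Following this rule, the relatively complex hand-written tensor APIs were deleted and moved into the corresponding macros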

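Miscellaneous changes:

  • std and var: changed their default arguments in functional_api.yaml to align with torch
  • Deleted flip_op.py and moved its dim-checking logic into the functor
  • Renamed variables in the type_as tests from tgt_xxx to dst_xxx for consistency
  • Reordered the functions in PyTensorObject_extra_methods in tensor_functions.cpp by category (for example, all DIRECT_PASS_FUNC entries together, all UNARY_METHOD entries together)
  • Fixed incorrect returns of True / False: use Py_RETURN_TRUE / Py_RETURN_FALSE instead of returning Py_True / Py_False directly, so that no reference-counting bug can cause memory errors
  • Added the macro ARGS_ONLY_METHODS to parse arguments for functions such as reshape and permute that, apart from the tensor, take only a single int list. In torch, when these methods are bound to a tensor they accept several argument forms:
    • variadic positional arguments, as in x.reshape(1, 2)
    • a keyword argument, as in x.reshape(shape=(1, 2))
    • but not torch.reshape(x, 1, 2)
    • The macro implements argument-parsing logic similar to the following: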
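import oneflow as flow

def _permute(self, *dims):
    # A single argument is either one int or an already-built sequence of ints.
    if len(dims) == 1:
        new_dims = dims[0]
        if isinstance(new_dims, int):
            new_dims = (new_dims,)
    else:
        # Several positional ints, e.g. x.permute(1, 0): use them as the list.
        new_dims = dims
    return flow._C.permute(self, new_dims)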
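
The call forms listed above can be tried out without OneFlow installed. The following is only a rough sketch of the normalization step, extended to the keyword form; `normalize_int_list_args`, its `kwarg_name` parameter, and `FakeTensor` are hypothetical names made up for illustration and are not part of OneFlow or of the ARGS_ONLY_METHODS macro.

```python
def normalize_int_list_args(args, kwargs, kwarg_name):
    """Return the requested int list as a tuple, whichever call form was used.

    Hypothetical helper for illustration only; not an OneFlow API.
    """
    if kwargs:
        # Keyword form, e.g. x.reshape(shape=(1, 2)).
        if args or set(kwargs) != {kwarg_name}:
            raise TypeError(f"expected only positional ints or '{kwarg_name}='")
        value = kwargs[kwarg_name]
    elif len(args) == 1:
        # Single positional argument: either one int or one sequence of ints.
        value = args[0]
    else:
        # Variadic form, e.g. x.reshape(1, 2).
        value = args
    if isinstance(value, int):
        value = (value,)
    return tuple(value)


class FakeTensor:
    """Stand-in for a tensor; reshape just returns the shape it parsed."""

    def reshape(self, *args, **kwargs):
        return normalize_int_list_args(args, kwargs, "shape")


x = FakeTensor()
print(x.reshape(1, 2))
print(x.reshape((1, 2)))
print(x.reshape(shape=(1, 2)))
```

All three calls resolve to the same tuple, which is the property the macro is meant to guarantee before the arguments reach the functor.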

For the tensor API definitions before the move, see: https://github.com/Oneflow-Inc/oneflow/blob/dde79e04b01521e65403d1d49fcf1154a6f289fb/python/oneflow/framework/tensor.py

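Functions changed:

  • tensor.T is now bound as a property
  • cast, diag, diagonal, matmul, var, std, softplus, split: hand-written argument parsing removed; moved to DIRECT_PASS_FUNC
    • cast, diag, diagonal, matmul, and split previously had hand-written argument parsing, which is deleted here; they now go through DIRECT_PASS_FUNC
    • var and std had hand-written argument parsing because the parameters of tensor.var/std differed from those of the functor. They are now aligned with torch and moved to DIRECT_PASS_FUNC; for background see https://github.com/Oneflow-Inc/oneflow/pull/8342#discussion_r887653560
    • softplus had hand-written argument parsing because OneFlow's original tensor.softplus had no beta and threshold parameters while the functor did, so the hand-written version declared those two parameters after the move; see https://github.com/Oneflow-Inc/oneflow/pull/8342/files#r887654844 . It is now aligned with torch and can be moved directly to DIRECT_PASS_FUNC
  • eq has hand-written Python-level logic: if the other Tensor is None it returns False directly; otherwise it calls LogicalEqual
  • type_as: the CPython logic was removed and is now implemented in the functor
  • Everything else was moved directly to DIRECT_PASS_FUNC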
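
The eq behaviour described in the list above can be illustrated in plain Python. This is a minimal sketch under stated assumptions: `tensor_eq` and `elementwise_equal` are hypothetical names, and flat Python lists stand in for tensors; the real implementation lives in the CPython binding and calls OneFlow's LogicalEqual functor.

```python
def tensor_eq(self, other, logical_equal):
    # Comparing against None short-circuits to a plain Python False,
    # so the comparison functor is never invoked with a non-tensor.
    if other is None:
        return False
    return logical_equal(self, other)


def elementwise_equal(a, b):
    # Toy stand-in for LogicalEqual (assumption): compares flat lists item by item.
    return [x == y for x, y in zip(a, b)]


print(tensor_eq([1, 2, 3], None, elementwise_equal))
print(tensor_eq([1, 2, 3], [1, 0, 3], elementwise_equal))
```

The first call returns a bool and the second returns an element-wise result, mirroring the two branches the PR description names.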

marigoold avatar Jun 10 '22 06:06 marigoold

Code got formatted by CI. Please request CI again if you still want to have this PR merged. If the PR is from a forked repo, please download the patch files from the GitHub Actions web page and apply them locally.

github-actions[bot] avatar Jun 13 '22 09:06 github-actions[bot]

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/8395/

github-actions[bot] avatar Jun 13 '22 10:06 github-actions[bot]

CI failed when running job: cpu-module. PR label automerge has been removed

github-actions[bot] avatar Jun 13 '22 10:06 github-actions[bot]

CI failed when running job: cuda-module. PR label automerge has been removed

github-actions[bot] avatar Jun 13 '22 10:06 github-actions[bot]

Code got formatted by CI. Please request CI again if you still want to have this PR merged. If the PR is from a forked repo, please download the patch files from the GitHub Actions web page and apply them locally.

github-actions[bot] avatar Jun 14 '22 09:06 github-actions[bot]

Code got formatted by CI. Please request CI again if you still want to have this PR merged. If the PR is from a forked repo, please download the patch files from the GitHub Actions web page and apply them locally.

github-actions[bot] avatar Jun 15 '22 05:06 github-actions[bot]

Static analysis with clang failed. PR label automerge has been removed

github-actions[bot] avatar Jun 16 '22 05:06 github-actions[bot]

Speed stats:
GPU Name: NVIDIA GeForce GTX 1080 

❌ OneFlow resnet50 time: 130.0ms (= 12999.9ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 143.3ms (= 14334.6ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.10 (= 143.3ms / 130.0ms)

OneFlow resnet50 time: 76.1ms (= 7614.7ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 87.1ms (= 8706.7ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.14 (= 87.1ms / 76.1ms)

OneFlow resnet50 time: 51.4ms (= 10280.7ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 55.5ms (= 11108.1ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.08 (= 55.5ms / 51.4ms)

OneFlow resnet50 time: 41.4ms (= 8285.1ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 47.7ms (= 9540.7ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.15 (= 47.7ms / 41.4ms)

OneFlow resnet50 time: 37.2ms (= 7441.0ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 34.8ms (= 6966.4ms / 200, input_shape=[1, 3, 224, 224])
❌ Relative speed: 0.94 (= 34.8ms / 37.2ms)

OneFlow swin dataloader time: 0.383s (= 76.664s / 200, num_workers=1)
PyTorch swin dataloader time: 0.150s (= 29.972s / 200, num_workers=1)
Relative speed: 0.391 (= 0.150s / 0.383s)

OneFlow swin dataloader time: 0.066s (= 13.217s / 200, num_workers=4)
PyTorch swin dataloader time: 0.043s (= 8.531s / 200, num_workers=4)
Relative speed: 0.645 (= 0.043s / 0.066s)

OneFlow swin dataloader time: 0.036s (= 7.246s / 200, num_workers=8)
PyTorch swin dataloader time: 0.023s (= 4.555s / 200, num_workers=8)
Relative speed: 0.629 (= 0.023s / 0.036s)

❌ OneFlow resnet50 time: 146.1ms (= 14612.8ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 168.3ms (= 16831.4ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.15 (= 168.3ms / 146.1ms)

OneFlow resnet50 time: 94.9ms (= 9486.6ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 112.4ms (= 11240.8ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.18 (= 112.4ms / 94.9ms)

OneFlow resnet50 time: 71.5ms (= 14309.2ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 90.4ms (= 18073.9ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.26 (= 90.4ms / 71.5ms)

OneFlow resnet50 time: 59.7ms (= 11945.4ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 77.2ms (= 15446.2ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.29 (= 77.2ms / 59.7ms)

OneFlow resnet50 time: 55.2ms (= 11030.8ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 68.6ms (= 13729.7ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.24 (= 68.6ms / 55.2ms)

github-actions[bot] avatar Jun 16 '22 05:06 github-actions[bot]

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/8395/

github-actions[bot] avatar Jun 16 '22 07:06 github-actions[bot]

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/8395/

github-actions[bot] avatar Jun 17 '22 03:06 github-actions[bot]

Code got formatted by CI. Please request CI again if you still want to have this PR merged. If the PR is from a forked repo, please download the patch files from the GitHub Actions web page and apply them locally.

github-actions[bot] avatar Jul 29 '22 06:07 github-actions[bot]

CI failed when running job: Build cpu. PR label automerge has been removed

github-actions[bot] avatar Oct 08 '22 02:10 github-actions[bot]

Static analysis with clang failed. PR label automerge has been removed

github-actions[bot] avatar Oct 08 '22 02:10 github-actions[bot]

Code got formatted by CI. Please request CI again if you still want to have this PR merged. If the PR is from a forked repo, please download the patch files from the GitHub Actions web page and apply them locally.

github-actions[bot] avatar Oct 08 '22 06:10 github-actions[bot]

Code got formatted by CI. Please request CI again if you still want to have this PR merged. If the PR is from a forked repo, please download the patch files from the GitHub Actions web page and apply them locally.

github-actions[bot] avatar Oct 08 '22 07:10 github-actions[bot]

Code got formatted by CI. Please request CI again if you still want to have this PR merged. If the PR is from a forked repo, please download the patch files from the GitHub Actions web page and apply them locally.

github-actions[bot] avatar Oct 08 '22 13:10 github-actions[bot]

Code got formatted by CI. Please request CI again if you still want to have this PR merged. If the PR is from a forked repo, please download the patch files from the GitHub Actions web page and apply them locally.

github-actions[bot] avatar Oct 09 '22 03:10 github-actions[bot]

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/8395/

github-actions[bot] avatar Oct 09 '22 04:10 github-actions[bot]

Speed stats:
GPU Name: GeForce GTX 1080

❌ OneFlow resnet50 time: 140.1ms (= 14007.2ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 163.1ms (= 16307.8ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.16 (= 163.1ms / 140.1ms)

OneFlow resnet50 time: 85.9ms (= 8590.1ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 103.4ms (= 10337.8ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.20 (= 103.4ms / 85.9ms)

OneFlow resnet50 time: 59.2ms (= 11835.1ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 78.5ms (= 15708.8ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.33 (= 78.5ms / 59.2ms)

OneFlow resnet50 time: 45.0ms (= 8994.1ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 70.4ms (= 14079.4ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.57 (= 70.4ms / 45.0ms)

OneFlow resnet50 time: 41.1ms (= 8210.2ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 68.6ms (= 13719.7ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.67 (= 68.6ms / 41.1ms)

github-actions[bot] avatar Oct 09 '22 04:10 github-actions[bot]

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/8395/

github-actions[bot] avatar Oct 09 '22 06:10 github-actions[bot]

Speed stats:
GPU Name: GeForce GTX 1080

❌ OneFlow resnet50 time: 139.9ms (= 13985.2ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 162.6ms (= 16257.1ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.16 (= 162.6ms / 139.9ms)

OneFlow resnet50 time: 86.1ms (= 8614.2ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 107.3ms (= 10734.2ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.25 (= 107.3ms / 86.1ms)

OneFlow resnet50 time: 58.3ms (= 11663.9ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 79.4ms (= 15871.2ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.36 (= 79.4ms / 58.3ms)

OneFlow resnet50 time: 45.4ms (= 9076.5ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 71.2ms (= 14244.9ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.57 (= 71.2ms / 45.4ms)

OneFlow resnet50 time: 40.5ms (= 8099.5ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 67.2ms (= 13437.9ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.66 (= 67.2ms / 40.5ms)

github-actions[bot] avatar Oct 09 '22 06:10 github-actions[bot]

Code got formatted by CI. Please request CI again if you still want to have this PR merged. If the PR is from a forked repo, please download the patch files from the GitHub Actions web page and apply them locally.

github-actions[bot] avatar Oct 19 '22 07:10 github-actions[bot]

Speed stats:
GPU Name: GeForce GTX 1080

❌ OneFlow resnet50 time: 140.5ms (= 14046.6ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 158.4ms (= 15837.1ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.13 (= 158.4ms / 140.5ms)

OneFlow resnet50 time: 87.8ms (= 8782.1ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 103.5ms (= 10353.9ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.18 (= 103.5ms / 87.8ms)

OneFlow resnet50 time: 59.2ms (= 11838.2ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 79.3ms (= 15853.9ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.34 (= 79.3ms / 59.2ms)

OneFlow resnet50 time: 45.7ms (= 9140.5ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 69.8ms (= 13962.8ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.53 (= 69.8ms / 45.7ms)

OneFlow resnet50 time: 39.8ms (= 7960.0ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 67.6ms (= 13521.2ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.70 (= 67.6ms / 39.8ms)

github-actions[bot] avatar Oct 25 '22 03:10 github-actions[bot]