oneflow Move tensor api to cpython part4

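This PR:

Moves the tensor APIs implemented in Python to the Python C API.
Settles the rule for rewriting tensor APIs with the CPython API: "hand-write only the Python logic, never the argument-parsing logic". Following this rule, the fairly complex hand-written tensor APIs were deleted and moved into the corresponding macros.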
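
As a rough Python analogy of the "no hand-written argument parsing" rule, the sketch below builds a tensor method that forwards every argument unchanged to a functional API. The real macros live in C++ and their expansion is not shown in this description; `direct_pass`, `toy_matmul` and `ToyTensor` are hypothetical names used only for illustration.

```python
def direct_pass(functional):
    """Build a method that forwards all arguments unchanged (illustrative only)."""
    def method(self, *args, **kwargs):
        # No hand-written parsing here: the functional checks its own arguments.
        return functional(self, *args, **kwargs)
    return method


# Hypothetical stand-in for a functional API that does its own parsing.
def toy_matmul(a, b):
    # Plain nested-list matrix product: rows of a against columns of b.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


class ToyTensor(list):
    # The method body contains no logic beyond forwarding.
    matmul = direct_pass(toy_matmul)


result = ToyTensor([[1, 2], [3, 4]]).matmul([[1, 0], [0, 1]])
```

Because the wrapper never inspects its arguments, positional and keyword calls (for example `matmul(b=...)`) both reach the functional as written.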

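Miscellaneous changes:

For std and var, changed the default arguments in functional_api.yaml to match torch.
Deleted flip_op.py and moved its dim-checking logic into the functor.
Renamed the variables in the type_as tests from tgt_xxx to dst_xxx for consistency.
Reordered the functions in PyTensorObject_extra_methods in tensor_functions.cpp by category (for example, the DIRECT_PASS_FUNC entries are grouped together, the UNARY_METHOD entries are grouped together, and so on).
Fixed incorrect returns of True / False: use Py_RETURN_TRUE / Py_RETURN_FALSE instead of returning Py_True / Py_False directly. The macros take care of the reference count before returning, which avoids the reference-counting problems and resulting memory errors.
Added the macro ARGS_ONLY_METHODS, which parses arguments for functions such as reshape and permute that take only an int list besides the tensor. In torch, these methods accept several argument forms when bound to a tensor:
- variadic positional arguments, as in x.reshape(1, 2)
- a keyword argument, as in x.reshape(shape=(1, 2))
- but not torch.reshape(x, 1, 2)
- The macro implements argument parsing similar to the following: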

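import oneflow as flow

def _permute(self, *dims):
    # A single argument is either a sequence of dims or one bare int.
    if len(dims) == 1:
        new_dims = dims[0]
        if isinstance(new_dims, int):
            # x.permute(0): wrap the lone int into a tuple.
            new_dims = (new_dims,)
    else:
        # x.permute(1, 0): the varargs tuple is already the dims list.
        new_dims = dims
    return flow._C.permute(self, new_dims)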
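
The snippet above covers only the positional forms. A minimal sketch that also accepts the keyword form from the list (such as shape=(1, 2)) could look like the following. It is a plain-Python illustration under assumptions, not the macro's actual expansion; `parse_int_list` is a hypothetical helper name.

```python
def parse_int_list(name, args, kwargs):
    """Normalize an int-list argument passed positionally or by keyword.

    Hypothetical helper: `name` is the keyword the method accepts,
    e.g. "shape" for reshape or "dims" for permute.
    """
    if args and kwargs:
        raise TypeError(f"pass {name} either positionally or by keyword, not both")
    if kwargs:
        extra = sorted(set(kwargs) - {name})
        if extra or name not in kwargs:
            raise TypeError(f"expected only the keyword {name!r}, got {sorted(kwargs)}")
        value = kwargs[name]          # x.reshape(shape=(1, 2))
    elif len(args) == 1:
        value = args[0]               # x.reshape((1, 2)) or x.reshape(3)
    else:
        value = args                  # x.reshape(1, 2)
    if isinstance(value, int):
        value = (value,)              # a lone int becomes a one-element tuple
    return tuple(value)


positional = parse_int_list("shape", (1, 2), {})
as_sequence = parse_int_list("shape", ((1, 2),), {})
by_keyword = parse_int_list("shape", (), {"shape": (1, 2)})
```

All three call forms normalize to the same tuple, so the functional behind the method only ever sees one representation.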

The tensor API definitions before the move can be found at: https://github.com/Oneflow-Inc/oneflow/blob/dde79e04b01521e65403d1d49fcf1154a6f289fb/python/oneflow/framework/tensor.py

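Functions changed:

tensor.T is bound as a property.
cast, diag, diagonal, matmul, var, std, softplus, split: deleted the hand-written argument-parsing code and moved them to DIRECT_PASS_FUNC.
- cast, diag, diagonal, matmul and split previously had hand-written argument parsing; it is deleted here and they are moved to DIRECT_PASS_FUNC.
- var and std had hand-written argument parsing because the arguments of tensor.var/std differed from those of the functor. They are now aligned with torch and moved to DIRECT_PASS_FUNC. For background, see https://github.com/Oneflow-Inc/oneflow/pull/8342#discussion_r887653560
- softplus had hand-written argument parsing because OneFlow's original tensor.softplus had no beta and threshold parameters while the functor did, so the hand-written version after the move declared these two parameters; see https://github.com/Oneflow-Inc/oneflow/pull/8342/files#r887654844 . It is now aligned with torch and can be moved directly to DIRECT_PASS_FUNC.
eq: the Python-side logic is hand-written: if the other Tensor is None, return False directly; otherwise call LogicalEqual.
type_as: the CPython logic is deleted and implemented in the functor instead.
All the rest are moved directly to DIRECT_PASS_FUNC.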
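
The eq rule above is the kind of "Python logic" that still gets hand-written. A minimal sketch of that control flow, assuming a toy tensor type: `ToyTensor` and `logical_equal` are hypothetical stand-ins, and the real LogicalEqual functor is not reproduced here.

```python
class ToyTensor:
    """Hypothetical stand-in for a tensor, holding a flat list of values."""

    def __init__(self, values):
        self.values = list(values)

    def eq(self, other):
        # Hand-written logic: comparing against None is simply False.
        if other is None:
            return False
        # Everything else is delegated to the element-wise comparison.
        return logical_equal(self, other)


def logical_equal(a, b):
    # Stand-in for the element-wise comparison; returns a list of bools.
    return [x == y for x, y in zip(a.values, b.values)]


none_case = ToyTensor([1, 2]).eq(None)
tensor_case = ToyTensor([1, 2]).eq(ToyTensor([1, 3]))
```

Only the None check is special-cased; argument parsing for the delegated call is left to the functional side, in line with the rule stated at the top of this description.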

Jun 10 '22 06:06 marigoold

Code got formatted by CI. Please request CI again if you still want to have this PR merged. If the PR is from a forked repo, please download the patch files from the GitHub Actions web page and apply them locally.

Jun 13 '22 09:06 github-actions[bot]

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/8395/

Jun 13 '22 10:06 github-actions[bot]

CI failed when running job: cpu-module. PR label automerge has been removed

Jun 13 '22 10:06 github-actions[bot]

CI failed when running job: cuda-module. PR label automerge has been removed

Jun 13 '22 10:06 github-actions[bot]

Code got formatted by CI. Please request CI again if you still want to have this PR merged. If the PR is from a forked repo, please download the patch files from the GitHub Actions web page and apply them locally.

Jun 14 '22 09:06 github-actions[bot]

Code got formatted by CI. Please request CI again if you still want to have this PR merged. If the PR is from a forked repo, please download the patch files from the GitHub Actions web page and apply them locally.

Jun 15 '22 05:06 github-actions[bot]

Static analysis with clang failed. PR label automerge has been removed

Jun 16 '22 05:06 github-actions[bot]

Speed stats:

GPU Name: NVIDIA GeForce GTX 1080 

❌ OneFlow resnet50 time: 130.0ms (= 12999.9ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 143.3ms (= 14334.6ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.10 (= 143.3ms / 130.0ms)

OneFlow resnet50 time: 76.1ms (= 7614.7ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 87.1ms (= 8706.7ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.14 (= 87.1ms / 76.1ms)

OneFlow resnet50 time: 51.4ms (= 10280.7ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 55.5ms (= 11108.1ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.08 (= 55.5ms / 51.4ms)

OneFlow resnet50 time: 41.4ms (= 8285.1ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 47.7ms (= 9540.7ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.15 (= 47.7ms / 41.4ms)

OneFlow resnet50 time: 37.2ms (= 7441.0ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 34.8ms (= 6966.4ms / 200, input_shape=[1, 3, 224, 224])
❌ Relative speed: 0.94 (= 34.8ms / 37.2ms)

OneFlow swin dataloader time: 0.383s (= 76.664s / 200, num_workers=1)
PyTorch swin dataloader time: 0.150s (= 29.972s / 200, num_workers=1)
Relative speed: 0.391 (= 0.150s / 0.383s)

OneFlow swin dataloader time: 0.066s (= 13.217s / 200, num_workers=4)
PyTorch swin dataloader time: 0.043s (= 8.531s / 200, num_workers=4)
Relative speed: 0.645 (= 0.043s / 0.066s)

OneFlow swin dataloader time: 0.036s (= 7.246s / 200, num_workers=8)
PyTorch swin dataloader time: 0.023s (= 4.555s / 200, num_workers=8)
Relative speed: 0.629 (= 0.023s / 0.036s)

❌ OneFlow resnet50 time: 146.1ms (= 14612.8ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 168.3ms (= 16831.4ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.15 (= 168.3ms / 146.1ms)

OneFlow resnet50 time: 94.9ms (= 9486.6ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 112.4ms (= 11240.8ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.18 (= 112.4ms / 94.9ms)

OneFlow resnet50 time: 71.5ms (= 14309.2ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 90.4ms (= 18073.9ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.26 (= 90.4ms / 71.5ms)

OneFlow resnet50 time: 59.7ms (= 11945.4ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 77.2ms (= 15446.2ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.29 (= 77.2ms / 59.7ms)

OneFlow resnet50 time: 55.2ms (= 11030.8ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 68.6ms (= 13729.7ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.24 (= 68.6ms / 55.2ms)

Jun 16 '22 05:06 github-actions[bot]

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/8395/

Jun 16 '22 07:06 github-actions[bot]

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/8395/

Jun 17 '22 03:06 github-actions[bot]

Code got formatted by CI. Please request CI again if you still want to have this PR merged. If the PR is from a forked repo, please download the patch files from the GitHub Actions web page and apply them locally.

Jul 29 '22 06:07 github-actions[bot]

CI failed when running job: Build cpu. PR label automerge has been removed

Oct 08 '22 02:10 github-actions[bot]

Static analysis with clang failed. PR label automerge has been removed

Oct 08 '22 02:10 github-actions[bot]

Code got formatted by CI. Please request CI again if you still want to have this PR merged. If the PR is from a forked repo, please download the patch files from the GitHub Actions web page and apply them locally.

Oct 08 '22 06:10 github-actions[bot]

Code got formatted by CI. Please request CI again if you still want to have this PR merged. If the PR is from a forked repo, please download the patch files from the GitHub Actions web page and apply them locally.

Oct 08 '22 07:10 github-actions[bot]

Code got formatted by CI. Please request CI again if you still want to have this PR merged. If the PR is from a forked repo, please download the patch files from the GitHub Actions web page and apply them locally.

Oct 08 '22 13:10 github-actions[bot]

Code got formatted by CI. Please request CI again if you still want to have this PR merged. If the PR is from a forked repo, please download the patch files from the GitHub Actions web page and apply them locally.

Oct 09 '22 03:10 github-actions[bot]

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/8395/

Oct 09 '22 04:10 github-actions[bot]

Speed stats:

GPU Name: GeForce GTX 1080

❌ OneFlow resnet50 time: 140.1ms (= 14007.2ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 163.1ms (= 16307.8ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.16 (= 163.1ms / 140.1ms)

OneFlow resnet50 time: 85.9ms (= 8590.1ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 103.4ms (= 10337.8ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.20 (= 103.4ms / 85.9ms)

OneFlow resnet50 time: 59.2ms (= 11835.1ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 78.5ms (= 15708.8ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.33 (= 78.5ms / 59.2ms)

OneFlow resnet50 time: 45.0ms (= 8994.1ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 70.4ms (= 14079.4ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.57 (= 70.4ms / 45.0ms)

OneFlow resnet50 time: 41.1ms (= 8210.2ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 68.6ms (= 13719.7ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.67 (= 68.6ms / 41.1ms)

Oct 09 '22 04:10 github-actions[bot]

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/8395/

Oct 09 '22 06:10 github-actions[bot]

Speed stats:

GPU Name: GeForce GTX 1080

❌ OneFlow resnet50 time: 139.9ms (= 13985.2ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 162.6ms (= 16257.1ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.16 (= 162.6ms / 139.9ms)

OneFlow resnet50 time: 86.1ms (= 8614.2ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 107.3ms (= 10734.2ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.25 (= 107.3ms / 86.1ms)

OneFlow resnet50 time: 58.3ms (= 11663.9ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 79.4ms (= 15871.2ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.36 (= 79.4ms / 58.3ms)

OneFlow resnet50 time: 45.4ms (= 9076.5ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 71.2ms (= 14244.9ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.57 (= 71.2ms / 45.4ms)

OneFlow resnet50 time: 40.5ms (= 8099.5ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 67.2ms (= 13437.9ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.66 (= 67.2ms / 40.5ms)

Oct 09 '22 06:10 github-actions[bot]

Code got formatted by CI. Please request CI again if you still want to have this PR merged. If the PR is from a forked repo, please download the patch files from the GitHub Actions web page and apply them locally.

Oct 19 '22 07:10 github-actions[bot]

Speed stats:

GPU Name: GeForce GTX 1080

❌ OneFlow resnet50 time: 140.5ms (= 14046.6ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 158.4ms (= 15837.1ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.13 (= 158.4ms / 140.5ms)

OneFlow resnet50 time: 87.8ms (= 8782.1ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 103.5ms (= 10353.9ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.18 (= 103.5ms / 87.8ms)

OneFlow resnet50 time: 59.2ms (= 11838.2ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 79.3ms (= 15853.9ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.34 (= 79.3ms / 59.2ms)

OneFlow resnet50 time: 45.7ms (= 9140.5ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 69.8ms (= 13962.8ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.53 (= 69.8ms / 45.7ms)

OneFlow resnet50 time: 39.8ms (= 7960.0ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 67.6ms (= 13521.2ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.70 (= 67.6ms / 39.8ms)

Oct 25 '22 03:10 github-actions[bot]
