oneflow
Move tensor API to CPython (part 4)
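This PR does the following:
- Moves the tensor APIs that were still defined in Python over to the Python C API
- Establishes the rule for rewriting tensor APIs with the CPython API: "hand-write only the Python logic, never the argument-parsing logic". Following this rule, the previously complex hand-written tensor APIs were deleted and moved into the corresponding macros.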
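As a rough illustration of the rule above, here is a pure-Python model in which the argument list is declared exactly once, on the functional op, and the tensor method only forwards self plus whatever the caller passed. Tensor, add and direct_pass are hypothetical toy names, not OneFlow's macros or API; Python's own signature binding stands in for the functional layer's argument parser.

```python
class Tensor:
    """Toy 1-D tensor backed by a Python list (illustration only)."""

    def __init__(self, data):
        self.data = list(data)


def add(input, other, alpha=1):
    """Hypothetical functional op: elementwise input + alpha * other.

    Its Python signature is the single place where arguments are declared;
    binding a call to it plays the role of the functional layer's parser.
    """
    return Tensor(a + alpha * b for a, b in zip(input.data, other.data))


def direct_pass(func):
    """Build a tensor method that forwards self and all arguments unparsed."""

    def method(self, *args, **kwargs):
        return func(self, *args, **kwargs)

    method.__name__ = func.__name__
    method.__doc__ = func.__doc__
    return method


# No per-method parsing code: the method is generated from the functional op.
Tensor.add = direct_pass(add)
```

With this model, Tensor([1, 2]).add(Tensor([10, 20]), alpha=2).data is [21, 42], and a misspelled keyword such as beta=2 raises the TypeError produced by add's own signature, so error handling also lives in one place.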
Miscellaneous changes:
- For std and var, changed the default arguments in functional_api.yaml to align with torch
- Deleted flip_op.py and moved its dim-checking logic into the functor
- Renamed the variables in the type_as tests from tgt_xxx to dst_xxx for consistency
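- Reordered the functions in PyTensorObject_extra_methods in tensor_functions.cpp by category (e.g. the DIRECT_PASS_FUNC ones together, the UNARY_METHOD ones together, and so on)
- Fixed incorrect returns of True/False: functions should use Py_RETURN_TRUE / Py_RETURN_FALSE instead of returning Py_True / Py_False directly. Returning them without incrementing their reference count breaks reference counting and can cause memory errors.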
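- Added the macro ARGS_ONLY_METHODS to parse functions such as reshape and permute that, apart from the tensor itself, take only a single int-list argument. In torch, when these methods are bound to a tensor, they accept several input forms:
  - variadic positional arguments, as in x.reshape(1, 2)
  - a keyword argument, as in x.reshape(shape=(1, 2))
  - but not torch.reshape(x, 1, 2)
  The macro implements argument parsing roughly equivalent to the following: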
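import oneflow as flow


def _permute(self, *dims):
    if len(dims) == 1:
        # x.permute((1, 0)), x.permute([1, 0]) or x.permute(0): a single argument
        new_dims = dims[0]
        if isinstance(new_dims, int):
            new_dims = (new_dims,)
    else:
        # x.permute(1, 0): variadic positional ints
        new_dims = dims
    return flow._C.permute(self, new_dims)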
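The snippet above only handles the positional forms. A fuller plain-Python sketch of the normalization, also covering the keyword form x.reshape(shape=(1, 2)), might look like this; parse_int_list_args and its error messages are hypothetical and do not describe the actual ARGS_ONLY_METHODS implementation.

```python
def parse_int_list_args(args, kwargs, name):
    """Normalize a shape-like argument passed to a tensor method.

    Accepted forms (using reshape as the example):
      x.reshape(1, 2)          -> args == (1, 2)
      x.reshape((1, 2))        -> args == ((1, 2),)
      x.reshape(3)             -> args == (3,)
      x.reshape(shape=(1, 2))  -> kwargs == {"shape": (1, 2)}
    """
    if kwargs:
        # The keyword form must be used alone, with exactly the expected name.
        if args or set(kwargs) != {name}:
            raise TypeError(f"pass either positional ints or only '{name}='")
        value = kwargs[name]
    elif len(args) == 1:
        # A single argument: either a sequence of ints or one bare int.
        value = args[0]
    else:
        # Variadic positional ints.
        value = args
    if isinstance(value, int):
        value = (value,)
    value = tuple(value)  # non-iterables raise TypeError here
    if not all(isinstance(v, int) for v in value):
        raise TypeError(f"'{name}' must be an int or a sequence of ints")
    return value
```

A hypothetical method would call it as shape = parse_int_list_args(args, kwargs, "shape") and then pass the resulting tuple to the functional op, so the op itself only ever sees one canonical form.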
The definitions of the tensor APIs before the move can be found at: https://github.com/Oneflow-Inc/oneflow/blob/dde79e04b01521e65403d1d49fcf1154a6f289fb/python/oneflow/framework/tensor.py
List of changed functions:
- tensor.T is now bound as a property
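- cast, diag, diagonal, matmul, var, std, softplus, split: removed the hand-written argument-parsing code and moved them to DIRECT_PASS_FUNC
  - cast, diag, diagonal, matmul and split previously had hand-written argument parsing; it is deleted here and they are moved to DIRECT_PASS_FUNC
  - var and std had hand-written argument parsing because the parameters of tensor.var/std differed from those of the functor; they are now aligned with torch and moved to DIRECT_PASS_FUNC. Background: https://github.com/Oneflow-Inc/oneflow/pull/8342#discussion_r887653560
  - softplus had hand-written argument parsing because OneFlow's original tensor.softplus lacked the beta and threshold parameters that the functor has, so the hand-written port declared these two parameters (see https://github.com/Oneflow-Inc/oneflow/pull/8342/files#r887654844). It is now aligned with torch and can be moved directly to DIRECT_PASS_FUNC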
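- eq keeps hand-written Python-side logic: if the other operand is None, it returns False directly; otherwise it calls LogicalEqual
- type_as: removed the CPython logic; it is now implemented in the functor
- All other functions were moved directly into DIRECT_PASS_FUNC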
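The eq behaviour described above can be modelled with a toy list-backed tensor. logical_equal below is a hypothetical stand-in for the LogicalEqual functor (elementwise comparison, with a Python scalar compared against every element); only the None short-circuit comes from this PR.

```python
class Tensor:
    """Toy 1-D tensor backed by a Python list (illustration only)."""

    def __init__(self, data):
        self.data = list(data)

    def __eq__(self, other):
        # The only hand-written Python logic: comparing against None returns a
        # plain False instead of dispatching to the elementwise op.
        if other is None:
            return False
        return logical_equal(self, other)


def logical_equal(x, y):
    """Hypothetical stand-in for the LogicalEqual functor."""
    if isinstance(y, Tensor):
        return Tensor(a == b for a, b in zip(x.data, y.data))
    # Broadcast a Python scalar against every element.
    return Tensor(a == y for a in x.data)
```

In this model t == None yields the plain bool False, while t == other yields an elementwise result tensor, which is why the None check has to stay as Python logic rather than being folded into the generic argument parsing.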
Code got formatted by CI. Please request CI again if you still want to have this PR merged. If the PR is from a forked repo, please download the patch files from the GitHub Actions web page and apply them locally.
View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/8395/
CI failed when running job: cpu-module. PR label automerge has been removed
CI failed when running job: cuda-module. PR label automerge has been removed
Static analysis with clang failed. PR label automerge has been removed
Speed stats:
GPU Name: NVIDIA GeForce GTX 1080
❌ OneFlow resnet50 time: 130.0ms (= 12999.9ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 143.3ms (= 14334.6ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.10 (= 143.3ms / 130.0ms)
OneFlow resnet50 time: 76.1ms (= 7614.7ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 87.1ms (= 8706.7ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.14 (= 87.1ms / 76.1ms)
OneFlow resnet50 time: 51.4ms (= 10280.7ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 55.5ms (= 11108.1ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.08 (= 55.5ms / 51.4ms)
OneFlow resnet50 time: 41.4ms (= 8285.1ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 47.7ms (= 9540.7ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.15 (= 47.7ms / 41.4ms)
OneFlow resnet50 time: 37.2ms (= 7441.0ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 34.8ms (= 6966.4ms / 200, input_shape=[1, 3, 224, 224])
❌ Relative speed: 0.94 (= 34.8ms / 37.2ms)
OneFlow swin dataloader time: 0.383s (= 76.664s / 200, num_workers=1)
PyTorch swin dataloader time: 0.150s (= 29.972s / 200, num_workers=1)
Relative speed: 0.391 (= 0.150s / 0.383s)
OneFlow swin dataloader time: 0.066s (= 13.217s / 200, num_workers=4)
PyTorch swin dataloader time: 0.043s (= 8.531s / 200, num_workers=4)
Relative speed: 0.645 (= 0.043s / 0.066s)
OneFlow swin dataloader time: 0.036s (= 7.246s / 200, num_workers=8)
PyTorch swin dataloader time: 0.023s (= 4.555s / 200, num_workers=8)
Relative speed: 0.629 (= 0.023s / 0.036s)
❌ OneFlow resnet50 time: 146.1ms (= 14612.8ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 168.3ms (= 16831.4ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.15 (= 168.3ms / 146.1ms)
OneFlow resnet50 time: 94.9ms (= 9486.6ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 112.4ms (= 11240.8ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.18 (= 112.4ms / 94.9ms)
OneFlow resnet50 time: 71.5ms (= 14309.2ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 90.4ms (= 18073.9ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.26 (= 90.4ms / 71.5ms)
OneFlow resnet50 time: 59.7ms (= 11945.4ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 77.2ms (= 15446.2ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.29 (= 77.2ms / 59.7ms)
OneFlow resnet50 time: 55.2ms (= 11030.8ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 68.6ms (= 13729.7ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.24 (= 68.6ms / 55.2ms)
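Each Relative speed figure is the PyTorch time divided by the OneFlow time, so values above 1.0 mean OneFlow took less time per iteration. The small parser below recomputes it from the bot's lines. It assumes both lines in a pair use the same unit and that the ratio is taken from the unrounded per-iteration times (total / iterations); that assumption reproduces the reported 0.391 for the num_workers=1 dataloader pair, whereas the rounded 0.150s / 0.383s would give 0.392.

```python
import re

# Matches lines such as
# "OneFlow resnet50 time: 130.0ms (= 12999.9ms / 100, input_shape=[...])".
TIMING = re.compile(
    r"(OneFlow|PyTorch) (.+?) time: [\d.]+(?:ms|s) \(= ([\d.]+)(?:ms|s) / (\d+)"
)


def per_iteration(line):
    """Return (framework, unrounded time per iteration) from one stats line."""
    m = TIMING.search(line)
    if m is None:
        raise ValueError(f"not a timing line: {line!r}")
    framework, _benchmark, total, iters = m.groups()
    return framework, float(total) / int(iters)


def relative_speed(oneflow_line, pytorch_line):
    """PyTorch time / OneFlow time: above 1.0 means OneFlow took less time."""
    of_name, of_time = per_iteration(oneflow_line)
    pt_name, pt_time = per_iteration(pytorch_line)
    if (of_name, pt_name) != ("OneFlow", "PyTorch"):
        raise ValueError("expected a OneFlow line followed by a PyTorch line")
    return pt_time / of_time
```

For the first resnet50 pair above (12999.9ms vs 14334.6ms over 100 iterations) this gives about 1.103, matching the reported 1.10.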
CI failed when running job: Build cpu. PR label automerge has been removed
Speed stats:
GPU Name: GeForce GTX 1080
❌ OneFlow resnet50 time: 140.1ms (= 14007.2ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 163.1ms (= 16307.8ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.16 (= 163.1ms / 140.1ms)
OneFlow resnet50 time: 85.9ms (= 8590.1ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 103.4ms (= 10337.8ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.20 (= 103.4ms / 85.9ms)
OneFlow resnet50 time: 59.2ms (= 11835.1ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 78.5ms (= 15708.8ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.33 (= 78.5ms / 59.2ms)
OneFlow resnet50 time: 45.0ms (= 8994.1ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 70.4ms (= 14079.4ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.57 (= 70.4ms / 45.0ms)
OneFlow resnet50 time: 41.1ms (= 8210.2ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 68.6ms (= 13719.7ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.67 (= 68.6ms / 41.1ms)
Speed stats:
GPU Name: GeForce GTX 1080
❌ OneFlow resnet50 time: 139.9ms (= 13985.2ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 162.6ms (= 16257.1ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.16 (= 162.6ms / 139.9ms)
OneFlow resnet50 time: 86.1ms (= 8614.2ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 107.3ms (= 10734.2ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.25 (= 107.3ms / 86.1ms)
OneFlow resnet50 time: 58.3ms (= 11663.9ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 79.4ms (= 15871.2ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.36 (= 79.4ms / 58.3ms)
OneFlow resnet50 time: 45.4ms (= 9076.5ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 71.2ms (= 14244.9ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.57 (= 71.2ms / 45.4ms)
OneFlow resnet50 time: 40.5ms (= 8099.5ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 67.2ms (= 13437.9ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.66 (= 67.2ms / 40.5ms)
Speed stats:
GPU Name: GeForce GTX 1080
❌ OneFlow resnet50 time: 140.5ms (= 14046.6ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 158.4ms (= 15837.1ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.13 (= 158.4ms / 140.5ms)
OneFlow resnet50 time: 87.8ms (= 8782.1ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 103.5ms (= 10353.9ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.18 (= 103.5ms / 87.8ms)
OneFlow resnet50 time: 59.2ms (= 11838.2ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 79.3ms (= 15853.9ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.34 (= 79.3ms / 59.2ms)
OneFlow resnet50 time: 45.7ms (= 9140.5ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 69.8ms (= 13962.8ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.53 (= 69.8ms / 45.7ms)
OneFlow resnet50 time: 39.8ms (= 7960.0ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 67.6ms (= 13521.2ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.70 (= 67.6ms / 39.8ms)