oneflow Global mode

Supports: https://github.com/Oneflow-Inc/OneTeam/issues/1792

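Adds support for creating a global context in which an on/off switch, placement, and sbp can be set. Inside the global context:

- [x] Support GlobalTensor.device
- [x] Support GlobalTensor.to(device)
- [ ] Support source ops such as randn creating a global tensor directly, with placement and sbp taken from the global context
  - [x] randn
  - [x] empty
  - [x] tensor
  - [x] arange
  - [ ] Others will be added in follow-up PRs

This allows the local logic in a module's forward to be converted to global logic non-intrusively. The main target is DDP-style data-parallel execution.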
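
The idea above can be illustrated with a minimal pure-Python sketch. It is not OneFlow code: `global_context`, `GlobalSpec`, `FakeTensor`, and this `randn` are hypothetical stand-ins, and placement and sbp are plain strings. It only shows how a source op might read placement and sbp from an ambient context, so that the calling code stays unchanged.

```python
import contextlib
import contextvars
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class GlobalSpec:
    """Hypothetical record of what the global context carries."""
    placement: str
    sbp: str


# Holds the active spec, or None when global mode is off.
_current_spec = contextvars.ContextVar("global_spec", default=None)


@contextlib.contextmanager
def global_context(enabled: bool, placement: Optional[str] = None,
                   sbp: Optional[str] = None):
    """Hypothetical context manager: switch, placement, and sbp."""
    spec = GlobalSpec(placement, sbp) if enabled else None
    token = _current_spec.set(spec)
    try:
        yield
    finally:
        # Restore whatever was active before entering the context.
        _current_spec.reset(token)


@dataclass
class FakeTensor:
    """Stand-in for a tensor; records only shape and distribution info."""
    shape: Tuple[int, ...]
    placement: Optional[str] = None
    sbp: Optional[str] = None

    @property
    def is_global(self) -> bool:
        return self.placement is not None


def randn(*shape: int) -> FakeTensor:
    """Stand-in source op: local by default, global inside the context."""
    spec = _current_spec.get()
    if spec is None:
        return FakeTensor(shape)
    return FakeTensor(shape, spec.placement, spec.sbp)


def forward() -> FakeTensor:
    # "Local" model code: it never mentions placement or sbp.
    return randn(2, 3)


local_out = forward()
with global_context(True, placement="cuda:0-1", sbp="broadcast"):
    global_out = forward()
```

Here `forward` is written as ordinary local code, yet the tensor it creates inside the context carries the context's placement and sbp. That is the non-intrusive conversion the description refers to.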

Nov 24 '22 11:11 strint

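Adds support for creating a global context in which an on/off switch, placement, and sbp can be set. Inside the global context:

- [x] Support GlobalTensor.device
- [x] Support GlobalTensor.to(device)
- [ ] Support source ops such as randn creating a global tensor directly, with placement and sbp taken from the global context
  - [x] randn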

Nov 25 '22 03:11 strint

Speed stats:

GPU Name: GeForce GTX 1080

❌ OneFlow resnet50 time: 141.6ms (= 14158.2ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 161.5ms (= 16149.7ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.14 (= 161.5ms / 141.6ms)

OneFlow resnet50 time: 85.9ms (= 8591.5ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 102.3ms (= 10226.5ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.19 (= 102.3ms / 85.9ms)

OneFlow resnet50 time: 58.4ms (= 11671.0ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 77.6ms (= 15511.0ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.33 (= 77.6ms / 58.4ms)

OneFlow resnet50 time: 44.4ms (= 8881.5ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 69.4ms (= 13872.0ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.56 (= 69.4ms / 44.4ms)

OneFlow resnet50 time: 39.3ms (= 7866.5ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 67.9ms (= 13584.0ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.73 (= 67.9ms / 39.3ms)
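
The figures in each entry can be recomputed from the logged totals. A minimal sketch using the first entry above, assuming the ratio is taken over the unrounded totals (the bot's actual script is not shown in this thread; `per_iter_ms` and `relative_speed` are hypothetical helpers):

```python
def per_iter_ms(total_ms: float, iterations: int) -> float:
    """Average time per iteration, in milliseconds."""
    return total_ms / iterations


def relative_speed(pytorch_total_ms: float, oneflow_total_ms: float) -> float:
    """PyTorch time divided by OneFlow time; above 1 means OneFlow is faster."""
    return pytorch_total_ms / oneflow_total_ms


# Totals from the input_shape=[16, 3, 224, 224] entry, 100 iterations each.
oneflow_ms = per_iter_ms(14158.2, 100)
pytorch_ms = per_iter_ms(16149.7, 100)
speedup = relative_speed(16149.7, 14158.2)

print(f"OneFlow {oneflow_ms:.1f}ms, PyTorch {pytorch_ms:.1f}ms, "
      f"relative speed {speedup:.2f}")
```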

Nov 25 '22 04:11 github-actions[bot]

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/9470/

Nov 25 '22 04:11 github-actions[bot]

Operators that currently need conversion for Stable Diffusion:

flow.randn()
flow.empty()
flow.tensor()
flow.arange()

Nov 29 '22 07:11 CPFLAME

Speed stats:

GPU Name: GeForce GTX 1080

❌ OneFlow resnet50 time: 141.5ms (= 14147.7ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 164.6ms (= 16458.8ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.16 (= 164.6ms / 141.5ms)

OneFlow resnet50 time: 86.0ms (= 8603.9ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 102.8ms (= 10275.1ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.19 (= 102.8ms / 86.0ms)

OneFlow resnet50 time: 58.2ms (= 11647.2ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 78.2ms (= 15645.5ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.34 (= 78.2ms / 58.2ms)

OneFlow resnet50 time: 45.0ms (= 8998.8ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 74.0ms (= 14796.8ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.64 (= 74.0ms / 45.0ms)

OneFlow resnet50 time: 39.6ms (= 7916.7ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 68.0ms (= 13592.4ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.72 (= 68.0ms / 39.6ms)

Nov 30 '22 09:11 github-actions[bot]

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/9470/

Nov 30 '22 09:11 github-actions[bot]

Code got formatted by CI. Please request CI again if you still want to have this PR merged. If the PR is from a forked repo, please download the patch files from the GitHub Actions web page and apply them locally.

Dec 01 '22 07:12 github-actions[bot]

How to run the tests: from the oneflow/python/oneflow/test/graph directory, run:

python3 -m oneflow.distributed.launch --nproc_per_node 2 ./test_graph_with_global.py --failfast --verbose

Dec 01 '22 09:12 strint

Speed stats:

GPU Name: GeForce GTX 1080

❌ OneFlow resnet50 time: 141.3ms (= 14128.7ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 160.7ms (= 16072.0ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.14 (= 160.7ms / 141.3ms)

OneFlow resnet50 time: 87.2ms (= 8724.4ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 101.6ms (= 10161.0ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.16 (= 101.6ms / 87.2ms)

OneFlow resnet50 time: 58.8ms (= 11763.0ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 76.9ms (= 15386.4ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.31 (= 76.9ms / 58.8ms)

OneFlow resnet50 time: 45.8ms (= 9164.0ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 69.7ms (= 13943.1ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.52 (= 69.7ms / 45.8ms)

OneFlow resnet50 time: 40.6ms (= 8113.2ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 53.2ms (= 10643.0ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.31 (= 53.2ms / 40.6ms)

Dec 01 '22 14:12 github-actions[bot]

Speed stats:

GPU Name: GeForce GTX 1080

❌ OneFlow resnet50 time: 140.6ms (= 14056.5ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 163.9ms (= 16386.2ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.17 (= 163.9ms / 140.6ms)

OneFlow resnet50 time: 85.5ms (= 8545.2ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 111.5ms (= 11146.8ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.30 (= 111.5ms / 85.5ms)

OneFlow resnet50 time: 58.0ms (= 11592.9ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 88.0ms (= 17593.2ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.52 (= 88.0ms / 58.0ms)

OneFlow resnet50 time: 45.2ms (= 9034.0ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 72.4ms (= 14484.1ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.60 (= 72.4ms / 45.2ms)

OneFlow resnet50 time: 40.6ms (= 8124.4ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 86.8ms (= 17350.8ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 2.14 (= 86.8ms / 40.6ms)

Dec 02 '22 02:12 github-actions[bot]

Code got formatted by CI. Please request CI again if you still want to have this PR merged. If the PR is from a forked repo, please download the patch files from the GitHub Actions web page and apply them locally.

Dec 02 '22 08:12 github-actions[bot]

Code got formatted by CI. Please request CI again if you still want to have this PR merged. If the PR is from a forked repo, please download the patch files from the GitHub Actions web page and apply them locally.

Dec 05 '22 07:12 github-actions[bot]

Code got formatted by CI. Please request CI again if you still want to have this PR merged. If the PR is from a forked repo, please download the patch files from the GitHub Actions web page and apply them locally.

Dec 05 '22 07:12 github-actions[bot]

Speed stats:

GPU Name: GeForce GTX 1080

❌ OneFlow resnet50 time: 140.8ms (= 14081.7ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 163.7ms (= 16369.7ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.16 (= 163.7ms / 140.8ms)

OneFlow resnet50 time: 86.3ms (= 8633.5ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 102.0ms (= 10203.4ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.18 (= 102.0ms / 86.3ms)

OneFlow resnet50 time: 57.9ms (= 11587.5ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 77.2ms (= 15433.2ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.33 (= 77.2ms / 57.9ms)

OneFlow resnet50 time: 43.9ms (= 8775.2ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 71.6ms (= 14310.7ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.63 (= 71.6ms / 43.9ms)

OneFlow resnet50 time: 40.3ms (= 8050.6ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 73.5ms (= 14706.6ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.83 (= 73.5ms / 40.3ms)

Dec 05 '22 09:12 github-actions[bot]

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/9470/

Dec 05 '22 09:12 github-actions[bot]

Code got formatted by CI. Please request CI again if you still want to have this PR merged. If the PR is from a forked repo, please download the patch files from the GitHub Actions web page and apply them locally.

Dec 07 '22 06:12 github-actions[bot]

Speed stats:

GPU Name: GeForce GTX 1080

❌ OneFlow resnet50 time: 141.2ms (= 14119.8ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 164.8ms (= 16484.3ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.17 (= 164.8ms / 141.2ms)

OneFlow resnet50 time: 86.7ms (= 8668.8ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 103.1ms (= 10306.9ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.19 (= 103.1ms / 86.7ms)

OneFlow resnet50 time: 58.1ms (= 11622.0ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 78.8ms (= 15767.0ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.36 (= 78.8ms / 58.1ms)

OneFlow resnet50 time: 44.1ms (= 8829.7ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 79.2ms (= 15838.6ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.79 (= 79.2ms / 44.1ms)

OneFlow resnet50 time: 40.4ms (= 8081.9ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 68.0ms (= 13606.0ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.68 (= 68.0ms / 40.4ms)

Dec 07 '22 08:12 github-actions[bot]

Code got formatted by CI. Please request CI again if you still want to have this PR merged. If the PR is from a forked repo, please download the patch files from the GitHub Actions web page and apply them locally.

Dec 08 '22 07:12 github-actions[bot]

Speed stats:

GPU Name: GeForce GTX 1080

❌ OneFlow resnet50 time: 139.7ms (= 13970.2ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 160.9ms (= 16089.8ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.15 (= 160.9ms / 139.7ms)

OneFlow resnet50 time: 85.1ms (= 8506.5ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 106.6ms (= 10660.5ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.25 (= 106.6ms / 85.1ms)

OneFlow resnet50 time: 57.5ms (= 11491.0ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 77.5ms (= 15508.2ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.35 (= 77.5ms / 57.5ms)

OneFlow resnet50 time: 46.0ms (= 9196.3ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 65.6ms (= 13114.8ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.43 (= 65.6ms / 46.0ms)

OneFlow resnet50 time: 39.9ms (= 7970.0ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 68.4ms (= 13678.2ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.72 (= 68.4ms / 39.9ms)

Dec 08 '22 16:12 github-actions[bot]

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/9470/

Dec 08 '22 16:12 github-actions[bot]

CI failed when running job: cpu-module. PR label automerge has been removed

Dec 08 '22 16:12 github-actions[bot]

Speed stats:

GPU Name: GeForce GTX 1080

❌ OneFlow resnet50 time: 140.3ms (= 14026.1ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 162.6ms (= 16258.4ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.16 (= 162.6ms / 140.3ms)

OneFlow resnet50 time: 84.8ms (= 8480.3ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 111.5ms (= 11150.9ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.31 (= 111.5ms / 84.8ms)

OneFlow resnet50 time: 57.8ms (= 11559.3ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 77.9ms (= 15588.1ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.35 (= 77.9ms / 57.8ms)

OneFlow resnet50 time: 43.8ms (= 8766.8ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 72.8ms (= 14553.1ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.66 (= 72.8ms / 43.8ms)

OneFlow resnet50 time: 41.8ms (= 8352.5ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 72.9ms (= 14586.2ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.75 (= 72.9ms / 41.8ms)

Dec 09 '22 02:12 github-actions[bot]

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/9470/

Dec 09 '22 03:12 github-actions[bot]

CI failed when running job: cuda-misc. PR label automerge has been removed

Dec 09 '22 04:12 github-actions[bot]