oneflow
Support broadcast ops
- [x] broadcast_shapes
- [x] broadcast_tensors
- [x] broadcast_to
- [x] Tensor.broadcast_to
- [x] Add documentation
An exception test and a global test are still needed.
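For reviewers, here is a minimal pure-Python sketch of the shape rule that `broadcast_shapes` is assumed to follow, namely NumPy/PyTorch-style broadcasting. It is not the implementation in this PR; the helper name `broadcast_shapes_sketch` is hypothetical and exists only for illustration. The incompatible-shape branch is the kind of case an exception test would cover.

```python
from itertools import zip_longest


def broadcast_shapes_sketch(*shapes):
    """Return the common broadcast shape, assuming NumPy-style rules.

    Hypothetical helper for illustration; not OneFlow's implementation.
    """
    result = []
    # Align shapes on their trailing dimensions; a missing leading
    # dimension is treated as size 1.
    for dims in zip_longest(*(reversed(s) for s in shapes), fillvalue=1):
        target = 1
        for d in dims:
            if d == 1 or d == target:
                continue  # size 1 stretches; equal sizes already match
            if target != 1:
                # Two different sizes, neither of them 1: not broadcastable.
                raise ValueError(f"shapes {shapes} cannot be broadcast together")
            target = d
        result.append(target)
    return tuple(reversed(result))
```

Under these assumed rules, `broadcast_shapes_sketch((3, 1), (1, 4))` gives `(3, 4)`, while `broadcast_shapes_sketch((2, 3), (4, 3))` raises `ValueError`.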
Static analysis with clang failed. PR label automerge has been removed
CI failed when running job: cuda-module. PR label automerge has been removed
Speed stats:
GPU Name: GeForce GTX 1080
❌ OneFlow resnet50 time: 140.9ms (= 14089.7ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 167.9ms (= 16792.9ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.19 (= 167.9ms / 140.9ms)
OneFlow resnet50 time: 85.6ms (= 8559.8ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 101.6ms (= 10155.1ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.19 (= 101.6ms / 85.6ms)
OneFlow resnet50 time: 58.1ms (= 11613.2ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 77.7ms (= 15542.9ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.34 (= 77.7ms / 58.1ms)
OneFlow resnet50 time: 46.0ms (= 9192.4ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 69.8ms (= 13957.3ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.52 (= 69.8ms / 46.0ms)
OneFlow resnet50 time: 39.3ms (= 7869.1ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 67.6ms (= 13517.6ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.72 (= 67.6ms / 39.3ms)
View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/9141/