
Add clamp_min/max and inplace version functor

Open marigoold opened this issue 3 years ago • 8 comments

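Background: https://github.com/Oneflow-Inc/OneTeam/issues/1600 This PR does the following:

  • Adds the clamp_min, clamp_max, clamp_min_, and clamp_max_ interfaces, along with the corresponding documentation and unit tests

Implementation: all four are implemented at the functor layer by calling ClampBaseFunctor.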
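
As an illustration of that design, here is a minimal plain-Python sketch, not OneFlow's actual C++ functor code: one shared routine takes optional lower and upper bounds, and the four interfaces forward to it with one bound left unset. The helper `clamp_base`, its parameters, and the list-based inputs are assumptions made for the example.

```python
from typing import List, Optional


def clamp_base(values: List[float],
               min_val: Optional[float],
               max_val: Optional[float],
               inplace: bool) -> List[float]:
    """Hypothetical stand-in for a shared clamp routine (not OneFlow code)."""
    if min_val is None and max_val is None:
        raise ValueError("at least one of min_val or max_val must be set")
    # The in-place variants write into the caller's list; the others copy first.
    out = values if inplace else list(values)
    for i, v in enumerate(out):
        if min_val is not None and v < min_val:
            v = min_val
        if max_val is not None and v > max_val:
            v = max_val
        out[i] = v
    return out


def clamp_min(values: List[float], min_val: float) -> List[float]:
    return clamp_base(values, min_val, None, inplace=False)


def clamp_max(values: List[float], max_val: float) -> List[float]:
    return clamp_base(values, None, max_val, inplace=False)


def clamp_min_(values: List[float], min_val: float) -> List[float]:
    return clamp_base(values, min_val, None, inplace=True)


def clamp_max_(values: List[float], max_val: float) -> List[float]:
    return clamp_base(values, None, max_val, inplace=True)
```

The trailing underscore follows the usual convention for in-place variants: `clamp_min_` returns the same object it was given, while `clamp_min` leaves its input untouched.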

Documentation screenshots: [image] [image]

In addition, the clamp family has two problems that need to be fixed later:

  • The out parameter is not supported
  • min / max do not accept tensor input

marigoold, Aug 09 '22 03:08

Static analysis with clang failed. PR label automerge has been removed

github-actions[bot], Aug 09 '22 07:08

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/8882/

github-actions[bot], Aug 09 '22 15:08

Speed stats:
GPU Name: GeForce GTX 1080 

✔️ OneFlow resnet50 time: 128.2ms (= 12818.3ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 139.7ms (= 13972.1ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.09 (= 139.7ms / 128.2ms)

OneFlow resnet50 time: 75.6ms (= 7558.2ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 84.1ms (= 8409.1ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.11 (= 84.1ms / 75.6ms)

OneFlow resnet50 time: 48.3ms (= 9655.2ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 59.1ms (= 11821.2ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.22 (= 59.1ms / 48.3ms)

OneFlow resnet50 time: 35.7ms (= 7150.0ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 47.8ms (= 9553.3ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.34 (= 47.8ms / 35.7ms)

OneFlow resnet50 time: 28.1ms (= 5624.1ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 36.7ms (= 7331.0ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.30 (= 36.7ms / 28.1ms)

OneFlow swin dataloader time: 0.272s (= 54.489s / 200, num_workers=1)
PyTorch swin dataloader time: 0.148s (= 29.672s / 200, num_workers=1)
Relative speed: 0.545 (= 0.148s / 0.272s)

OneFlow swin dataloader time: 0.075s (= 15.014s / 200, num_workers=4)
PyTorch swin dataloader time: 0.041s (= 8.204s / 200, num_workers=4)
Relative speed: 0.546 (= 0.041s / 0.075s)

OneFlow swin dataloader time: 0.060s (= 12.028s / 200, num_workers=8)
PyTorch swin dataloader time: 0.021s (= 4.264s / 200, num_workers=8)
Relative speed: 0.355 (= 0.021s / 0.060s)

❌ OneFlow resnet50 time: 136.5ms (= 13654.7ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 161.4ms (= 16143.3ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.18 (= 161.4ms / 136.5ms)

OneFlow resnet50 time: 84.3ms (= 8427.7ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 102.0ms (= 10197.7ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.21 (= 102.0ms / 84.3ms)

OneFlow resnet50 time: 57.7ms (= 11535.3ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 77.7ms (= 15547.3ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.35 (= 77.7ms / 57.7ms)

OneFlow resnet50 time: 45.2ms (= 9035.5ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 71.2ms (= 14232.4ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.58 (= 71.2ms / 45.2ms)

OneFlow resnet50 time: 38.9ms (= 7784.7ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 69.5ms (= 13902.5ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.79 (= 69.5ms / 38.9ms)
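
Each "Relative speed" figure is the PyTorch time divided by the OneFlow time, so a value above 1 means OneFlow was faster. The small sketch below reproduces two of the figures above. The helper name is hypothetical. It assumes the ratio is taken from the unrounded totals, which matches the reported 0.545 for the first dataloader case (the rounded per-step times 0.148 / 0.272 would give 0.544).

```python
def relative_speed(pytorch_time: float, oneflow_time: float) -> float:
    """Ratio of PyTorch time to OneFlow time; > 1 means OneFlow is faster."""
    return pytorch_time / oneflow_time


# resnet50, input_shape=[16, 3, 224, 224]: total ms over 100 iterations
print(f"{relative_speed(13972.1, 12818.3):.2f}")

# swin dataloader, num_workers=1: total seconds over 200 iterations
print(f"{relative_speed(29.672, 54.489):.3f}")
```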

github-actions[bot], Aug 09 '22 15:08

Speed stats:
GPU Name: GeForce GTX 1080 

✔️ OneFlow resnet50 time: 128.2ms (= 12822.2ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 142.9ms (= 14292.1ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.11 (= 142.9ms / 128.2ms)

OneFlow resnet50 time: 75.3ms (= 7528.8ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 84.4ms (= 8444.9ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.12 (= 84.4ms / 75.3ms)

OneFlow resnet50 time: 48.3ms (= 9654.3ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 56.1ms (= 11223.6ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.16 (= 56.1ms / 48.3ms)

OneFlow resnet50 time: 35.8ms (= 7164.6ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 39.7ms (= 7945.2ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.11 (= 39.7ms / 35.8ms)

OneFlow resnet50 time: 28.1ms (= 5627.8ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 41.6ms (= 8328.8ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.48 (= 41.6ms / 28.1ms)

OneFlow swin dataloader time: 0.262s (= 52.338s / 200, num_workers=1)
PyTorch swin dataloader time: 0.151s (= 30.124s / 200, num_workers=1)
Relative speed: 0.576 (= 0.151s / 0.262s)

OneFlow swin dataloader time: 0.069s (= 13.792s / 200, num_workers=4)
PyTorch swin dataloader time: 0.043s (= 8.604s / 200, num_workers=4)
Relative speed: 0.624 (= 0.043s / 0.069s)

OneFlow swin dataloader time: 0.040s (= 7.924s / 200, num_workers=8)
PyTorch swin dataloader time: 0.024s (= 4.787s / 200, num_workers=8)
Relative speed: 0.604 (= 0.024s / 0.040s)

❌ OneFlow resnet50 time: 136.5ms (= 13654.6ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 161.8ms (= 16178.7ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.18 (= 161.8ms / 136.5ms)

OneFlow resnet50 time: 84.3ms (= 8425.4ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 101.8ms (= 10184.6ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.21 (= 101.8ms / 84.3ms)

OneFlow resnet50 time: 57.6ms (= 11510.6ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 77.8ms (= 15558.1ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.35 (= 77.8ms / 57.6ms)

OneFlow resnet50 time: 45.2ms (= 9048.5ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 71.8ms (= 14354.2ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.59 (= 71.8ms / 45.2ms)

OneFlow resnet50 time: 39.0ms (= 7797.0ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 67.1ms (= 13417.7ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.72 (= 67.1ms / 39.0ms)

github-actions[bot], Aug 11 '22 02:08

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/8882/

github-actions[bot], Aug 11 '22 02:08

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/8882/

github-actions[bot], Aug 11 '22 05:08

Speed stats:
GPU Name: GeForce GTX 1080 

✔️ OneFlow resnet50 time: 128.4ms (= 12837.9ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 144.2ms (= 14417.0ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.12 (= 144.2ms / 128.4ms)

OneFlow resnet50 time: 75.3ms (= 7525.0ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 86.6ms (= 8655.7ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.15 (= 86.6ms / 75.3ms)

OneFlow resnet50 time: 48.4ms (= 9674.7ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 62.7ms (= 12540.1ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.30 (= 62.7ms / 48.4ms)

OneFlow resnet50 time: 36.0ms (= 7205.2ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 50.0ms (= 9992.6ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.39 (= 50.0ms / 36.0ms)

OneFlow resnet50 time: 28.1ms (= 5628.2ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 37.3ms (= 7463.7ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.33 (= 37.3ms / 28.1ms)

OneFlow swin dataloader time: 0.405s (= 81.088s / 200, num_workers=1)
PyTorch swin dataloader time: 0.150s (= 30.082s / 200, num_workers=1)
Relative speed: 0.371 (= 0.150s / 0.405s)

OneFlow swin dataloader time: 0.073s (= 14.554s / 200, num_workers=4)
PyTorch swin dataloader time: 0.041s (= 8.262s / 200, num_workers=4)
Relative speed: 0.568 (= 0.041s / 0.073s)

OneFlow swin dataloader time: 0.039s (= 7.872s / 200, num_workers=8)
PyTorch swin dataloader time: 0.023s (= 4.574s / 200, num_workers=8)
Relative speed: 0.581 (= 0.023s / 0.039s)

❌ OneFlow resnet50 time: 136.4ms (= 13643.7ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 160.7ms (= 16066.9ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.18 (= 160.7ms / 136.4ms)

OneFlow resnet50 time: 84.6ms (= 8457.1ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 107.8ms (= 10778.7ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.27 (= 107.8ms / 84.6ms)

OneFlow resnet50 time: 57.8ms (= 11555.4ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 78.1ms (= 15621.7ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.35 (= 78.1ms / 57.8ms)

OneFlow resnet50 time: 45.5ms (= 9092.1ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 70.7ms (= 14139.6ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.56 (= 70.7ms / 45.5ms)

OneFlow resnet50 time: 38.6ms (= 7713.9ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 67.2ms (= 13433.7ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.74 (= 67.2ms / 38.6ms)

github-actions[bot], Aug 11 '22 05:08

CI failed when running job: cuda-misc. PR label automerge has been removed

github-actions[bot], Aug 11 '22 06:08

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/8882/

github-actions[bot], Aug 11 '22 15:08

Speed stats:
GPU Name: GeForce GTX 1080 

✔️ OneFlow resnet50 time: 128.4ms (= 12838.4ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 141.8ms (= 14181.2ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.10 (= 141.8ms / 128.4ms)

OneFlow resnet50 time: 75.3ms (= 7528.4ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 84.3ms (= 8432.3ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.12 (= 84.3ms / 75.3ms)

OneFlow resnet50 time: 48.6ms (= 9714.9ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 57.8ms (= 11561.0ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.19 (= 57.8ms / 48.6ms)

OneFlow resnet50 time: 36.1ms (= 7226.6ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 41.7ms (= 8338.9ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.15 (= 41.7ms / 36.1ms)

OneFlow resnet50 time: 28.4ms (= 5679.8ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 38.2ms (= 7633.9ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.34 (= 38.2ms / 28.4ms)

OneFlow swin dataloader time: 0.269s (= 53.796s / 200, num_workers=1)
PyTorch swin dataloader time: 0.151s (= 30.106s / 200, num_workers=1)
Relative speed: 0.560 (= 0.151s / 0.269s)

OneFlow swin dataloader time: 0.071s (= 14.188s / 200, num_workers=4)
PyTorch swin dataloader time: 0.041s (= 8.158s / 200, num_workers=4)
Relative speed: 0.575 (= 0.041s / 0.071s)

OneFlow swin dataloader time: 0.041s (= 8.235s / 200, num_workers=8)
PyTorch swin dataloader time: 0.023s (= 4.675s / 200, num_workers=8)
Relative speed: 0.568 (= 0.023s / 0.041s)

❌ OneFlow resnet50 time: 136.8ms (= 13677.0ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 167.7ms (= 16768.8ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.23 (= 167.7ms / 136.8ms)

OneFlow resnet50 time: 85.2ms (= 8515.6ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 103.3ms (= 10334.9ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.21 (= 103.3ms / 85.2ms)

OneFlow resnet50 time: 58.6ms (= 11716.3ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 77.8ms (= 15550.0ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.33 (= 77.8ms / 58.6ms)

OneFlow resnet50 time: 45.3ms (= 9058.7ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 69.7ms (= 13947.9ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.54 (= 69.7ms / 45.3ms)

OneFlow resnet50 time: 38.7ms (= 7746.3ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 66.0ms (= 13197.2ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.70 (= 66.0ms / 38.7ms)

github-actions[bot], Aug 11 '22 15:08

CI failed when running job: cuda-misc. PR label automerge has been removed

github-actions[bot], Aug 11 '22 16:08

Speed stats:
GPU Name: GeForce GTX 1080 

✔️ OneFlow resnet50 time: 128.4ms (= 12838.8ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 143.6ms (= 14355.5ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.12 (= 143.6ms / 128.4ms)

OneFlow resnet50 time: 75.5ms (= 7554.8ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 85.1ms (= 8511.7ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.13 (= 85.1ms / 75.5ms)

OneFlow resnet50 time: 48.9ms (= 9786.2ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 63.6ms (= 12717.6ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.30 (= 63.6ms / 48.9ms)

OneFlow resnet50 time: 36.4ms (= 7286.6ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 42.4ms (= 8476.7ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.16 (= 42.4ms / 36.4ms)

OneFlow resnet50 time: 28.4ms (= 5685.8ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 38.9ms (= 7774.8ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.37 (= 38.9ms / 28.4ms)

OneFlow swin dataloader time: 0.394s (= 78.831s / 200, num_workers=1)
PyTorch swin dataloader time: 0.150s (= 30.085s / 200, num_workers=1)
Relative speed: 0.382 (= 0.150s / 0.394s)

OneFlow swin dataloader time: 0.069s (= 13.809s / 200, num_workers=4)
PyTorch swin dataloader time: 0.040s (= 7.980s / 200, num_workers=4)
Relative speed: 0.578 (= 0.040s / 0.069s)

OneFlow swin dataloader time: 0.040s (= 8.083s / 200, num_workers=8)
PyTorch swin dataloader time: 0.022s (= 4.482s / 200, num_workers=8)
Relative speed: 0.555 (= 0.022s / 0.040s)

❌ OneFlow resnet50 time: 136.8ms (= 13682.6ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 160.3ms (= 16030.7ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.17 (= 160.3ms / 136.8ms)

OneFlow resnet50 time: 84.6ms (= 8459.5ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 108.4ms (= 10842.4ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.28 (= 108.4ms / 84.6ms)

OneFlow resnet50 time: 58.0ms (= 11596.9ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 77.8ms (= 15564.2ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.34 (= 77.8ms / 58.0ms)

OneFlow resnet50 time: 45.2ms (= 9047.4ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 69.5ms (= 13898.2ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.54 (= 69.5ms / 45.2ms)

OneFlow resnet50 time: 38.8ms (= 7766.1ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 74.2ms (= 14834.5ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.91 (= 74.2ms / 38.8ms)

github-actions[bot], Aug 12 '22 03:08