Add cuda amp scaler for eager

Open BBuf opened this issue 2 years ago • 5 comments

Eager AMP now supports a scaler, so eager mode can run complete AMP training.
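For readers unfamiliar with what a gradient scaler does in AMP training, here is a minimal, framework-free sketch of dynamic loss scaling. It is not the code from this PR and does not use OneFlow's API: the class name `DynamicLossScaler`, its methods, and the default values are all hypothetical and chosen only for illustration. The idea is to multiply the loss by a scale factor before backward so small fp16 gradients do not underflow, unscale the gradients before the optimizer step, skip the step and shrink the scale when a gradient overflows, and grow the scale again after a run of clean steps.

```python
import math


class DynamicLossScaler:
    """Hypothetical sketch of dynamic loss scaling (not OneFlow's implementation)."""

    def __init__(self, init_scale=2.0 ** 16, growth_factor=2.0,
                 backoff_factor=0.5, growth_interval=2000):
        # All defaults are illustrative assumptions.
        self.scale = init_scale
        self.growth_factor = growth_factor
        self.backoff_factor = backoff_factor
        self.growth_interval = growth_interval
        self._good_steps = 0  # consecutive steps without overflow

    def scale_loss(self, loss):
        # Multiply the loss so that backward produces scaled gradients.
        return loss * self.scale

    def unscale(self, scaled_grads):
        # Recover the true gradients from the scaled ones.
        return [g / self.scale for g in scaled_grads]

    def step(self, scaled_grads, apply_update):
        """Apply the update if all gradients are finite, then adjust the scale.

        Returns True if the update was applied, False if it was skipped.
        """
        grads = self.unscale(scaled_grads)
        if all(math.isfinite(g) for g in grads):
            apply_update(grads)
            self._good_steps += 1
            if self._good_steps >= self.growth_interval:
                self.scale *= self.growth_factor
                self._good_steps = 0
            return True
        # Overflow (inf or NaN): skip this update and reduce the scale.
        self.scale *= self.backoff_factor
        self._good_steps = 0
        return False


# Usage with plain floats standing in for gradient tensors.
scaler = DynamicLossScaler(init_scale=8.0, growth_interval=2)
updates = []
scaler.step([8.0, 16.0], updates.append)      # finite gradients: update applied
scaler.step([float("inf")], updates.append)   # overflow: update skipped, scale halved
```

In a real framework the gradients are tensors and the finiteness check runs on the device, but the control flow is the same.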

https://github.com/Oneflow-Inc/OneTeam/issues/1754

Eager AMP test results after running export ONEFLOW_VM_COMPUTE_ON_WORKER_THREAD=0:

[Image]

[Image]

Both training speed and GPU memory usage improve substantially compared with fp32 mode.

BBuf avatar Oct 31 '22 02:10 BBuf

Speed stats:

github-actions[bot] avatar Nov 07 '22 02:11 github-actions[bot]

Speed stats:

github-actions[bot] avatar Nov 09 '22 06:11 github-actions[bot]

Speed stats:

github-actions[bot] avatar Nov 10 '22 06:11 github-actions[bot]

Speed stats:

github-actions[bot] avatar Nov 10 '22 10:11 github-actions[bot]

Speed stats:

github-actions[bot] avatar Nov 11 '22 11:11 github-actions[bot]

Code got formatted by CI. Please request CI again if you still want to have this PR merged. If the PR is from a forked repo, please download the patch files from the GitHub Actions web page and apply them locally.

github-actions[bot] avatar Apr 03 '23 12:04 github-actions[bot]

As a follow-up, we could also add support for the cache feature.

hjchen2 avatar Apr 03 '23 12:04 hjchen2

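"As a follow-up, we could also add support for the cache feature."

There is already a PR for cache; once the AMP PR is merged, work on it will continue.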

marigoold avatar Apr 03 '23 13:04 marigoold

Speed stats:

github-actions[bot] avatar Apr 03 '23 14:04 github-actions[bot]

Code got formatted by CI. Please request CI again if you still want to have this PR merged. If the PR is from a forked repo, please download the patch files from the GitHub Actions web page and apply them locally.

github-actions[bot] avatar Apr 09 '23 00:04 github-actions[bot]

CI failed when running job: Build cpu. PR label automerge has been removed

github-actions[bot] avatar Apr 09 '23 00:04 github-actions[bot]

Static analysis with clang failed. PR label automerge has been removed

github-actions[bot] avatar Apr 09 '23 00:04 github-actions[bot]

Speed stats:

github-actions[bot] avatar Apr 09 '23 04:04 github-actions[bot]

CI failed when running job: cuda-speed-test. PR label automerge has been removed

github-actions[bot] avatar Apr 09 '23 04:04 github-actions[bot]