oneflow
Add cuda amp scaler for eager
Eager AMP now supports the scaler, so eager mode can run complete AMP training.
https://github.com/Oneflow-Inc/OneTeam/issues/1754
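For readers unfamiliar with what a gradient scaler does in AMP training, here is a minimal pure-Python sketch of dynamic loss scaling. It is not the code added by this PR and does not use any OneFlow API: the class name `ToyGradScaler`, its parameters, and its default values are all hypothetical and chosen only for illustration. The idea is that the loss is multiplied by a scale before backward so small fp16 gradients do not underflow. Gradients are then unscaled before the optimizer update, and a step that produced inf/nan gradients is skipped while the scale is reduced.

```python
import math


class ToyGradScaler:
    """Hypothetical dynamic loss scaler (illustration only, not OneFlow's class)."""

    def __init__(self, init_scale=2.0 ** 16, growth_factor=2.0,
                 backoff_factor=0.5, growth_interval=2000):
        self.scale = init_scale
        self.growth_factor = growth_factor
        self.backoff_factor = backoff_factor
        self.growth_interval = growth_interval
        self._good_steps = 0  # consecutive steps without overflow

    def scale_loss(self, loss):
        # Multiply the loss so that backward produces scaled gradients.
        return loss * self.scale

    def unscale(self, scaled_grads):
        # Recover the true gradients from the scaled ones.
        inv = 1.0 / self.scale
        return [g * inv for g in scaled_grads]

    def step(self, scaled_grads, apply_update):
        """Apply the update unless an overflow occurred; return True if applied."""
        if any(not math.isfinite(g) for g in scaled_grads):
            # Overflow: skip this update and shrink the scale.
            self.scale *= self.backoff_factor
            self._good_steps = 0
            return False
        apply_update(self.unscale(scaled_grads))
        self._good_steps += 1
        if self._good_steps >= self.growth_interval:
            # Enough clean steps in a row: try a larger scale again.
            self.scale *= self.growth_factor
            self._good_steps = 0
        return True


# Usage: a single weight updated by plain SGD with learning rate 0.1.
weights = [1.0]


def sgd(grads, lr=0.1):
    weights[0] -= lr * grads[0]


scaler = ToyGradScaler(init_scale=8.0, growth_interval=2)
applied = scaler.step([scaler.scale_loss(0.5)], sgd)   # true gradient is 0.5
skipped = scaler.step([float("inf")], sgd)             # overflow, update skipped
```

In this toy run the first step is applied with the unscaled gradient, and the second step is skipped while the scale is halved. A real scaler performs the same checks on tensors on the device rather than on Python floats.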
Eager AMP test results after running export ONEFLOW_VM_COMPUTE_ON_WORKER_THREAD=0:
Both training speed and GPU memory usage improve substantially compared with fp32 mode.
Speed stats:
Code got formatted by CI. Please request CI again if you still want to have this PR merged. If the PR is from a forked repo, please download the patch files from the GitHub Actions web page and apply them locally.
A follow-up could also add support for the cache feature.
There is already a PR for cache; it will move forward once the AMP PR is merged.
CI failed when running job: Build cpu. PR label automerge has been removed
Static analysis with clang failed. PR label automerge has been removed
CI failed when running job: cuda-speed-test. PR label automerge has been removed