Guo Yejun
With this PR, if recompute_fwd is true, the output changes from bwd FLOPS per GPU = **2** * fwd flops per GPU / bwd latency: ... TFLOPS fwd+bwd FLOPS...
I just had another idea: shall we change "recompute_fwd" from a bool to a float in [0.0, 1.0] (and rename it to recompute_fwd_factor), since not all GEMMs are recomputed in the cases...
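A minimal sketch of how such a factor could enter the calculation, for discussion only. It assumes the backward pass costs 2x the forward FLOPs and that recomputation adds the recomputed fraction of the forward FLOPs on top. The function names `bwd_flops_per_gpu` and `tflops` are hypothetical and are not taken from the profiler code.

```python
def bwd_flops_per_gpu(fwd_flops_per_gpu: float,
                      recompute_fwd_factor: float = 0.0) -> float:
    """Estimate backward-pass FLOPs per GPU.

    Assumption: backward costs 2x the forward FLOPs, and a fraction
    `recompute_fwd_factor` of the forward pass is recomputed during backward.
    0.0 means no recomputation, 1.0 means the whole forward is recomputed.
    """
    if not 0.0 <= recompute_fwd_factor <= 1.0:
        raise ValueError("recompute_fwd_factor must be in [0.0, 1.0]")
    return (2.0 + recompute_fwd_factor) * fwd_flops_per_gpu


def tflops(flops: float, latency_s: float) -> float:
    """Convert a FLOP count and a latency in seconds to TFLOPS."""
    return flops / latency_s / 1e12


if __name__ == "__main__":
    fwd = 1.0e14  # illustrative forward FLOPs per GPU, not a measured value
    for factor in (0.0, 0.5, 1.0):
        bwd = bwd_flops_per_gpu(fwd, factor)
        print(f"factor={factor}: bwd TFLOPS at 2 s latency = {tflops(bwd, 2.0)}")
```

With this shape, the current bool maps onto the two endpoints (false is 0.0, true is 1.0), so existing configs would keep their meaning.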
any comment? thanks.
@jeffra @tjruwase @cli99 any comment? thanks.
> just have another idea, shall we change "recompute_fwd" from bool to float among [0.0, 1.0] (and also rename it as recompute_fwd_factor), since not all gemms are recomputed in the...
for bf16, the gradient scale is not needed.
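To illustrate the reasoning, here is a small sketch that derives the representable range of each format from its bit layout (fp16: 5 exponent bits and 10 mantissa bits, bf16: 8 and 7, fp32: 8 and 23). The helper names are made up for this example. bf16 shares the fp32 exponent range, so small gradients do not underflow the way they can in fp16.

```python
def smallest_normal(exp_bits: int) -> float:
    """Smallest positive normal value of an IEEE-style binary format."""
    bias = 2 ** (exp_bits - 1) - 1
    return 2.0 ** (1 - bias)


def largest_finite(exp_bits: int, mant_bits: int) -> float:
    """Largest finite value of an IEEE-style binary format."""
    bias = 2 ** (exp_bits - 1) - 1
    return (2.0 - 2.0 ** -mant_bits) * 2.0 ** bias


# (exponent bits, mantissa bits) for each format
FORMATS = {"fp16": (5, 10), "bf16": (8, 7), "fp32": (8, 23)}

if __name__ == "__main__":
    for name, (e, m) in FORMATS.items():
        print(f"{name}: min normal = {smallest_normal(e):.3e}, "
              f"max = {largest_finite(e, m):.3e}")
```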
> These messages mess up PuDB's UI on my telnet session.
I saw the same issue; it is inconvenient.
> https://stackoverflow.com/questions/52974938/redirect-django-runserver-output-in-order-to-debug-with-pudb
@vladp did you verify that it works? I just tried it and it did not work; the messages still show up in the telnet terminal.
Hi, I don't think the CI failure (copied below) is caused by this PR.
Run cmake -GNinja -B./build-e2e -S./llvm/sycl/test-e2e -DSYCL_TEST_E2E_TARGETS="ext_oneapi_hip:gpu" -DCMAKE_CXX_COMPILER="$(which clang++)" -DLLVM_LIT="$PWD/llvm/llvm/utils/lit/lit.py" -DHIP_PLATFORM="AMD" -DAMD_ARCH="gfx1031"
Configuring SYCL End-to-End Tests --...
ping for merge