PaddleFleetX
Set flag FLAGS_enable_cublas_tensor_op_math to True by default.
When FP32 and FP16 models run on an A100 machine, they can be accelerated with Tensor Cores. Although NVIDIA states that FP32 computation is automatically routed to Tensor Cores on A100, we observed that this is not the case. Therefore, this PR sets the flag FLAGS_enable_cublas_tensor_op_math to True by default.
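For reference, a minimal sketch of how a user could enable this flag themselves, assuming a PaddlePaddle 2.x environment: the environment-variable route is read when Paddle initializes, and `paddle.set_flags` / `paddle.get_flags` are the runtime flag APIs (whether this particular flag is exposed to them depends on the Paddle build).

```python
import os

# Most portable option: export the flag before importing paddle, since
# Paddle reads FLAGS_* environment variables at initialization.
os.environ["FLAGS_enable_cublas_tensor_op_math"] = "True"

import paddle

# If the flag is registered with the runtime flags API (build-dependent),
# it can also be toggled after import and checked back.
paddle.set_flags({"FLAGS_enable_cublas_tensor_op_math": True})
print(paddle.get_flags(["FLAGS_enable_cublas_tensor_op_math"]))
```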
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.