TransformerEngine
Quantitative Analysis of FP8 GEMM's Impact on LLM Convergence
Hi,
I've been exploring the impressive work you've done incorporating FP8 GEMM to accelerate matrix multiplication operations in TransformerEngine. The initiative is well supported by the findings in the original paper [1], where experiments indicate that models can still converge when trained with FP8 precision.
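For reference, this is roughly how FP8 GEMMs are enabled today (a minimal sketch based on the quickstart in the TransformerEngine docs; the recipe values here are illustrative and exact signatures may differ between versions):

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common.recipe import Format, DelayedScaling

# Illustrative delayed-scaling recipe; HYBRID uses E4M3 for forward
# tensors and E5M2 for gradients.
fp8_recipe = DelayedScaling(
    fp8_format=Format.HYBRID,
    amax_history_len=16,
    amax_compute_algo="max",
)

model = te.Linear(4096, 4096, bias=True).cuda()
inp = torch.randn(512, 4096, device="cuda")

# GEMMs inside this context run in FP8 on supported hardware.
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    out = model(inp)
```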
While the initial results are certainly promising, I noticed that a detailed quantitative analysis of the potential accuracy loss from using FP8 precision is missing. Given that training LLMs is very expensive, this absence of granular data makes it hard to advocate for FP8 when training other LLMs.
In particular, it would be helpful to see an evaluation of:
- The tensor distributions before and after the FP8 cast used in FP8 matmuls (see the sketch after this list).
- How the benefits of FP8 scale across LLMs of various sizes.
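To make the first request concrete, here is a small sketch of the kind of before/after comparison I have in mind. It assumes plain PyTorch >= 2.1 with its `torch.float8_e4m3fn` dtype and uses a simple per-tensor scale for illustration, not TransformerEngine's actual delayed-scaling recipe:

```python
import torch

def fp8_roundtrip_stats(x: torch.Tensor) -> dict:
    # Per-tensor scale so that amax maps to the E4M3 max representable
    # value (448); TransformerEngine's real recipe tracks amax history.
    amax = x.abs().max()
    scale = 448.0 / amax
    x_fp8 = (x * scale).to(torch.float8_e4m3fn)  # cast to FP8
    x_deq = x_fp8.to(torch.float32) / scale      # dequantize back
    err = (x - x_deq).abs()
    return {
        "mean_before": x.mean().item(),
        "std_before": x.std().item(),
        "mean_after": x_deq.mean().item(),
        "std_after": x_deq.std().item(),
        "max_abs_error": err.max().item(),
        "rel_error": (err.norm() / x.norm()).item(),
    }

# Example: distribution statistics for a GEMM-sized activation tensor.
x = torch.randn(4096, 4096)
print(fp8_roundtrip_stats(x))
```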
Thanks!
[1] https://arxiv.org/pdf/2209.05433.pdf
We have some public examples of convergence listed here. If there are any specific models/sizes you'd like to see convergence data for please reach out to us.