Do FP8 rowwise bias addition in higher precision
Summary: Previously, when a bias was used in our FP8 rowwise kernel, it was added to the accumulator in its native precision. For example, if the bias is bf16, we would do a bf16 + bf16 addition. It is slightly more efficient and more accurate to leave the accumulator in fp32, cast the bias to fp32, and do the addition in fp32.
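A minimal sketch of the idea (not the actual FBGEMM kernel epilogue; function names here are hypothetical and purely illustrative), assuming a bf16 bias and an fp32 accumulator:

```cpp
#include <cuda_bf16.h>

// Old behavior: downcast the fp32 accumulator to bf16 first, then do a
// bf16 + bf16 add. Precision is lost both in the early cast and in the
// low-precision addition.
__device__ __nv_bfloat16 add_bias_bf16(float acc, __nv_bfloat16 bias) {
  __nv_bfloat16 acc_bf16 = __float2bfloat16(acc);
  return __hadd(acc_bf16, bias);
}

// New behavior: upcast the bias to fp32, add while the accumulator is
// still in fp32, and downcast only the final result once.
__device__ __nv_bfloat16 add_bias_fp32(float acc, __nv_bfloat16 bias) {
  float result = acc + __bfloat162float(bias);
  return __float2bfloat16(result);
}
```

Keeping the accumulator in fp32 avoids one extra rounding step before the add and lets the addition itself happen at full single precision.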
Differential Revision: D74408348
Deploy Preview for pytorch-fbgemm-docs ready!
| Name | Link |
|---|---|
| Latest commit | c6f491c6aeefa6fee36a951d6f16903ab4595212 |
| Latest deploy log | https://app.netlify.com/projects/pytorch-fbgemm-docs/deploys/684b672f453c1c00084b2559 |
| Deploy Preview | https://deploy-preview-4095--pytorch-fbgemm-docs.netlify.app |
This pull request was exported from Phabricator. Differential Revision: D74408348