FBGEMM int4-TBE auto-vec optimization by increasing prefetching

Summary: Increase prefetching and reduce backend stalls, as suggested by NVIDIA.
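The general idea of software prefetching in an embedding-bag lookup can be illustrated with a minimal sketch. This is not the FBGEMM kernel changed by this PR. The function `embedding_bag_sum_int4`, the `Int4Table` layout (two int4 values per byte, with a per-row fp32 scale and bias stored in separate arrays), and the prefetch distance `kPrefetchDistance` are all hypothetical. The sketch also assumes GCC or Clang for `__builtin_prefetch`. While row `indices[i]` is being dequantized and accumulated, a prefetch is issued for the row that will be needed `kPrefetchDistance` iterations later. The goal is to have that memory already in flight when the core reaches it, so that fewer cycles are spent stalled in the backend.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical table layout (an assumption, not FBGEMM's actual format):
// each row holds `dim` int4 values packed two per byte (low nibble = even
// element, high nibble = odd element), plus one fp32 scale and bias per row.
struct Int4Table {
  std::vector<std::uint8_t> packed;  // num_rows * (dim / 2) bytes
  std::vector<float> scale;          // one entry per row
  std::vector<float> bias;           // one entry per row
  std::size_t dim;                   // number of elements per row (even)
};

// Hypothetical tuning knob: how many lookups ahead to prefetch.
constexpr std::size_t kPrefetchDistance = 8;

// Sum-pools the dequantized rows listed in `indices` into `out` (size dim).
void embedding_bag_sum_int4(const Int4Table& t,
                            const std::vector<std::int64_t>& indices,
                            std::vector<float>& out) {
  assert(t.dim % 2 == 0);
  const std::size_t row_bytes = t.dim / 2;
  out.assign(t.dim, 0.0f);

  for (std::size_t i = 0; i < indices.size(); ++i) {
    if (i + kPrefetchDistance < indices.size()) {
      const std::size_t ahead_row =
          static_cast<std::size_t>(indices[i + kPrefetchDistance]);
      // GCC/Clang builtin: rw = 0 (read), locality = 3 (keep in all cache
      // levels). This touches only the cache line holding the start of the
      // row; rows spanning several lines would need one prefetch per line.
      __builtin_prefetch(t.packed.data() + ahead_row * row_bytes, 0, 3);
    }

    const std::size_t r = static_cast<std::size_t>(indices[i]);
    const std::uint8_t* row = t.packed.data() + r * row_bytes;
    const float s = t.scale[r];
    const float b = t.bias[r];

    // Plain loop with no cross-iteration dependence, which compilers may
    // auto-vectorize: unpack both nibbles of each byte and accumulate.
    for (std::size_t j = 0; j < row_bytes; ++j) {
      const std::uint8_t byte = row[j];
      out[2 * j] += s * static_cast<float>(byte & 0x0F) + b;
      out[2 * j + 1] += s * static_cast<float>(byte >> 4) + b;
    }
  }
}
```

The best prefetch distance depends on the workload and the hardware. If it is too short, little latency is hidden. If it is too long, the prefetched lines may be evicted before they are used. For that reason, a value like `kPrefetchDistance` is usually tuned empirically.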

Differential Revision: D53552699

Feb 08 '24 23:02 excelle08

Name	Link
Latest commit	bab6fd5d850fc1bb43ba1102caafc7c2c48bff23
Latest deploy log	https://app.netlify.com/sites/pytorch-fbgemm-docs/deploys/65d6532624e3230008dd91a8
Deploy Preview	https://deploy-preview-2325--pytorch-fbgemm-docs.netlify.app


Feb 08 '24 23:02 netlify[bot]

This pull request was exported from Phabricator. Differential Revision: D53552699

Feb 08 '24 23:02 facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D53552699

Feb 13 '24 21:02 facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D53552699

Feb 21 '24 19:02 facebook-github-bot

This pull request has been merged in pytorch/FBGEMM@f6195314346935c7b10909a1d5e70a97357ffd3c.

Apr 18 '24 16:04 facebook-github-bot
