FBGEMM
reduced grid size for insert kernel (training)
Summary:
The lru_cache_insert kernel is latency bound, so its runtime is not sensitive to the number of SMs it occupies. Using fewer SMs avoids a structural hazard with work on the main training stream. https://docs.google.com/document/d/1p3Id8HfVMfyFn4ZcL4e79Rl0ktTSevnW3jXm9PTy0ys/edit#bookmark=id.lyjw9rtmebv0
Since the performance-optimized config uses pipelining, this diff reduces the number of SMs (by capping the grid size) regardless of the pipelining scheme.
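The idea above can be sketched as capping the launch grid at a fixed block budget instead of sizing it to the input. A minimal sketch (the helper name, the cap value, and the grid-stride assumption are illustrative, not FBGEMM's actual code):

```python
def capped_grid_size(num_elements: int, threads_per_block: int, max_blocks: int) -> int:
    """Hypothetical helper: compute a launch grid capped at max_blocks.

    A latency-bound kernel gains little from spanning every SM, so
    capping the grid leaves SMs free for concurrent kernels on the
    main training stream. The kernel is assumed to use a grid-stride
    loop, so a smaller grid still covers all num_elements.
    """
    blocks_needed = (num_elements + threads_per_block - 1) // threads_per_block
    return min(blocks_needed, max_blocks)
```

With a grid-stride loop in the kernel body, correctness is unchanged; only the degree of SM occupancy drops.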
Reviewed By: jspark1105, q10
Differential Revision: D47781958
Deploy Preview for pytorch-fbgemm-docs canceled.
| Name | Link |
|---|---|
| Latest commit | f8948122edd86ed800d508eeef036f650c2ae5d1 |
| Latest deploy log | https://app.netlify.com/sites/pytorch-fbgemm-docs/deploys/64cd60d14cc4d70008182fd5 |
This pull request was exported from Phabricator. Differential Revision: D47781958