Optimize the cache fetch for forward split, pt. 1
Summary: Rewrite the kernel to take a cache_hit_rate enum as a template argument. We first check whether the cache is empty and pass that result as a template argument. Inside the first kernel, we then determine the cache conflict miss rate and use it as a template parameter when invoking the second kernel, which performs the actual lookup work.
We pass in uvm_cache_stats as a run-time argument here instead of passing the cache miss rate as a compile-time argument because uvm_cache_stats data is only available on the GPU, and invoking a templatized kernel with the cache miss rate as a template argument would require the cache miss information to first be copied back to the host, which is an expensive operation.
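The dispatch pattern described above can be sketched in plain C++ as follows. Note that the enum name, thresholds, and lookup bodies here are illustrative placeholders, not FBGEMM's actual API; in the patch itself the miss-rate decision happens on the device inside the first kernel (via dynamic parallelism or a device-side switch), which is what avoids copying uvm_cache_stats back to the host.

```cpp
#include <cassert>

// Hypothetical enum standing in for the cache_hit_rate template argument
// described in the summary (names are illustrative).
enum class CacheHitRate { kEmpty, kLowMissRate, kHighMissRate };

// The "second kernel": templatized on the miss-rate category so the
// compiler can specialize the lookup path for each case at compile time.
template <CacheHitRate kRate>
int lookup(int idx) {
  if constexpr (kRate == CacheHitRate::kEmpty) {
    return idx;          // placeholder: bypass-the-cache path
  } else if constexpr (kRate == CacheHitRate::kLowMissRate) {
    return idx + 1;      // placeholder: mostly-cached path
  } else {
    return idx + 2;      // placeholder: conflict-heavy path
  }
}

// Dispatch: a runtime measurement (cache emptiness, conflict miss rate)
// selects which template instantiation to invoke. Shown host-side here
// for simplicity; the patch performs this selection on the GPU.
int dispatch(bool cache_empty, float miss_rate, int idx) {
  if (cache_empty) {
    return lookup<CacheHitRate::kEmpty>(idx);
  }
  if (miss_rate < 0.2f) {  // placeholder threshold
    return lookup<CacheHitRate::kLowMissRate>(idx);
  }
  return lookup<CacheHitRate::kHighMissRate>(idx);
}
```

The benefit of this shape is that each specialization of `lookup` is compiled with the branch resolved, so the per-element hot loop carries no runtime check of the miss-rate category.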
Differential Revision: D48937380
This pull request was exported from Phabricator. Differential Revision: D48937380