Implement custom kernels for top-k and top-p sampling

Open WoosukKwon opened this issue 2 years ago • 0 comments

As mentioned in https://github.com/WoosukKwon/cacheflow/pull/81#issuecomment-1546980281, the current PyTorch-based top-k and top-p implementation is memory-inefficient. This can be improved by introducing custom kernels.
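For reference, here is a minimal plain-Python sketch of the filtering semantics a custom kernel would need to reproduce. It is not the existing PyTorch code: the function name `filter_top_k_top_p`, its parameters, and the choice to measure top-p mass over the full (unrenormalized) distribution are assumptions for illustration. It does show the full sort and running cumulative sum over the vocabulary, which tensor-based versions of this approach typically materialize as intermediate buffers for every sequence.

```python
import math


def filter_top_k_top_p(logits, top_k=0, top_p=1.0):
    """Mask logits outside the top-k / top-p set with -inf.

    Hypothetical helper, not part of any library.
    Assumes `logits` is a non-empty list of floats.
    top_k <= 0 disables top-k; top_p >= 1.0 disables top-p.
    """
    # Full descending sort of token indices: the costly intermediate.
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)

    # Numerically stable softmax terms, in sorted order.
    peak = logits[order[0]]
    exps = [math.exp(logits[i] - peak) for i in order]
    total = sum(exps)

    kept = set()
    mass_before = 0.0  # probability mass of all higher-ranked tokens
    for rank, (idx, e) in enumerate(zip(order, exps)):
        if top_k > 0 and rank >= top_k:
            break  # beyond the k most likely tokens
        if top_p < 1.0 and mass_before > top_p:
            break  # higher-ranked tokens already exceed the nucleus mass
        kept.add(idx)
        mass_before += e / total

    # Keep original positions; masked tokens get zero probability after softmax.
    return [x if i in kept else -math.inf for i, x in enumerate(logits)]


if __name__ == "__main__":
    print(filter_top_k_top_p([0.0, 2.0, -1.0, 1.0], top_k=2))
    print(filter_top_k_top_p([2.0, 1.0, 0.0, -1.0], top_p=0.5))
```

A fused kernel could, in principle, select the kept set without writing out the sorted copy and cumulative sums, which is where the memory saving would come from.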

May 25 '23 01:05 WoosukKwon