ppl.llm.kernel.cuda

[Feature Request] Is there any plan to provide a Python wrapper for the CUDA kernels?

Open PannenetsF opened this issue 1 year ago • 1 comment

Hi, it's great that the kernels support prefill and generation in the same round, which can be expected to improve performance.

However, since most inference and serving frameworks are Python-based, the C++-only architecture limits where the project can be applied. Is there any plan to wrap the kernels with pybind11 so that they can be used from PyTorch?
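To make the request concrete, here is a rough sketch of what a pybind11 binding could look like. Everything in it is an assumption for illustration: `demo_scale_launch`, `demo_scale`, the module name `demo_kernels`, and the `DEMO_WITH_PYBIND11` macro are hypothetical and do not come from ppl.llm.kernel.cuda, and the host loop merely stands in for a real CUDA kernel launch.

```cpp
// Sketch only: every name here is hypothetical and not part of ppl.llm.kernel.cuda.
#include <cstddef>
#include <vector>

// Stand-in for a C++ kernel entry point. A real binding would call the
// library's CUDA launcher here instead of running this host loop.
void demo_scale_launch(const float* in, float* out, std::size_t n, float alpha) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = alpha * in[i];
    }
}

// Thin value-semantics wrapper that is convenient to expose to Python.
std::vector<float> demo_scale(const std::vector<float>& x, float alpha) {
    std::vector<float> y(x.size());
    demo_scale_launch(x.data(), y.data(), x.size(), alpha);
    return y;
}

// The binding is compiled only when DEMO_WITH_PYBIND11 is defined, so the
// wrapper above can still be built and tested as plain C++.
#ifdef DEMO_WITH_PYBIND11
#include <pybind11/pybind11.h>
#include <pybind11/stl.h>  // converts std::vector<float> to and from Python lists

PYBIND11_MODULE(demo_kernels, m) {
    m.doc() = "Hypothetical demo binding";
    m.def("scale", &demo_scale, "Multiply every element of x by alpha",
          pybind11::arg("x"), pybind11::arg("alpha"));
}
#endif
```

Built as an extension with `DEMO_WITH_PYBIND11` defined, this would be importable from Python as `demo_kernels` and callable as `demo_kernels.scale([1.0, 2.0], 2.0)`. The list conversion copies data, so passing PyTorch tensors without a copy would need something more, such as PyTorch's C++ extension mechanism, which this sketch does not cover.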

PannenetsF, Sep 05 '23 12:09