ppl.llm.kernel.cuda
[Feature Request] Is there any plan to provide a Python wrapper for the CUDA kernels?
Hi, the kernels are awesome: they support prefill and generation in the same round, which should give better performance.
However, most inference/serving frameworks are Python-based, so the C++-only architecture limits how widely the project can be used. Is there any plan to wrap it with pybind11 so that the kernels can be used from PyTorch?