
Results: 5 ppl.llm.kernel.cuda issues

Hello developers, I saw the news about the OpenPPL update (https://mp.weixin.qq.com/s/L35pj8fYakvYnL4LYu6nuw) and would like to benchmark the speed of FlashDecoding in this project. How can I reproduce the results from the article? Is there a test script? Also, how does the FlashDecoding in this project differ from the one in the flash-attn project?

Is there any plan to support BF16 inference? Our model encountered FP16 overflow after deployment.

Hello! I would like to benchmark the performance of the decode attention kernel during decoding on my Llama-13B and Baichuan-13B models. Is there a corresponding Python API example? Looking forward to your reply!

Hi, the kernels are awesome for supporting prefill and generation in the same round, and better performance is to be expected. However, since most inference/serving frameworks are Python-based, the cpp-only...