ppl.llm.kernel.cuda
Hello developers, I saw the news about the OpenPPL update: https://mp.weixin.qq.com/s/L35pj8fYakvYnL4LYu6nuw and would like to test the speed of flash decoding in this project. How can I reproduce the results from the article? Is there a test script? Also, what is the difference between the flash decoding in this project and the one in the flash-attention project?
Is there any plan to support BF16 inference? Our model encountered FP16 overflow after deployment.
Hi! I would like to benchmark decode attention performance during decoding on my llama-13B and Baichuan-13B models. Is there an example of the corresponding Python interface? Looking forward to your reply!
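For reference, below is a minimal sketch of the kind of timing loop I have in mind. It uses PyTorch's scaled_dot_product_attention as a placeholder for the decode attention kernel, and the shapes (40 heads, head dim 128, KV length 2048) only approximate a llama-13B configuration; none of this is the actual ppl.llm.kernel.cuda API.

```python
# Hypothetical timing harness -- NOT the actual ppl.llm.kernel.cuda API.
# PyTorch's scaled_dot_product_attention stands in for the decode
# attention kernel; the real binding would replace decode_attention().
import torch
import torch.nn.functional as F

batch, n_heads, head_dim, kv_len = 8, 40, 128, 2048  # llama-13B-like shapes (assumed)
device, dtype = "cuda", torch.float16

q = torch.randn(batch, n_heads, 1, head_dim, device=device, dtype=dtype)       # one new token per sequence
k = torch.randn(batch, n_heads, kv_len, head_dim, device=device, dtype=dtype)  # cached keys
v = torch.randn(batch, n_heads, kv_len, head_dim, device=device, dtype=dtype)  # cached values

def decode_attention(q, k, v):
    # Placeholder for a decode attention kernel call.
    return F.scaled_dot_product_attention(q, k, v)

# Warm up, then time with CUDA events.
for _ in range(10):
    decode_attention(q, k, v)
torch.cuda.synchronize()

start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)
iters = 100
start.record()
for _ in range(iters):
    decode_attention(q, k, v)
end.record()
torch.cuda.synchronize()
print(f"decode attention: {start.elapsed_time(end) / iters:.3f} ms/iter")
```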
Hi, it is awesome that the kernels support prefill and generation in the same round, and better performance can be expected from this. However, as most inference/serving frameworks are Python-based, the cpp-only...