lightllm
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
As per the title, thanks.
- Add a new arg `pd_chunk_size` to control the chunk size; 0 means no chunking (a sketch of these semantics follows below).
- Support decode chunking.
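A minimal sketch of the chunking semantics described above, assuming `pd_chunk_size` simply bounds how many tokens are handled per chunk; the helper name `split_into_chunks` is illustrative and not LightLLM's actual API.

```python
def split_into_chunks(token_ids, pd_chunk_size):
    """Split a token sequence into chunks of at most pd_chunk_size tokens.

    A value of 0 disables chunking and returns the whole sequence as one chunk.
    """
    if pd_chunk_size == 0:
        return [token_ids]
    return [
        token_ids[i : i + pd_chunk_size]
        for i in range(0, len(token_ids), pd_chunk_size)
    ]


# Example: a 10-token sequence with pd_chunk_size=4 yields chunks of 4, 4, and 2 tokens.
print(split_into_chunks(list(range(10)), 4))  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
print(split_into_chunks(list(range(10)), 0))  # one chunk containing all 10 tokens
```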
Add fake balance for EP mode, controlled by the `--enable_ep_fake_balance` option. Cost: EP8, batch 128, input 64 (40+ different seqlens) costs about 5 seconds in total. Benefit: prefill throughput increases by 35%,...
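The snippet above does not show the implementation, so the following is only a hedged sketch of one plausible reading of "fake balance": padding every request in a batch up to a common length so that expert-parallel ranks see a uniform workload. The function name `fake_balance_pad` and its parameters are hypothetical, not LightLLM code.

```python
import random


def fake_balance_pad(seq_lens, pad_multiple=1):
    """Pad every sequence in the batch up to the longest one (optionally rounded
    up to a multiple), so each expert-parallel rank receives uniform work.

    Returns the padded length per sequence and the total padding overhead in tokens.
    """
    target = max(seq_lens)
    # Round the target up to a multiple if kernels prefer aligned lengths.
    target = ((target + pad_multiple - 1) // pad_multiple) * pad_multiple
    padded = [target for _ in seq_lens]
    overhead = sum(target - length for length in seq_lens)
    return padded, overhead


# Example: a batch of 128 requests with many distinct lengths around 64 tokens,
# mirroring the EP8 / batch 128 / input 64 setting mentioned above.
random.seed(0)
lens = [random.randint(32, 64) for _ in range(128)]
padded, overhead = fake_balance_pad(lens)
print(padded[0], overhead)  # every request padded to the batch maximum
```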
Hi! I'm Sergey from the Integrations team over at [AI/ML API](https://aimlapi.com/), a startup with 150K+ users providing over 300 AI models in one place. Your project looks dope, so we'd...
Hi LightLLM team, thank you for providing an efficient, lightweight inference framework. We hope the following features can be supported in the future:
- Support inference of the static AWQ/GPTQ quantized models commonly found on HuggingFace (e.g., the AWQ models of the Qwen series).
- Support native loading and inference of models quantized with your team's LLMC.
These two features would make deployment much faster and friendlier. Thanks again for your work.