lightllm
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
## Summary

This PR introduces a comprehensive performance overhaul of the multimodal resource allocation pipeline. It refactors both the `httpserver.manager` and the cache server (`CacheServer`) to replace sequential, "chatty" operations with...
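The "chatty" pattern being replaced can be illustrated with a minimal sketch. Note this is a toy model, not LightLLM's actual API: `CacheClient`, `alloc`, and `alloc_batch` are hypothetical names chosen to show why one batched request beats N sequential round-trips.

```python
class CacheClient:
    """Toy stand-in for a cache-server RPC client (hypothetical, for illustration)."""

    def __init__(self):
        self.round_trips = 0  # count of simulated network round-trips

    def alloc(self, key):
        # One round-trip per key: the "chatty" pattern.
        self.round_trips += 1
        return f"handle:{key}"

    def alloc_batch(self, keys):
        # One round-trip for the whole batch.
        self.round_trips += 1
        return [f"handle:{k}" for k in keys]


def chatty_alloc(client, keys):
    # N sequential calls -> N round-trips of latency.
    return [client.alloc(k) for k in keys]


def batched_alloc(client, keys):
    # Same result, a single round-trip.
    return client.alloc_batch(keys)


keys = [f"img_{i}" for i in range(8)]

c1 = CacheClient()
h1 = chatty_alloc(c1, keys)    # c1.round_trips == 8

c2 = CacheClient()
h2 = batched_alloc(c2, keys)   # c2.round_trips == 1
```

With per-request latency dominated by the network, batching like this turns O(N) round-trips into O(1), which is the kind of win the PR description is pointing at.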
In chunked prefill mode, when a long sequence is processed as multiple chunks, the `next_token_ids` used to fill the draft model's KV cache may be incorrect. This change adds the first token id of the next chunk to `ModelInput` to assist MTP inference.
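The boundary problem above can be sketched as follows. This is a hedged illustration, not LightLLM's actual code: the `ModelInput` fields and the `draft_next_token_ids` helper are hypothetical, but they show why an intermediate chunk's "next token" must come from the *next* chunk's first id rather than from sampling.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ModelInput:
    """Illustrative stand-in for a per-chunk model input."""
    input_ids: List[int]                      # token ids of the current chunk
    next_chunk_first_token_id: Optional[int]  # first id of the NEXT chunk, or None for the last chunk


def draft_next_token_ids(chunks: List[List[int]], sampled_last: int) -> List[int]:
    """For each chunk, the token that actually follows it in the sequence:
    the next chunk's first id for intermediate chunks, and the sampled
    token only for the final chunk."""
    out = []
    for i in range(len(chunks)):
        if i + 1 < len(chunks):
            # Intermediate chunk: the true continuation is already known --
            # it is the first token of the next chunk.
            out.append(chunks[i + 1][0])
        else:
            # Final chunk: the continuation is the freshly sampled token.
            out.append(sampled_last)
    return out


chunks = [[1, 2, 3], [4, 5, 6], [7, 8]]
print(draft_next_token_ids(chunks, 9))  # -> [4, 7, 9]
```

Without the carried-over first id, the draft model's KV cache for intermediate chunks would be filled with tokens that never follow those positions in the real sequence, which is the bug the change addresses.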
```
[Gloo] Rank 0 is connected to 15 peer ranks. Expected number of connected peer ranks is : 15
INFO 09-30 03:02:12 [prefill_impl.py:33] lock_nccl_group ranks 0
[Gloo] Rank 1 is connected...
```
Test command: `pytest unit_tests/common/fused_moe/test_moe_silu_and_mul_mix_quant_ep.py`
Test results:
Environment info:
Has anyone else run into this error? Is it expected behavior?