lightllm
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
## Summary

This PR introduces a comprehensive performance overhaul of the multimodal resource allocation pipeline. It refactors both the `httpserver.manager` and the cache server (`CacheServer`) to replace sequential, "chatty" operations with...
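The "chatty" pattern being replaced can be illustrated with a minimal sketch. Note this is a toy model, not LightLLM's actual API: `CacheClient`, `alloc`, and `alloc_batch` are hypothetical names chosen to show why one batched request beats N sequential round-trips.

```python
class CacheClient:
    """Toy stand-in for a cache-server RPC client (hypothetical, for illustration)."""

    def __init__(self):
        self.round_trips = 0  # count of simulated network round-trips

    def alloc(self, key):
        # One round-trip per key: the "chatty" pattern.
        self.round_trips += 1
        return f"handle:{key}"

    def alloc_batch(self, keys):
        # One round-trip for the whole batch.
        self.round_trips += 1
        return [f"handle:{k}" for k in keys]


def chatty_alloc(client, keys):
    # N sequential calls -> N round-trips of latency.
    return [client.alloc(k) for k in keys]


def batched_alloc(client, keys):
    # Same result, a single round-trip.
    return client.alloc_batch(keys)


keys = [f"img_{i}" for i in range(8)]

c1 = CacheClient()
h1 = chatty_alloc(c1, keys)    # c1.round_trips == 8

c2 = CacheClient()
h2 = batched_alloc(c2, keys)   # c2.round_trips == 1
```

With per-request latency dominated by the network, batching like this turns O(N) round-trips into O(1), which is the kind of win the PR description is pointing at.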
In chunked prefill mode, when a long sequence is processed as multiple chunks, the `next_token_ids` used to fill the draft model's KV cache may be incorrect. This change adds the first token id of the next chunk to `ModelInput` to assist MTP inference.
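The boundary problem above can be sketched as follows. This is a hedged illustration, not LightLLM's actual code: the `ModelInput` fields and the `draft_next_token_ids` helper are hypothetical, but they show why an intermediate chunk's "next token" must come from the *next* chunk's first id rather than from sampling.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ModelInput:
    """Illustrative stand-in for a per-chunk model input."""
    input_ids: List[int]                      # token ids of the current chunk
    next_chunk_first_token_id: Optional[int]  # first id of the NEXT chunk, or None for the last chunk


def draft_next_token_ids(chunks: List[List[int]], sampled_last: int) -> List[int]:
    """For each chunk, the token that actually follows it in the sequence:
    the next chunk's first id for intermediate chunks, and the sampled
    token only for the final chunk."""
    out = []
    for i in range(len(chunks)):
        if i + 1 < len(chunks):
            # Intermediate chunk: the true continuation is already known --
            # it is the first token of the next chunk.
            out.append(chunks[i + 1][0])
        else:
            # Final chunk: the continuation is the freshly sampled token.
            out.append(sampled_last)
    return out


chunks = [[1, 2, 3], [4, 5, 6], [7, 8]]
print(draft_next_token_ids(chunks, 9))  # -> [4, 7, 9]
```

Without the carried-over first id, the draft model's KV cache for intermediate chunks would be filled with tokens that never follow those positions in the real sequence, which is the bug the change addresses.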
```
[Gloo] Rank 0 is connected to 15 peer ranks. Expected number of connected peer ranks is : 15
INFO 09-30 03:02:12 [prefill_impl.py:33] lock_nccl_group ranks 0
[Gloo] Rank 1 is connected...
```
Test command: `pytest unit_tests/common/fused_moe/test_moe_silu_and_mul_mix_quant_ep.py`
Test results:
Environment info:
Has anyone else run into this error? Is it expected behavior?