Jiarui Fang(方佳瑞)
@Gy-Lu @ver217 @binmakeswell can you update the PP doc?
> Hello 👋 I am currently doing research on long-context modeling and would like to ask whether yunchang has code that can be adapted to Megatron-LM?

https://github.com/FlagOpen/FlagScale/commit/f98ee1e293bd906cc77f512f7a884b2030c10a12 Many people have already integrated USP into Megatron-LM.
Thanks @neonhuang! Could you submit an MR? Run the code you pasted only when the torch version is < 2.3?
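For reference, a minimal sketch of the version guard I have in mind; the two branch bodies are left as placeholders since the actual fallback is the snippet you pasted:

```python
import torch
from packaging import version

# Only take the pasted fallback path on torch releases older than 2.3.
if version.parse(torch.__version__) < version.parse("2.3"):
    ...  # fallback: the code from the pasted snippet
else:
    ...  # keep the current implementation
```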
The parallel group for USP and FSDP should be the same. You can wrap the USP-applied module with FSDP; see the sketch below.
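A minimal sketch of what I mean, assuming a hypothetical `apply_usp` helper and `build_model` constructor (the real yunchang API may differ); the point is that the same process group is passed to both USP and FSDP:

```python
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

dist.init_process_group(backend="nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

# One shared group for both sequence parallelism (USP) and sharding (FSDP).
sp_group = dist.new_group(ranks=list(range(dist.get_world_size())))

model = build_model().cuda()                  # hypothetical model constructor
model = apply_usp(model, group=sp_group)      # hypothetical: attach USP to this group
model = FSDP(model, process_group=sp_group)   # wrap the USP-applied module with FSDP
```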
How did you use the vllm async rollout? Could you post a test script?
Hi, does this PR https://github.com/feifeibear/long-context-attention/pull/150 solve the problem?
I pulled the latest main branch and launched with the Triton backend:

```
python -m sglang.launch_server --model-path /demo-huabei2/common-models/DeepSeek-R1-Distill-Qwen-7B --disable-radix-cache --host 127.0.0.1 --port 1235 --tensor-parallel-size 1 --speculative-algo EAGLE --speculative-draft /demo-huabei2/common-models/EAGLE/EAGLE-Qwen2-7B-Instruct --speculative-num-steps 5 --speculative-eagle-topk...
```
Qwen 2.5 and Qwen 2 share the same model architecture.
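One quick way to check this (the checkpoint names below are just examples; any Qwen2 / Qwen2.5 pair should behave the same): both report the same architecture class in their Hugging Face configs.

```python
from transformers import AutoConfig

# Both model families reuse the Qwen2 model class.
cfg_qwen2 = AutoConfig.from_pretrained("Qwen/Qwen2-7B-Instruct")
cfg_qwen25 = AutoConfig.from_pretrained("Qwen/Qwen2.5-7B-Instruct")
print(cfg_qwen2.architectures)   # ['Qwen2ForCausalLM']
print(cfg_qwen25.architectures)  # ['Qwen2ForCausalLM']
```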
Thank you for your insightful analysis. Indeed, we have previously encountered similar memory leak issues, and this time I will attempt to improve functions like `torch.empty`. As a temporary workaround,...
> Thank you for your thoughtful response. Setting `use_sync=True` does indeed temporarily address the memory leak issue; however, it introduces additional latency due to increased synchronization overhead. We hope to...