Lzhang-hub
> I added `--use-distributed-optimizer` and got a new error. Env: 4*8=32 A100 GPUs, tp2 pp8
>
> ```
> [2024-06-03 08:11:01,842] [INFO] [ckpt_saver.py:892:commit_checkpoint] The number of ready shards is 26 != 32....
> ```
Is it because the first `` is in the chat template? The model's first output token is then not ``, so open-webui cannot get it. https://huggingface.co/Qwen/QwQ-32B/blob/main/tokenizer_config.json
> @Lzhang-hub Could we maybe, instead of the warning, check the version of flash attention installed and error out if the version number is too low? Also, please sign your...
@mickqian I used the latest version to run the qwen2.5-vl-7b model with this command:

```
python -m sglang.launch_server --model-path Qwen/Qwen2.5-VL-7B-Instruct --host 0.0.0.0 --port 8080 --chat-template qwen2-vl --chunked-prefill-size -1 --disable-radix-cache --mm-attention-backend fa3 --attention-backend fa3...
```
@endurehero Which tools did you use to measure the time cost? Thank you.
Great job! If I want to participate in the VLM work, what can I do?
> Is it tested for MoE?

Added acc and perf bench for `Qwen3-VL-235B-A22B-Instruct`.
> Please add a unit test for this model dp.

Done.