jiapingW
> [@jiapingW](https://github.com/jiapingW) Let me look into it. You can follow the code here: https://github.com/sgl-project/sglang/pull/10517
> Hi, thanks for your great work on SGLang and SpecForge! > > I am trying to test https://huggingface.co/Rayzl/qwen2.5-vl-7b-eagle3-sgl on Qwen2.5-VL using the reference configs from: [#102](https://github.com/sgl-project/SpecForge/pull/102) , but the...
I tested with sglang==0.5.4. The result is below, which looks OK.

```
Created temporary image directory: .cache/mmstar_specforge
Loaded 100 questions.
100%|██████████| 100/100 [00:44
```
SGLang is designing and implementing spec v2, which will handle this issue.
> > Any progress on this feature? > > In progress. Expected to be completed around next Wednesday. Will first finish the SDPA version. The official nightly build of FlexAttention...
Great, I'll test its memory optimization effect in the next couple of days.
I tested with the following command on 4 x H20 GPUs.

```
# CUDA_VISIBLE_DEVICES=4,5,6,7 torchrun \
#     --nproc_per_node 4 \
#     --standalone \
#     scripts/train_eagle3.py \
#     --target-model-path Qwen/Qwen2.5-7B \
#     ...
```
Great. I used `--sp_ulysses_size 2 --sp_ring_size 4`, which uses 58 GB of VRAM per GPU, less than the original SDPA version.
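For reference, a full launch combining the command quoted earlier in this thread with these sequence-parallel flags might look like the sketch below. This is hypothetical: flag spellings and the script path are taken from the snippets above, and note that the product of the Ulysses and ring degrees usually has to match the sequence-parallel world size, so `2 x 4` would imply 8 ranks rather than the 4 GPUs used in the earlier run.

```shell
# Hypothetical launch sketch; flags follow the snippets quoted in this thread.
# --sp_ulysses_size * --sp_ring_size is typically expected to equal the
# sequence-parallel world size, so 2 x 4 implies 8 ranks here.
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 torchrun \
    --nproc_per_node 8 \
    --standalone \
    scripts/train_eagle3.py \
    --target-model-path Qwen/Qwen2.5-7B \
    --sp_ulysses_size 2 \
    --sp_ring_size 4
```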