@Abigbigbig This looks like a different issue from this PR. Let's move it to a separate issue; I can point you to the fix.
Originally posted by @yubofredwang in https://github.com/sgl-project/SpecForge/issues/314#issuecomment-3588281081

Thank you for your answer. I have now reverted to specforge 0.1.0 and sglang 0.5.4 and applied your previous suggestion:

```python
kernel_options = {
    "BLOCK_M": 32,
    "BLOCK_N": 32,
    "BLOCK_M1": 32,
    "BLOCK_N1": 32,
    "BLOCK_M2": 32,
    "BLOCK_N2": 32,
}
attn_output = flex_attention_func(
    query=query_states,
    key=key_cache.contiguous(),
    value=value_cache.contiguous(),
    block_mask=block_mask,
    enable_gqa=True,
    kernel_options=kernel_options,
)
```

This solved the compilation error `OutOfCacheError: out of resource: triton-tem_fused_0 Required: 107008 Hardware limit: 101376`, but now training runs out of GPU memory. I am using two 48 GB A6000 cards; can this GPU model meet the requirements? The specific error is:

```
[rank1]: torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.02 GiB. GPU 1 has a total capacity of 47.40 GiB of which 30.69 GiB is free. Process 583936 has 29.95 GiB memory in use. Process 1818617 has 16.71 GiB memory in use. Of the allocated memory 16.25 GiB is allocated by PyTorch, and 204.79 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
```
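As the error message itself suggests, one low-effort thing to try before cutting batch size or sequence length is enabling expandable segments in the CUDA caching allocator. A minimal sketch of doing this inside the training entrypoint (the variable just has to be visible before the first CUDA allocation; whether it frees enough memory for this particular run is not guaranteed):

```python
import os

# Setting this before importing torch guarantees the CUDA caching allocator
# sees it when it is first initialized; it reduces fragmentation-related OOMs.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

import torch  # noqa: E402  (imported after the env var on purpose)
```

Equivalently, you can export `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True` in the shell that launches torchrun.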
Was this using sglang for the target backend, or huggingface?
I didn't modify the SpecForge source code; I trained directly with `examples/run_qwen2_5-vl_eagle3_online.sh`:

```bash
#!/bin/bash
SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )
ROOT_DIR=$(dirname $SCRIPT_DIR)

# support tp1 train eagle3 for qwen2.5-vl-7b-instruct
NUM_GPUS=${1:-1}

torchrun \
    --standalone \
    --nproc_per_node $NUM_GPUS \
    $ROOT_DIR/scripts/train_eagle3_online.py \
    --target-model-path Qwen/Qwen2.5-VL-7B-Instruct \
    --draft-model-config $ROOT_DIR/configs/qwen2-5-vl-eagle3.json \
    --train-data-path $ROOT_DIR/cache/dataset/allava4v_train.jsonl \
    --output-dir $ROOT_DIR/outputs/Qwen2.5-VL-7B-eagle3 \
    --num-epochs 10 \
    --batch-size 1 \
    --learning-rate 1e-4 \
    --max-length 8192 \
    --dist-timeout 360 \
    --chat-template qwen2-vl \
    --cache-dir $ROOT_DIR/cache \
    --embedding-key model.embed_tokens.weight \
    --tp-size 1 \
    --is-vlm \
    --min-pixels 50176 \
    --max-pixels 802816
```
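For reference, the `--min-pixels` / `--max-pixels` values in that script are the pixel bounds used by the Qwen2.5-VL image processor (50176 = 224×224, 802816 = 896×896); they control how far images are resized and therefore how many vision tokens each sample contributes. A minimal sketch of the equivalent Hugging Face processor configuration (how SpecForge forwards these flags internally is an assumption here, but this is the standard processor usage for this model family):

```python
from transformers import AutoProcessor

# 50176 = 224 * 224, 802816 = 896 * 896: lower/upper bounds on the number of
# image pixels kept after resizing. A smaller max_pixels means fewer vision
# tokens per image and lower activation memory during training.
processor = AutoProcessor.from_pretrained(
    "Qwen/Qwen2.5-VL-7B-Instruct",
    min_pixels=50176,
    max_pixels=802816,
)
```

Lowering `--max-pixels` is one of the easier levers to pull if you keep hitting CUDA OOM at this max sequence length.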
I tried setting tp_size=2 and running the training with `bash examples/run_qwen2_5-vl_eagle3_online.sh 2`, so that the sglang target backend would be used, but it hit a conflict.
The specific error was:

```
WARNING:sglang.srt.server_args:Cuda graph is disabled because of using torch native attention backend
torch_dtype is deprecated! Use dtype instead!
WARNING:sglang.srt.server_args:Cuda graph is disabled because of using torch native attention backend
[Gloo] Rank 1 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[Gloo] Rank 0 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[rank0]: Traceback (most recent call last):
[rank0]:   File "/mnt/data/wangrenjun/SpecForge/scripts/train_eagle3_online.py", line 748, in
```
Can you try the following? Remove this code:

```python
if is_online:
    if (
        args.is_vlm
        and draft_model_config.target_model_type == "qwen2_5_vl"
        and args.tp_size == 1
    ):
        from transformers import Qwen2_5_VLForConditionalGeneration

        target_model = (
            Qwen2_5_VLForConditionalGeneration.from_pretrained(
                pretrained_model_name_or_path=args.target_model_path,
                torch_dtype=torch.bfloat16,
            )
            .eval()
            .cuda()
        )
    else:
```

then fix the indentation of the code that followed and run with `--target-model-backend hf`.
I tried this setting using the hf backend:

```python
if is_online:
    if args.target_model_backend == "sglang":
        target_model_kwargs = SGLangBackendArgs.from_args(args).to_kwargs()
    else:
        target_model_kwargs = {}
    target_model = get_eagle3_target_model(
        pretrained_model_name_or_path=args.target_model_path,
        backend=args.target_model_backend,
        torch_dtype=torch.bfloat16,
        device="cuda",
        cache_dir=args.cache_dir,
        **target_model_kwargs,
    )
```

but it still couldn't train, with the following error:
```
warnings.warn(
Set draft model tie_word_embeddings to False
Missing validation function mapping in ROPE_VALIDATION_FUNCTIONS for 'rope_type'='mrope'
torch_dtype is deprecated! Use dtype instead!
[rank0]: Traceback (most recent call last):
[rank0]:   File "/mnt/data/SpecForge/scripts/train_eagle3_online.py", line 748, in <module>
[rank0]:     main()
[rank0]:   File "/mnt/data/SpecForge/scripts/train_eagle3_online.py", line 573, in main
[rank0]:     target_model, processor = build_target_model(args, draft_model_config)
[rank0]:   File "/mnt/data/SpecForge/scripts/train_eagle3_online.py", line 258, in build_target_model
[rank0]:     target_model = get_eagle3_target_model(
[rank0]:   File "/mnt/data/SpecForge/specforge/modeling/target/eagle3_target_model.py", line 475, in get_eagle3_target_model
[rank0]:     return HFEagle3TargetModel.from_pretrained(
[rank0]:   File "/mnt/data/SpecForge/specforge/modeling/target/eagle3_target_model.py", line 163, in from_pretrained
[rank0]:     target_model = AutoModelForCausalLM.from_pretrained(
[rank0]:   File "/opt/conda/envs/venv_sglang/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py", line 607, in from_pretrained
[rank0]:     raise ValueError(
[rank0]: ValueError: Unrecognized configuration class <class 'transformers.models.qwen2_5_vl.configuration_qwen2_5_vl.Qwen2_5_VLConfig'> for this kind of AutoModel: AutoModelForCausalLM.
[rank0]: Model type should be one of ApertusConfig, ArceeConfig, AriaTextConfig, BambaConfig, BartConfig, BertConfig, BertGenerationConfig, BigBirdConfig, BigBirdPegasusConfig, BioGptConfig, BitNetConfig, BlenderbotConfig, BlenderbotSmallConfig, BloomConfig, BltConfig, CamembertConfig, LlamaConfig, CodeGenConfig, CohereConfig, Cohere2Config, CpmAntConfig, CTRLConfig, Data2VecTextConfig, DbrxConfig, DeepseekV2Config, DeepseekV3Config, DiffLlamaConfig, DogeConfig, Dots1Config, ElectraConfig, Emu3Config, ErnieConfig, Ernie4_5Config, Ernie4_5_MoeConfig, Exaone4Config, FalconConfig, FalconH1Config, FalconMambaConfig, FlexOlmoConfig, FuyuConfig, GemmaConfig, Gemma2Config, Gemma3Config, Gemma3TextConfig, Gemma3nConfig, Gemma3nTextConfig, GitConfig, GlmConfig, Glm4Config, Glm4MoeConfig, GotOcr2Config, GPT2Config, GPT2Config, GPTBigCodeConfig, GPTNeoConfig, GPTNeoXConfig, GPTNeoXJapaneseConfig, GptOssConfig, GPTJConfig, GraniteConfig, GraniteMoeConfig, GraniteMoeHybridConfig, GraniteMoeSharedConfig, HeliumConfig, HunYuanDenseV1Config, HunYuanMoEV1Config, JambaConfig, JetMoeConfig, Lfm2Config, LlamaConfig, Llama4Config, Llama4TextConfig, LongcatFlashConfig, MambaConfig, Mamba2Config, MarianConfig, MBartConfig, MegaConfig, MegatronBertConfig, MiniMaxConfig, MinistralConfig, MistralConfig, MixtralConfig, MllamaConfig, ModernBertDecoderConfig, MoshiConfig, MptConfig, MusicgenConfig, MusicgenMelodyConfig, MvpConfig, NemotronConfig, OlmoConfig, Olmo2Config, Olmo3Config, OlmoeConfig, OpenLlamaConfig, OpenAIGPTConfig, OPTConfig, PegasusConfig, PersimmonConfig, PhiConfig, Phi3Config, Phi4MultimodalConfig, PhimoeConfig, PLBartConfig, ProphetNetConfig, QDQBertConfig, Qwen2Config, Qwen2MoeConfig, Qwen3Config, Qwen3MoeConfig, Qwen3NextConfig, RecurrentGemmaConfig, ReformerConfig, RemBertConfig, RobertaConfig, RobertaPreLayerNormConfig, RoCBertConfig, RoFormerConfig, RwkvConfig, SeedOssConfig, SmolLM3Config, Speech2Text2Config, StableLmConfig, Starcoder2Config, TransfoXLConfig, TrOCRConfig, VaultGemmaConfig, WhisperConfig, XGLMConfig, XLMConfig, XLMProphetNetConfig, XLMRobertaConfig, XLMRobertaXLConfig, XLNetConfig, xLSTMConfig, XmodConfig, ZambaConfig, Zamba2Config.
```

and my transformers==4.57.1.
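That ValueError is what a plain `AutoModelForCausalLM` call will always produce here: `Qwen2_5_VLConfig` is a vision-language config and is not registered in the causal-LM mapping, which is exactly why the branch that was removed loaded the target through `Qwen2_5_VLForConditionalGeneration`. A minimal sketch of that direct HF load for comparison (this mirrors the removed branch and is not a SpecForge API):

```python
import torch
from transformers import Qwen2_5_VLForConditionalGeneration

# Qwen2_5_VLConfig is not in the AutoModelForCausalLM mapping, so the
# dedicated vision-language class has to be used for the target model.
target_model = (
    Qwen2_5_VLForConditionalGeneration.from_pretrained(
        "Qwen/Qwen2.5-VL-7B-Instruct",
        torch_dtype=torch.bfloat16,  # the log above notes `dtype` is the newer name
    )
    .eval()
    .cuda()
)
```

So for the hf backend to work with this model, the HF target-model wrapper would need to go through a VLM-capable loader rather than `AutoModelForCausalLM`.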
When I use the sglang backend instead of hf (only 1 GPU), training enters Epoch 0 and then fails with this error:
```
Training Epoch 0: 0%| | 0/95000 [00:00<?, ?it/s]
[rank0]: Traceback (most recent call last):
[rank0]:   File "/mnt/data/wangrenjun/SpecForge/scripts/train_eagle3.py", line 788, in
```