@Abigbigbig This looks like a different issue from this PR. Let's move it to a separate issue; I can point you to the fix.
Originally posted by @yubofredwang in https://github.com/sgl-project/SpecForge/issues/314#issuecomment-3588281081

Thank you for your answer. I have now reverted to specforge 0.1.0 and sglang 0.5.4 and applied your previous suggestion:

```python
kernel_options = {
    "BLOCK_M": 32,
    "BLOCK_N": 32,
    "BLOCK_M1": 32,
    "BLOCK_N1": 32,
    "BLOCK_M2": 32,
    "BLOCK_N2": 32,
}
attn_output = flex_attention_func(
    query=query_states,
    key=key_cache.contiguous(),
    value=value_cache.contiguous(),
    block_mask=block_mask,
    enable_gqa=True,
    kernel_options=kernel_options,
)
```

This solved the compilation error `OutOfCacheError: out of resource: triton-tem_fused_0 Required: 107008 Hardware limit: 101376`, but now training runs out of GPU memory. I am using two 48 GB A6000 cards; can this GPU model meet the requirements? The specific error is:

```
[rank1]: torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.02 GiB. GPU 1 has a total capacity of 47.40 GiB of which 30.69 GiB is free. Process 583936 has 29.95 GiB memory in use. Process 1818617 has 16.71 GiB memory in use. Of the allocated memory 16.25 GiB is allocated by PyTorch, and 204.79 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
```
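As the error message itself suggests, one low-effort thing to try before cutting batch size or sequence length is enabling expandable segments in the CUDA caching allocator. A minimal sketch of doing this inside the training entrypoint (the variable just has to be visible before the first CUDA allocation; whether it frees enough memory for this particular run is not guaranteed):

```python
import os

# Setting this before importing torch guarantees the CUDA caching allocator
# sees it when it is first initialized; it reduces fragmentation-related OOMs.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

import torch  # noqa: E402  (imported after the env var on purpose)
```

Equivalently, you can export `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True` in the shell that launches torchrun.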
Was this using sglang for the target backend, or huggingface?
I didn't modify the SpecForge source code; I trained directly with `examples/run_qwen2_5-vl_eagle3_online.sh`:

```bash
#!/bin/bash
SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )
ROOT_DIR=$(dirname $SCRIPT_DIR)

# support tp1 train eagle3 for qwen2.5-vl-7b-instruct
NUM_GPUS=${1:-1}

torchrun \
    --standalone \
    --nproc_per_node $NUM_GPUS \
    $ROOT_DIR/scripts/train_eagle3_online.py \
    --target-model-path Qwen/Qwen2.5-VL-7B-Instruct \
    --draft-model-config $ROOT_DIR/configs/qwen2-5-vl-eagle3.json \
    --train-data-path $ROOT_DIR/cache/dataset/allava4v_train.jsonl \
    --output-dir $ROOT_DIR/outputs/Qwen2.5-VL-7B-eagle3 \
    --num-epochs 10 \
    --batch-size 1 \
    --learning-rate 1e-4 \
    --max-length 8192 \
    --dist-timeout 360 \
    --chat-template qwen2-vl \
    --cache-dir $ROOT_DIR/cache \
    --embedding-key model.embed_tokens.weight \
    --tp-size 1 \
    --is-vlm \
    --min-pixels 50176 \
    --max-pixels 802816
```
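For reference, the `--min-pixels` / `--max-pixels` values in that script are the pixel bounds used by the Qwen2.5-VL image processor (50176 = 224×224, 802816 = 896×896); they control how far images are resized and therefore how many vision tokens each sample contributes. A minimal sketch of the equivalent Hugging Face processor configuration (how SpecForge forwards these flags internally is an assumption here, but this is the standard processor usage for this model family):

```python
from transformers import AutoProcessor

# 50176 = 224 * 224, 802816 = 896 * 896: lower/upper bounds on the number of
# image pixels kept after resizing. A smaller max_pixels means fewer vision
# tokens per image and lower activation memory during training.
processor = AutoProcessor.from_pretrained(
    "Qwen/Qwen2.5-VL-7B-Instruct",
    min_pixels=50176,
    max_pixels=802816,
)
```

Lowering `--max-pixels` is one of the easier levers to pull if you keep hitting CUDA OOM at this max sequence length.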
I tried setting tp_size=2 and running the training with `bash examples/run_qwen2_5-vl_eagle3_online.sh 2`, so that the sglang target backend would be used, but it hit a conflict.
The specific error was:

```
WARNING:sglang.srt.server_args:Cuda graph is disabled because of using torch native attention backend
torch_dtype is deprecated! Use dtype instead!
WARNING:sglang.srt.server_args:Cuda graph is disabled because of using torch native attention backend
[Gloo] Rank 1 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[Gloo] Rank 0 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[rank0]: Traceback (most recent call last):
[rank0]:   File "/mnt/data/wangrenjun/SpecForge/scripts/train_eagle3_online.py", line 748, in
```
Can you try the following? Remove this code:

```python
if is_online:
    if (
        args.is_vlm
        and draft_model_config.target_model_type == "qwen2_5_vl"
        and args.tp_size == 1
    ):
        from transformers import Qwen2_5_VLForConditionalGeneration

        target_model = (
            Qwen2_5_VLForConditionalGeneration.from_pretrained(
                pretrained_model_name_or_path=args.target_model_path,
                torch_dtype=torch.bfloat16,
            )
            .eval()
            .cuda()
        )
    else:
```

then fix the indentation of the code that followed and run with `--target-model-backend hf`.
I tried this setting using the hf backend:

```python
if is_online:
    if args.target_model_backend == "sglang":
        target_model_kwargs = SGLangBackendArgs.from_args(args).to_kwargs()
    else:
        target_model_kwargs = {}
    target_model = get_eagle3_target_model(
        pretrained_model_name_or_path=args.target_model_path,
        backend=args.target_model_backend,
        torch_dtype=torch.bfloat16,
        device="cuda",
        cache_dir=args.cache_dir,
        **target_model_kwargs,
    )
```

but it still couldn't train, with the following error:
```
warnings.warn(
Set draft model tie_word_embeddings to False
Missing validation function mapping in ROPE_VALIDATION_FUNCTIONS for 'rope_type'='mrope'
torch_dtype is deprecated! Use dtype instead!
[rank0]: Traceback (most recent call last):
[rank0]:   File "/mnt/data/SpecForge/scripts/train_eagle3_online.py", line 748, in <module>
[rank0]:     main()
[rank0]:   File "/mnt/data/SpecForge/scripts/train_eagle3_online.py", line 573, in main
[rank0]:     target_model, processor = build_target_model(args, draft_model_config)
[rank0]:   File "/mnt/data/SpecForge/scripts/train_eagle3_online.py", line 258, in build_target_model
[rank0]:     target_model = get_eagle3_target_model(
[rank0]:   File "/mnt/data/SpecForge/specforge/modeling/target/eagle3_target_model.py", line 475, in get_eagle3_target_model
[rank0]:     return HFEagle3TargetModel.from_pretrained(
[rank0]:   File "/mnt/data/SpecForge/specforge/modeling/target/eagle3_target_model.py", line 163, in from_pretrained
[rank0]:     target_model = AutoModelForCausalLM.from_pretrained(
[rank0]:   File "/opt/conda/envs/venv_sglang/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py", line 607, in from_pretrained
[rank0]:     raise ValueError(
[rank0]: ValueError: Unrecognized configuration class <class 'transformers.models.qwen2_5_vl.configuration_qwen2_5_vl.Qwen2_5_VLConfig'> for this kind of AutoModel: AutoModelForCausalLM.
[rank0]: Model type should be one of ApertusConfig, ArceeConfig, AriaTextConfig, BambaConfig, BartConfig, BertConfig, BertGenerationConfig, BigBirdConfig, BigBirdPegasusConfig, BioGptConfig, BitNetConfig, BlenderbotConfig, BlenderbotSmallConfig, BloomConfig, BltConfig, CamembertConfig, LlamaConfig, CodeGenConfig, CohereConfig, Cohere2Config, CpmAntConfig, CTRLConfig, Data2VecTextConfig, DbrxConfig, DeepseekV2Config, DeepseekV3Config, DiffLlamaConfig, DogeConfig, Dots1Config, ElectraConfig, Emu3Config, ErnieConfig, Ernie4_5Config, Ernie4_5_MoeConfig, Exaone4Config, FalconConfig, FalconH1Config, FalconMambaConfig, FlexOlmoConfig, FuyuConfig, GemmaConfig, Gemma2Config, Gemma3Config, Gemma3TextConfig, Gemma3nConfig, Gemma3nTextConfig, GitConfig, GlmConfig, Glm4Config, Glm4MoeConfig, GotOcr2Config, GPT2Config, GPT2Config, GPTBigCodeConfig, GPTNeoConfig, GPTNeoXConfig, GPTNeoXJapaneseConfig, GptOssConfig, GPTJConfig, GraniteConfig, GraniteMoeConfig, GraniteMoeHybridConfig, GraniteMoeSharedConfig, HeliumConfig, HunYuanDenseV1Config, HunYuanMoEV1Config, JambaConfig, JetMoeConfig, Lfm2Config, LlamaConfig, Llama4Config, Llama4TextConfig, LongcatFlashConfig, MambaConfig, Mamba2Config, MarianConfig, MBartConfig, MegaConfig, MegatronBertConfig, MiniMaxConfig, MinistralConfig, MistralConfig, MixtralConfig, MllamaConfig, ModernBertDecoderConfig, MoshiConfig, MptConfig, MusicgenConfig, MusicgenMelodyConfig, MvpConfig, NemotronConfig, OlmoConfig, Olmo2Config, Olmo3Config, OlmoeConfig, OpenLlamaConfig, OpenAIGPTConfig, OPTConfig, PegasusConfig, PersimmonConfig, PhiConfig, Phi3Config, Phi4MultimodalConfig, PhimoeConfig, PLBartConfig, ProphetNetConfig, QDQBertConfig, Qwen2Config, Qwen2MoeConfig, Qwen3Config, Qwen3MoeConfig, Qwen3NextConfig, RecurrentGemmaConfig, ReformerConfig, RemBertConfig, RobertaConfig, RobertaPreLayerNormConfig, RoCBertConfig, RoFormerConfig, RwkvConfig, SeedOssConfig, SmolLM3Config, Speech2Text2Config, StableLmConfig, Starcoder2Config, TransfoXLConfig, TrOCRConfig, VaultGemmaConfig, WhisperConfig, XGLMConfig, XLMConfig, XLMProphetNetConfig, XLMRobertaConfig, XLMRobertaXLConfig, XLNetConfig, xLSTMConfig, XmodConfig, ZambaConfig, Zamba2Config.
```

and my transformers==4.57.1.
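That ValueError is what a plain `AutoModelForCausalLM` call will always produce here: `Qwen2_5_VLConfig` is a vision-language config and is not registered in the causal-LM mapping, which is exactly why the branch that was removed loaded the target through `Qwen2_5_VLForConditionalGeneration`. A minimal sketch of that direct HF load for comparison (this mirrors the removed branch and is not a SpecForge API):

```python
import torch
from transformers import Qwen2_5_VLForConditionalGeneration

# Qwen2_5_VLConfig is not in the AutoModelForCausalLM mapping, so the
# dedicated vision-language class has to be used for the target model.
target_model = (
    Qwen2_5_VLForConditionalGeneration.from_pretrained(
        "Qwen/Qwen2.5-VL-7B-Instruct",
        torch_dtype=torch.bfloat16,  # the log above notes `dtype` is the newer name
    )
    .eval()
    .cuda()
)
```

So for the hf backend to work with this model, the HF target-model wrapper would need to go through a VLM-capable loader rather than `AutoModelForCausalLM`.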
When I use the sglang backend instead of hf (only 1 GPU), training enters Epoch 0 and then fails with this error:
```
Training Epoch 0: 0%| | 0/95000 [00:00<?, ?it/s]
[rank0]: Traceback (most recent call last):
[rank0]:   File "/mnt/data/wangrenjun/SpecForge/scripts/train_eagle3.py", line 788, in
```