ABHIJEET KALE
Solution 1: Adjust Training Hyperparameters

Modify your training configuration:

```yaml
### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-5  # Increased from 1.0e-6
num_train_epochs: 30.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
...
```
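As a quick back-of-the-envelope check of what these numbers imply, the sketch below computes the effective batch size and warmup horizon. It is not part of the config above; `num_gpus` and `dataset_size` are illustrative assumptions you would replace with your own values.

```python
# Sanity-check the effective batch size and warmup horizon implied by the
# hyperparameters above. num_gpus and dataset_size are assumptions.
per_device_train_batch_size = 1
gradient_accumulation_steps = 8
num_gpus = 1                 # assumption: single-GPU run
dataset_size = 2000          # assumption: number of training samples
num_train_epochs = 30
warmup_ratio = 0.1

effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps * num_gpus
steps_per_epoch = dataset_size // effective_batch_size
total_steps = steps_per_epoch * num_train_epochs
warmup_steps = int(total_steps * warmup_ratio)

print(f"effective batch size:  {effective_batch_size}")  # 8
print(f"total optimizer steps: {total_steps}")            # 7500 with these assumptions
print(f"cosine warmup steps:   {warmup_steps}")           # 750 with these assumptions
```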
Solution 2: Use a Different DeepSpeed Configuration

Create a custom DeepSpeed config instead of using ds_z0_config.json:

custom_ds_config.json:

```json
{
  "train_batch_size": 8,
  "train_micro_batch_size_per_gpu": 1,
  "gradient_accumulation_steps": 8,
  "gradient_clipping": 1.0,
  "zero_optimization": {
    "stage": ...
```
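DeepSpeed rejects configs where the global batch size is inconsistent with the per-GPU micro batch, accumulation steps, and world size, so it can help to verify that relationship before launching. The check below is a minimal, hypothetical sketch, not part of the answer above; the `world_size` value is an assumption.

```python
# Verify the DeepSpeed invariant:
#   train_batch_size == train_micro_batch_size_per_gpu * gradient_accumulation_steps * world_size
import json

world_size = 1  # assumption: number of GPUs you launch with
with open("custom_ds_config.json") as f:
    cfg = json.load(f)

expected = cfg["train_micro_batch_size_per_gpu"] * cfg["gradient_accumulation_steps"] * world_size
assert cfg["train_batch_size"] == expected, (
    f"train_batch_size {cfg['train_batch_size']} != {expected}; DeepSpeed will refuse this config"
)
print("DeepSpeed batch-size settings are consistent")
```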
Solution 3: Alternative DeepSpeed Config (Stage 1)

If Stage 0 doesn't work, try Stage 1 with more conservative settings:

ds_stage1_config.json:

```json
{
  "train_batch_size": 8,
  "train_micro_batch_size_per_gpu": 1,
  "gradient_accumulation_steps": 8,
  "gradient_clipping": 1.0,
  ...
```
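Since the file above is truncated, here is a generic, hedged illustration of what a conservative ZeRO Stage 1 block often looks like when written out from Python. The bucket sizes and other values are assumptions for illustration, not the contents of the original file.

```python
# Illustrative only: write out a conservative ZeRO Stage 1 config.
# Field values below are assumptions, not the truncated file from this answer.
import json

zero_stage1 = {
    "train_batch_size": 8,
    "train_micro_batch_size_per_gpu": 1,
    "gradient_accumulation_steps": 8,
    "gradient_clipping": 1.0,
    "zero_optimization": {
        "stage": 1,                    # partition optimizer states only
        "reduce_bucket_size": 5e8,     # assumption: smaller buckets for stability
        "allgather_bucket_size": 5e8,  # assumption
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
    "bf16": {"enabled": True},
}

with open("ds_stage1_config.json", "w") as f:
    json.dump(zero_stage1, f, indent=2)
```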
Solution 4: Modify Training Approach

Update your training configuration with these stability improvements:

```yaml
### model
model_name_or_path: ./model/hf/Qwen3-VL-4B-Instruct
image_max_pixels: 512000  # Reduced to prevent memory issues

### method
stage: sft
...
```
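To see roughly what the `image_max_pixels` cap does, the sketch below downscales an image so its pixel count stays under the limit while preserving aspect ratio. This illustrates the general idea of a max-pixel cap; it is an assumption about the behavior, not LLaMA-Factory's actual preprocessing code.

```python
# Rough sketch of a max-pixel cap: downscale so width * height <= image_max_pixels,
# preserving aspect ratio. Not LLaMA-Factory's actual implementation.
import math

def cap_pixels(width: int, height: int, image_max_pixels: int = 512_000) -> tuple[int, int]:
    pixels = width * height
    if pixels <= image_max_pixels:
        return width, height
    scale = math.sqrt(image_max_pixels / pixels)
    return max(1, int(width * scale)), max(1, int(height * scale))

print(cap_pixels(1920, 1080))  # roughly (954, 536): a ~2.07 MP frame shrinks to ~0.51 MP
```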
Solution 6: Progressive Unfreezing

If the above doesn't work, try progressive unfreezing (a rough code sketch follows this list):

- First phase: freeze the vision tower and projector, train only the language model
- Second phase: unfreeze the projector, train both
- Third...
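Here is a rough sketch of what phase-wise freezing can look like in code. The attribute names `visual`, `multi_modal_projector`, and `language_model` are assumptions; the real module names depend on how the Qwen3-VL model is implemented in your framework.

```python
# Hedged sketch of progressive unfreezing; module attribute names are assumptions.
def set_trainable(module, trainable: bool) -> None:
    for param in module.parameters():
        param.requires_grad = trainable

def apply_phase(model, phase: int) -> None:
    if phase == 1:
        # Phase 1: freeze vision tower and projector, train only the language model.
        set_trainable(model.visual, False)
        set_trainable(model.multi_modal_projector, False)
        set_trainable(model.language_model, True)
    elif phase == 2:
        # Phase 2: additionally unfreeze the projector.
        set_trainable(model.multi_modal_projector, True)
    else:
        # Later phases: unfreeze everything.
        set_trainable(model, True)
```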
Solution

You need to ensure proper logging configuration for distributed training. Here are several approaches to fix this:

1. Enable Detailed Logging in LLaMA-Factory

Add these parameters to your training...
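As one concrete (hypothetical) way to avoid losing log output in a multi-process launch, the snippet below configures Python logging so only rank 0 logs at INFO. It relies on the RANK environment variable that torchrun and the DeepSpeed launcher set, and is not LLaMA-Factory's own logging setup.

```python
# Minimal sketch: log at INFO on the main process only, WARNING elsewhere.
import logging
import os

rank = int(os.environ.get("RANK", "0"))  # set by torchrun / deepspeed launchers
logging.basicConfig(
    level=logging.INFO if rank == 0 else logging.WARNING,
    format=f"[rank {rank}] %(asctime)s %(levelname)s %(name)s: %(message)s",
)
logging.getLogger(__name__).info("logging configured on rank %d", rank)
```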
PR Summary

- Type: Dependency update
- Package: nokogiri (Ruby XML/HTML parser)
- Version: 1.13.9 → 1.14.3
- Status: Open, mergeable with no conflicts

Key Details

- Dependabot Score: Compatible (should merge cleanly)
- Files Changed: ...