paulcx
> Hard to really tell without specific dataset info, training procedure, and the model parameter count BUT:
>
> I can't speak for your other attempts but this...
> > That's right. I'm trying finetuning. I know pretraining and LoRA finetuning work as expected; I just wonder if anyone has the same issue. Does that mean one epoch is...
not yet
> python3 -m torch.distributed.launch --nproc_per_node 4 \
>   --nnodes=1 --node_rank=0 --master_addr=xxx --master_port=yyy finetune.py \
>   --dataset_path data/alpaca --lora_rank 8 --per_device_train_batch_size 2 \
>   --gradient_accumulation_steps 1 --max_steps 10000 --save_steps 1000 \
>   --save_total_limit 2 --learning_rate 2e-5...
> No. I used torch's DDP; adding two or three lines of code takes care of it.
>
> You can give it a try. If that doesn't work, I should have a PR up in the next day or two.

How do I try this with DeepSpeed?
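For reference, a minimal sketch of what those "two or three lines" of torch DDP setup could look like; the env-variable handling and the stand-in model are assumptions, not the actual finetune.py changes:

```
# Minimal DDP sketch, assuming launch via torch.distributed.launch / torchrun.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group(backend="nccl")            # one process per GPU
local_rank = int(os.environ.get("LOCAL_RANK", 0))  # newer launchers export LOCAL_RANK; older ones pass --local_rank
torch.cuda.set_device(local_rank)

model = torch.nn.Linear(16, 16).cuda(local_rank)   # stand-in for the real model being finetuned
model = DDP(model, device_ids=[local_rank])        # gradients are averaged across ranks on backward()
```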
I guess this is something I can quickly fix by adding a specific mlp2x_gelu_Norm projector class (not a good solution at all):

```
class YiLlavaMultiModalProjector(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.projector = nn.Sequential(
            nn.Linear(config.mm_hidden_size,...
```
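For context, a hedged guess at how the truncated class above might be completed, following the common mlp2x_gelu pattern with a trailing LayerNorm; the config field names (mm_hidden_size, hidden_size) are assumptions based on typical LLaVA configs, not verified against Yi's code:

```
import torch.nn as nn

# Hypothetical completion of the projector above: a 2-layer MLP with GELU
# and a final LayerNorm (the "Norm" in mlp2x_gelu_Norm). Not the actual Yi code.
class YiLlavaMultiModalProjector(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.projector = nn.Sequential(
            nn.Linear(config.mm_hidden_size, config.hidden_size),
            nn.GELU(),
            nn.Linear(config.hidden_size, config.hidden_size),
            nn.LayerNorm(config.hidden_size),
        )

    def forward(self, image_features):
        # Maps vision-encoder features into the language model's embedding space.
        return self.projector(image_features)
```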
That's right. Yi uses

```
IMAGE_TOKEN_INDEX = -200
DEFAULT_IMAGE_TOKEN = "<image>"
```
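As a rough illustration of how those constants are typically used (a sketch of the usual LLaVA-style convention, not Yi's exact implementation; tokenize_with_image is a made-up helper name):

```
# Occurrences of the placeholder string in the prompt are replaced by the
# sentinel id -200, which the model later swaps for projected image features.
IMAGE_TOKEN_INDEX = -200
DEFAULT_IMAGE_TOKEN = "<image>"

def tokenize_with_image(prompt, tokenizer):
    input_ids = []
    for i, chunk in enumerate(prompt.split(DEFAULT_IMAGE_TOKEN)):
        if i > 0:
            input_ids.append(IMAGE_TOKEN_INDEX)  # marks where image embeddings get spliced in
        input_ids.extend(tokenizer(chunk, add_special_tokens=False)["input_ids"])
    return input_ids
```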
I'm seeing the same issue; is there any fix? @Narsil
This issue isn't limited to Qwen1.5; Qwen2.5 also hits OOM. For example, with exactly the same parameter configuration, Yi-1.5-34B can be fully finetuned with all parameters, but Qwen2.5-32B goes OOM even at ordinary sequence lengths.