paulcx
> Hard to really tell without specific dataset info, training procedure, and the model parameter count BUT:
>
> I can't speak for your other attempts but this...
> > That's right. I'm trying finetuning. I know pretraining and LoRA finetuning work as expected; I just wonder if anyone has the same issue. Does that mean one epoch is...
not yet
> python3 -m torch.distributed.launch --nproc_per_node 4 \
>   --nnodes=1 --node_rank=0 --master_addr=xxx --master_port=yyy finetune.py \
>   --dataset_path data/alpaca --lora_rank 8 --per_device_train_batch_size 2 \
>   --gradient_accumulation_steps 1 --max_steps 10000 --save_steps 1000 \
>   --save_total_limit 2 --learning_rate 2e-5...
> No. I used torch's DDP; adding two or three lines of code takes care of it.
>
> You can give it a try. If that doesn't work, I should have a PR up in the next day or two.

How do I try this with DeepSpeed?
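For reference, a minimal sketch of what those "two or three lines" of torch DDP setup could look like; the env-variable handling and the stand-in model are assumptions, not the actual finetune.py changes:

```
# Minimal DDP sketch, assuming launch via torch.distributed.launch / torchrun.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group(backend="nccl")            # one process per GPU
local_rank = int(os.environ.get("LOCAL_RANK", 0))  # newer launchers export LOCAL_RANK; older ones pass --local_rank
torch.cuda.set_device(local_rank)

model = torch.nn.Linear(16, 16).cuda(local_rank)   # stand-in for the real model being finetuned
model = DDP(model, device_ids=[local_rank])        # gradients are averaged across ranks on backward()
```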
I guess this is something I can quickly fix by adding a specific mlp2x_gelu_Norm projector class (not a good solution at all):

```
class YiLlavaMultiModalProjector(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.projector = nn.Sequential(
            nn.Linear(config.mm_hidden_size,...
```
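For context, a hedged guess at how the truncated class above might be completed, following the common mlp2x_gelu pattern with a trailing LayerNorm; the config field names (mm_hidden_size, hidden_size) are assumptions based on typical LLaVA configs, not verified against Yi's code:

```
import torch.nn as nn

# Hypothetical completion of the projector above: a 2-layer MLP with GELU
# and a final LayerNorm (the "Norm" in mlp2x_gelu_Norm). Not the actual Yi code.
class YiLlavaMultiModalProjector(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.projector = nn.Sequential(
            nn.Linear(config.mm_hidden_size, config.hidden_size),
            nn.GELU(),
            nn.Linear(config.hidden_size, config.hidden_size),
            nn.LayerNorm(config.hidden_size),
        )

    def forward(self, image_features):
        # Maps vision-encoder features into the language model's embedding space.
        return self.projector(image_features)
```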
That's right. Yi uses

```
IMAGE_TOKEN_INDEX = -200
DEFAULT_IMAGE_TOKEN = "<image>"
```
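As a rough illustration of how those constants are typically used (a sketch of the usual LLaVA-style convention, not Yi's exact implementation; tokenize_with_image is a made-up helper name):

```
# Occurrences of the placeholder string in the prompt are replaced by the
# sentinel id -200, which the model later swaps for projected image features.
IMAGE_TOKEN_INDEX = -200
DEFAULT_IMAGE_TOKEN = "<image>"

def tokenize_with_image(prompt, tokenizer):
    input_ids = []
    for i, chunk in enumerate(prompt.split(DEFAULT_IMAGE_TOKEN)):
        if i > 0:
            input_ids.append(IMAGE_TOKEN_INDEX)  # marks where image embeddings get spliced in
        input_ids.extend(tokenizer(chunk, add_special_tokens=False)["input_ids"])
    return input_ids
```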
I'm seeing the same issue; is there any fix? @Narsil
This issue isn't limited to Qwen1.5; Qwen2.5 also hits OOM. For example, with exactly the same parameter configuration, Yi-1.5-34B can be fully finetuned with all parameters, but Qwen2.5-32B goes OOM even at ordinary sequence lengths.