hadi comments

Results: 12 comments of hadi

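Multi-process training gives loss 0 and eval loss nan

@airaria @ymcui Could you help look into what might be causing the previous issue? With 7b, after changing padding_side to right for fine-tuning, loss and eval loss are normal, but with 13b, loss and eval loss are all 0.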

> Hi, you may test out our strategies such as GeminiDDP as illustrated in the [examples](https://github.com/hpcaitech/ColossalAI/blob/d0fbd4b86fcfa653db5c5b7d312f249ce6dad619/applications/Chat/examples/train_sft.py#L34). This is the training method we used. The memory usage is too large, unusable....

eval loss is nan when fine-tuning with LoRA

@airaria The run parameters are as follows: nohup python finetune.py --base_model '/data/llama-test/merge_chinese_lora_alpaca_plus_7b' --data_path './data/concat_datasets.json' --output_dir "./plus-7b-output/alpaca-plus-7b-test-001" --batch_size 96 --micro_batch_size 32 --num_epochs 3 --learning_rate 3e-4 --cutoff_len 512 --val_set_size 5000 --lora_r 8 --lora_alpha 16 --lora_dropout 0.1 --lora_target_modules...
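For reference, the `--batch_size 96 --micro_batch_size 32` pair above implies gradient accumulation. The sketch below shows the arithmetic as I understand the alpaca-lora `finetune.py` convention (accumulation steps = batch_size // micro_batch_size, further divided by the number of processes under DDP). The function name and the `world_size` argument are my own, for illustration only.

```python
def grad_accum_steps(batch_size: int, micro_batch_size: int, world_size: int = 1) -> int:
    """Number of micro-batches accumulated before each optimizer step.

    Assumption: the effective batch is split first into micro-batches,
    then across `world_size` data-parallel processes.
    """
    steps = batch_size // micro_batch_size
    if world_size > 1:
        steps = steps // world_size
    # Never go below one micro-batch per optimizer step.
    return max(steps, 1)


# Values taken from the command line in the comment above.
print(grad_accum_steps(96, 32))                # single process
print(grad_accum_steps(96, 32, world_size=3))  # hypothetical 3-process run
```

If the multi-process run uses a different convention, the effective batch size per optimizer step changes, which is worth checking when comparing single-process and multi-process losses.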

eval loss is nan when fine-tuning with LoRA

> Change tokenizer.padding_side to 'right'; it may be related to this. @airaria Thanks for the quick reply. We'll make the change and try again.
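To make the `padding_side` suggestion concrete, here is a minimal pure-Python sketch of what left versus right padding does to a batch. It does not explain why left padding produced nan in this case; it only shows what the setting changes. `pad_batch` and its arguments are hypothetical names, not part of any library; the one borrowed fact is that -100 is the default `ignore_index` of PyTorch's cross-entropy loss.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def pad_batch(sequences, pad_id, side="right"):
    """Pad token-id lists to equal length and build the mask and labels."""
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    max_len = max(len(seq) for seq in sequences)
    batch = {"input_ids": [], "attention_mask": [], "labels": []}
    for seq in sequences:
        n_pad = max_len - len(seq)
        ids_pad = [pad_id] * n_pad
        mask_pad = [0] * n_pad             # padded positions are not attended to
        label_pad = [IGNORE_INDEX] * n_pad  # padded positions are excluded from the loss
        real_mask = [1] * len(seq)
        if side == "right":
            batch["input_ids"].append(list(seq) + ids_pad)
            batch["attention_mask"].append(real_mask + mask_pad)
            batch["labels"].append(list(seq) + label_pad)
        else:
            batch["input_ids"].append(ids_pad + list(seq))
            batch["attention_mask"].append(mask_pad + real_mask)
            batch["labels"].append(label_pad + list(seq))
    return batch


right = pad_batch([[5, 6, 7], [8]], pad_id=0, side="right")
left = pad_batch([[5, 6, 7], [8]], pad_id=0, side="left")
print(right["input_ids"])
print(left["input_ids"])
```

With the real tokenizer, the equivalent switch is the single attribute assignment `tokenizer.padding_side = "right"` mentioned in the quoted reply.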

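eval loss is nan when fine-tuning with LoRA

> > > > Hello, are you also fine-tuning alpaca-plus with the alpaca-lora fine-tuning code? How did you merge after training finished? Yes, but training hasn't finished yet. For merging, you can probably refer to this and merge all three together: python scripts/merge_llama_with_chinese_lora.py \ --base_model path_to_original_llama_hf_dir \ --lora_model path_to_chinese_llama_plus_lora,path_to_chinese_alpaca_plus_lora \ --output_type [pth|huggingface] --output_dir path_to_output_dir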

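eval loss is nan when fine-tuning with LoRA

> > Change tokenizer.padding_side to 'right'; it may be related to this. > > @airaria Thanks for the quick reply. We'll make the change and try again. After changing it to right, the validation set now has a loss, but the loss is decreasing rather slowly. What is a reasonable final loss value? ![image](https://user-images.githubusercontent.com/19610534/236965780-5699c3de-3a29-47ba-8d6b-6d562086f84c.png) I also have another question: alpaca-lora uses int8, and changing load_in_8bit=True to False raises the following error: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! model =...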

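eval loss is nan when fine-tuning with LoRA

> This loss looks normal to me; at least the overall pipeline is working. Data from different domains behaves differently, and anything below 2 is normal. If you still think the loss is high, you'll have to rely on your own model-training and hyperparameter-tuning "alchemy". > > After setting `load_in_8bit=False`, the line `model = prepare_model_for_int8_training(model)` should be removed as well, right? I had already removed that line too, and it still reports the same error: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!. Does device_map need to be changed? I'm using auto; should that be changed to cpu or GPU? @airaria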
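On the device_map question, one pattern I've seen (alpaca-lora does something similar under DDP) is to pin the whole model to a single GPU with a `{"": index}` mapping instead of `"auto"`, so that no layer can be placed on the CPU. Whether that resolves the error above is not confirmed in this thread, and it assumes the model fits on one card without int8. The helper name below is hypothetical.

```python
import os


def single_gpu_device_map(env=None):
    """Return a device_map that places every module on one GPU.

    The empty-string key stands for the root module, so the whole model
    goes to the GPU given by LOCAL_RANK (0 when the variable is unset).
    """
    env = os.environ if env is None else env
    local_rank = int(env.get("LOCAL_RANK") or 0)
    return {"": local_rank}


print(single_gpu_device_map({}))                   # plain single-process run
print(single_gpu_device_map({"LOCAL_RANK": "2"}))  # third process of a DDP launch
```

The returned value would be passed as the `device_map=` argument of `from_pretrained` in place of `"auto"`.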

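eval loss is nan when fine-tuning with LoRA

> > > This loss looks normal to me; at least the overall pipeline is working. Data from different domains behaves differently, and anything below 2 is normal. If you still think the loss is high, you'll have to rely on your own model-training and hyperparameter-tuning "alchemy". > > > After setting `load_in_8bit=False`, the line `model = prepare_model_for_int8_training(model)` should be removed as well, right? > > > > > > I had already removed that line too, and it still reports the same error: Expected all tensors to be on the same device, but found at least...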

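Will max length be increased in future training?

> The original llama has a maximum length of 2048. We may consider extending to that length later, but compute also has to be taken into account, so we can't guarantee it will be released. Understood. Thanks for the answer. @ymcui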

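How to set batch_size according to different GPU memory sizes

> Pipeline parallelism @irasin Thanks for the reply. That is how we handle it now: we run one process with the model split across three cards. But how should batch_size be set so that different cards run different batch sizes?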
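Not an answer to the per-card batch_size question, but a related knob. In a single-process setup where the model's layers are spread over several cards, each batch flows through every card in turn, so there is one batch size for the whole model; uneven GPU memory is usually handled by giving each card a different share of the layers. A rough sketch, assuming the model is loaded with `device_map="auto"` and a `max_memory` dict passed to `from_pretrained`: the helper name, the 2 GiB headroom, and the example card sizes are made up for illustration.

```python
def build_max_memory(gpu_mem_gib, headroom_gib=2):
    """Map GPU index -> memory budget string such as '22GiB'.

    headroom_gib is reserved on each card for activations and optimizer
    state; the right value depends on batch_size and cutoff_len.
    """
    return {
        index: f"{max(mem - headroom_gib, 1)}GiB"
        for index, mem in enumerate(gpu_mem_gib)
    }


# Hypothetical machine with a 24 GiB, a 16 GiB and a 12 GiB card.
print(build_max_memory([24, 16, 12]))
```

A larger headroom leaves more room for activations, which is what actually scales with batch_size.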