Shipeng Wang
Hi, in your paper you said the "next-utterance classification" and "language modeling" tasks were trained in a multi-task learning setting, and in train.py there is a function load...
I am currently running experiments with the DPO and KTO Trainers on a private dataset. I am considering using gradient checkpointing to reduce memory usage during backpropagation, but I am unsure...
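A minimal config sketch, assuming the TRL trainers are used: gradient checkpointing can be switched on through the training arguments (`DPOConfig` subclasses `transformers.TrainingArguments`, which exposes these fields); `use_reentrant=False` is the variant commonly recommended when combining checkpointing with adapters. This is illustrative, not a complete training script.

```python
from trl import DPOConfig

# Trade compute for memory: recompute activations during the backward pass
# instead of storing them. The non-reentrant implementation tends to play
# better with PEFT/LoRA setups.
training_args = DPOConfig(
    output_dir="dpo-output",
    gradient_checkpointing=True,
    gradient_checkpointing_kwargs={"use_reentrant": False},
)
```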
While training qwen1.5-14b-chat with transformers==4.38.2, I hit the following error: RuntimeError( "Unsloth: Tokenizer's pad_token cannot be = eos_token, and we couldn't find a\n"\ "replacement of either
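A minimal sketch of what the check above appears to complain about, using a stand-in class rather than a real tokenizer (assumption: the `TokSpec` class and `pad_token_ok` helper are hypothetical illustrations, not Unsloth code; the real fix uses the transformers tokenizer API shown in the comments).

```python
class TokSpec:
    """Hypothetical stand-in for the two special-token fields the check inspects."""
    def __init__(self, pad_token, eos_token):
        self.pad_token = pad_token
        self.eos_token = eos_token

def pad_token_ok(tok):
    # The error fires when pad_token is missing or aliases eos_token:
    # if padding and end-of-sequence share one id, loss masking on padding
    # would also mask genuine end-of-sequence positions.
    return tok.pad_token is not None and tok.pad_token != tok.eos_token

# pad aliased to eos -> the RuntimeError case
assert not pad_token_ok(TokSpec("<|endoftext|>", "<|endoftext|>"))

# With a real tokenizer, registering a distinct pad token resolves it, e.g.:
#   tokenizer.add_special_tokens({"pad_token": "<pad>"})
#   model.resize_token_embeddings(len(tokenizer))
assert pad_token_ok(TokSpec("<pad>", "<|endoftext|>"))
```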