beep-bebop

Results 3 issues of beep-bebop

The lower a model's perplexity, the better it predicts real data.
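
As a rough illustration of that claim, here is a minimal sketch that computes perplexity as the exponential of the mean negative log-likelihood over the tokens of a reference text. The function name `perplexity` and the probability lists are made up for illustration; they are not taken from the issue.

```python
import math


def perplexity(token_probs):
    """Return exp(mean negative log-likelihood) for per-token probabilities.

    token_probs: probabilities the model assigned to each *true* token
    (hypothetical values, each in the interval (0, 1]).
    """
    if not token_probs:
        raise ValueError("need at least one token probability")
    mean_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(mean_nll)


# Illustrative numbers only: a model that puts high probability on the
# true tokens versus one that puts low probability on them.
confident = perplexity([0.9, 0.8, 0.95])
uncertain = perplexity([0.2, 0.1, 0.25])
```

With these inputs `confident` comes out lower than `uncertain`, which matches the statement: higher probability on the real tokens means lower perplexity.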

Is using SFT data directly as calibration data the best option? Does performance fluctuate when I have more (e.g. 280k) or fewer (e.g. 1k) fine-tuning samples? Also, would...

# What does this PR do? #31629 added `DataCollatorWithFlattening`, which packs examples in a small batch into a long sequence and uses `-100` to splice the samples and returns `position...
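
The packing idea described in that snippet can be sketched in plain Python. This is not the `DataCollatorWithFlattening` implementation: `flatten_batch` is a hypothetical helper, and both the choice to put `-100` on the first label of each sample and the restarting position ids are assumptions made for illustration.

```python
IGNORE_INDEX = -100  # label value that a typical loss function skips


def flatten_batch(examples):
    """Pack several token-id lists into one long sequence (illustrative only).

    Assumptions, not confirmed by the snippet above:
      * the first label of each sample is set to -100, marking the splice
      * position ids restart at 0 for every sample
    """
    input_ids, labels, position_ids = [], [], []
    for ids in examples:
        if not ids:
            continue  # skip empty samples so all three lists stay aligned
        input_ids.extend(ids)
        labels.extend([IGNORE_INDEX] + list(ids[1:]))
        position_ids.extend(range(len(ids)))
    return {
        "input_ids": input_ids,
        "labels": labels,
        "position_ids": position_ids,
    }


packed = flatten_batch([[5, 6, 7], [8, 9]])
```

Here `packed["labels"]` carries `-100` at the first position of each original sample, so no loss is computed across the boundary between two packed samples.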