beep-bebop
Results: 3 issues of beep-bebop
The lower a model's perplexity, the better the model predicts the real data.
Is using SFT data directly as calibration data the best option? Does performance fluctuate when I have more (e.g. 280k) or fewer (e.g. 1k) fine-tuning samples? Also, would...
# What does this PR do? #31629 added `DataCollatorWithFlattening`, which packs the examples of a small batch into one long sequence, uses `-100` to separate the samples, and returns `position...
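
The perplexity remark in the first issue can be made concrete with a small sketch. It assumes the usual definition, perplexity as the exponential of the mean negative log-likelihood per token, and takes per-token natural-log probabilities as input. The function name `perplexity` and the sample probabilities are illustrative only and do not come from the issue.

```python
import math


def perplexity(token_log_probs):
    """Return exp(mean negative log-likelihood) over the given tokens.

    token_log_probs: natural-log probabilities the model assigned to each
    ground-truth token (hypothetical inputs, for illustration only).
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# A model that assigns high probability to the true tokens ...
confident = [math.log(p) for p in (0.9, 0.8, 0.95)]
# ... versus one that assigns low probability to them.
uncertain = [math.log(p) for p in (0.2, 0.1, 0.3)]

print(perplexity(confident))  # lower value: better prediction of the data
print(perplexity(uncertain))  # higher value: worse prediction of the data
```

A useful sanity check is that a model assigning probability 0.5 to every true token has perplexity 2, which matches the reading of perplexity as an effective branching factor.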