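Chinese-LLaMA-Alpaca: During the pre-training stage, how are the eos and bos tokens added to the training text?

Who exactly does this? Are they added during text preprocessing (bos at the start and eos at the end of each article), or does the code add them automatically? I have never quite understood this part. Thanks!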

May 17 '23 06:05 ruanshudong

My understanding is that bos/eos are added at the head and tail of each article during preprocessing, but what is [PAD] for? It seems to go unused?

May 17 '23 08:05 ruanshudong

The tokenizer handles this automatically. The pre-training data does not need any special processing.
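
To illustrate what "handled by the tokenizer" can look like, here is a minimal sketch. It does not use the project's real tokenizer: `ToyTokenizer`, its whitespace splitting, and the ids 1 and 2 for bos/eos are all hypothetical stand-ins. The flag names `add_bos_token` and `add_eos_token` mirror options of the same name on Hugging Face's `LlamaTokenizer`; whether eos is enabled in this project's pre-training script is an assumption here, so check the actual tokenizer config.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTokenizer:
    """Hypothetical whitespace tokenizer, only for illustration."""
    add_bos_token: bool = True    # prepend bos to every encoded text
    add_eos_token: bool = False   # append eos to every encoded text
    bos_token_id: int = 1         # assumed id, chosen for this sketch
    eos_token_id: int = 2         # assumed id, chosen for this sketch
    vocab: dict = field(default_factory=dict)

    def encode(self, text):
        ids = []
        for word in text.split():
            # New words get the next free id; ids 0-2 are kept for special tokens.
            ids.append(self.vocab.setdefault(word, len(self.vocab) + 3))
        # The special tokens are attached here, not in the raw text.
        if self.add_bos_token:
            ids = [self.bos_token_id] + ids
        if self.add_eos_token:
            ids = ids + [self.eos_token_id]
        return ids


# Raw pre-training documents contain no bos/eos markers at all.
raw_documents = ["first article text", "second article"]

tokenizer = ToyTokenizer(add_bos_token=True, add_eos_token=True)
encoded = [tokenizer.encode(doc) for doc in raw_documents]
print(encoded)
```

The point of the sketch is that the corpus files stay plain text: the special token ids only appear in the encoded output, so writing bos/eos into the articles by hand would be redundant.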

May 17 '23 11:05 airaria

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your consideration.

May 24 '23 22:05 github-actions[bot]

Closing the issue, since no updates observed. Feel free to re-open if you need any further assistance.

May 28 '23 22:05 github-actions[bot]