BELLE 开源的BELLE/train/main.py只支持指令跟随数据集，不支持类似sharegpt多轮对话，不能复现论文。

开源的BELLE/train/main.py只支持指令跟随数据集，不支持类似sharegpt多轮对话，不能复现论文。

Open baibaiw5 opened this issue 1 year ago • 2 comments

你好，https://github.com/LianjiaTech/BELLE/blob/main/train/reproduce_our_papers/Towards%20Better%20Instruction%20Following%20Language%20Models%20for%20Chinese:%20Investigating%20the%20Impact%20of%20Training%20Data%20and%20Evaluation.md 里面提到可以使用 https://github.com/LianjiaTech/BELLE/blob/main/train/README.md 里面的main.py进行复现。

我看了BELLE\train\utils\data\raw_datasets.py文件中，对数据集的处理方式只有指令跟随。

没有对上述对话的处理方式。想问下多轮对话的数据处理方式是什么？

Apr 24 '23 10:04 baibaiw5

你好，https://github.com/LianjiaTech/BELLE/blob/main/train/reproduce_our_papers/Towards%20Better%20Instruction%20Following%20Language%20Models%20for%20Chinese:%20Investigating%20the%20Impact%20of%20Training%20Data%20and%20Evaluation.md 里面提到可以使用 https://github.com/LianjiaTech/BELLE/blob/main/train/README.md 里面的main.py进行复现。

我看了BELLE\train\utils\data\raw_datasets.py文件中，对数据集的处理方式只有指令跟随。

没有对上述对话的处理方式。想问下多轮对话的数据处理方式是什么？

是的，我们目前还不支持多轮对话的数据处理方式。非常抱歉，最迟明天更新多轮对话的处理逻辑。感谢您的关注。

Apr 24 '23 10:04 xianghuisun

你好，https://github.com/LianjiaTech/BELLE/blob/main/train/reproduce_our_papers/Towards%20Better%20Instruction%20Following%20Language%20Models%20for%20Chinese:%20Investigating%20the%20Impact%20of%20Training%20Data%20and%20Evaluation.md 里面提到可以使用 https://github.com/LianjiaTech/BELLE/blob/main/train/README.md 里面的main.py进行复现。

我看了BELLE\train\utils\data\raw_datasets.py文件中，对数据集的处理方式只有指令跟随。

没有对上述对话的处理方式。想问下多轮对话的数据处理方式是什么？

代码更新后会及时通知您。

Apr 24 '23 10:04 xianghuisun

你好，https://github.com/LianjiaTech/BELLE/blob/main/train/reproduce_our_papers/Towards%20Better%20Instruction%20Following%20Language%20Models%20for%20Chinese:%20Investigating%20the%20Impact%20of%20Training%20Data%20and%20Evaluation.md 里面提到可以使用 https://github.com/LianjiaTech/BELLE/blob/main/train/README.md 里面的main.py进行复现。

我看了BELLE\train\utils\data\raw_datasets.py文件中，对数据集的处理方式只有指令跟随。

没有对上述对话的处理方式。想问下多轮对话的数据处理方式是什么？

当前代码已支持多轮对话数据的训练。你可采用shareGPT数据，详见train/README.md

Apr 25 '23 13:04 xianghuisun

BELLE BELLE copied to clipboard

开源的BELLE/train/main.py只支持指令跟随数据集，不支持类似sharegpt多轮对话，不能复现论文。

BELLE
BELLE copied to clipboard