LMFlow what's the difference between text_only and text2text data type?

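Hello, I have a private large model of my own. During pretraining, the separator between input and output was "[SEP]", and the end-of-output token was "<eod>". I'd now like to fine-tune it with LMFlow. I noticed that the data format can only be text_only or text2text. What is the practical difference between text_only and text2text?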

How should I construct my dataset so that I can use your fine-tuning and inference scripts?

For example, my data looks like this:

Q: Can you help me write a piece of Python code that checks whether a number is even? A: is_even = lambda x: x % 2 == 0

Sep 13 '23 08:09 cauwulixuan

Thanks for your interest. text_only is meant for unlabeled data, while text2text is meant for labeled question-answer pairs.
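To make the distinction concrete, here is a minimal sketch that builds both dataset types from the same Q&A pair. It assumes the JSON layout from the LMFlow dataset tutorial linked in this thread: a single "text" key per instance for text_only (my understanding of that layout; check it against the tutorial) and "input"/"output" keys for text2text. The helper names are hypothetical. Because a text_only sample is one string, custom delimiters such as the asker's "[SEP]" and "<eod>" would have to be inserted by hand; whether LMFlow adds its own separators for text2text is not covered in this thread.

```python
import json

# Delimiters from the asker's own pretraining setup (assumed; not LMFlow defaults).
SEP = "[SEP]"
EOD = "<eod>"


def to_text_only(question: str, answer: str) -> dict:
    """Hypothetical helper: pack a Q&A pair into one unlabeled text string."""
    return {"text": f"{question}{SEP}{answer}{EOD}"}


def to_text2text(question: str, answer: str) -> dict:
    """Hypothetical helper: keep the prompt and the target in separate fields."""
    return {"input": question, "output": answer}


question = "Can you help me write a piece of Python code that checks whether a number is even?"
answer = "is_even = lambda x: x % 2 == 0"

text_only_dataset = {"type": "text_only", "instances": [to_text_only(question, answer)]}
text2text_dataset = {"type": "text2text", "instances": [to_text2text(question, answer)]}

# ensure_ascii=False keeps any non-ASCII (e.g. Chinese) text readable in the output.
print(json.dumps(text_only_dataset, ensure_ascii=False, indent=2))
print(json.dumps(text2text_dataset, ensure_ascii=False, indent=2))
```

With text_only the model just sees one continuous string; with text2text the prompt and the expected answer stay distinguishable, which is what makes it the fit for labeled pairs.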

Your example can be converted into:

{
  "type": "text2text",
  "instances": [
    {
      "input": "Can you help me write a piece of Python code that checks whether a number is even?",
      "output": "is_even = lambda x: x % 2 == 0"
    }
  ]
}

For details, please refer to the tutorial: https://optimalscale.github.io/LMFlow/examples/DATASETS.html
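For a whole file of raw samples, the conversion can be scripted. This is a minimal sketch assuming a hypothetical raw format of one sample per line in the "Q: ... A: ..." shape shown in the question; since the original data used the full-width Chinese markers 问：/答：, those are accepted too. Blank or malformed lines are skipped. Serializing with the standard json module guarantees strict JSON, so stray trailing commas cannot creep in.

```python
import json
import re

# Hypothetical raw format: one sample per line, "Q: <question> A: <answer>".
# The full-width Chinese markers 问：/答： are accepted as well.
PAIR_RE = re.compile(r"^\s*(?:Q:|问：)\s*(.*?)\s*(?:A:|答：)\s*(.*?)\s*$")


def lines_to_text2text(lines):
    """Convert raw Q/A lines into an LMFlow-style text2text dataset dict."""
    instances = []
    for line in lines:
        match = PAIR_RE.match(line)
        if match is None:
            continue  # skip blank or malformed lines
        question, answer = match.groups()
        instances.append({"input": question, "output": answer})
    return {"type": "text2text", "instances": instances}


raw = [
    "Q: Can you help me write a piece of Python code that checks whether a number is even? "
    "A: is_even = lambda x: x % 2 == 0",
    "",
]
dataset = lines_to_text2text(raw)

# json.dumps always emits strict JSON (no trailing commas).
serialized = json.dumps(dataset, ensure_ascii=False, indent=2)
print(serialized)
```

To produce a dataset file, write `serialized` to a .json file opened with `encoding="utf-8"` (or use `json.dump` directly); the file name and directory layout LMFlow expects are described in the tutorial above.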

Sep 19 '23 21:09 shizhediao

@shizhediao For text data, are there any plans to support more dataset fields (such as query and history) and more file formats?

Sep 22 '23 07:09 shmily326

Thanks. Is it still possible to join the official WeChat group? The QR code link in the README seems to be broken.

Sep 26 '23 00:09 cauwulixuan

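Hi, additional fields can be concatenated and passed in through the input field, as is already done for the multi-turn conversations that are currently supported. Thanks.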
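A minimal sketch of that concatenation approach, assuming each raw record has a hypothetical `history` list of [query, response] turns plus the current `query` and `response`. The "User:"/"Assistant:" prefixes and the newline separator are an arbitrary illustrative template, not an LMFlow convention; a model pretrained with its own delimiters (such as "[SEP]") would likely do better with those.

```python
import json


def flatten_record(record: dict, sep: str = "\n") -> dict:
    """Merge history and query into one text2text instance (hypothetical schema)."""
    turns = []
    # Earlier turns become plain context inside the input string.
    for past_query, past_response in record.get("history", []):
        turns.append(f"User: {past_query}")
        turns.append(f"Assistant: {past_response}")
    # The current query ends the prompt; its response becomes the target.
    turns.append(f"User: {record['query']}")
    turns.append("Assistant:")
    return {"input": sep.join(turns), "output": record["response"]}


record = {
    "history": [["What is 2 + 3?", "5"]],
    "query": "Is that number even?",
    "response": "No, 5 is odd.",
}

dataset = {"type": "text2text", "instances": [flatten_record(record)]}
print(json.dumps(dataset, indent=2))
```

Because everything extra is folded into `input`, the result still fits the existing two-field text2text layout without any new dataset fields.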

Sep 30 '23 19:09 shizhediao

Updated, thanks.

Sep 30 '23 19:09 shizhediao

The WeChat QR code still shows "image not found". Could you post it in this issue?

Oct 09 '23 03:10 cauwulixuan

Hi, you may try the updated QR code (community group WeChat link) to see if it works, thanks!

Oct 09 '23 07:10 research4pan

This one works, I've joined. Thanks~

Oct 10 '23 02:10 cauwulixuan