ChatGLM-6B [Feature] <title> 我最近也在拿官方代码finetune chatglm6b, 代码里面有history. 有多轮对话能力, 现在数据和你的数据都是单轮对话. 我打算生成一些chatglm训练集格式的多轮对话数据. 这样效果更好, 代码可以看下面. 确实有history部分.

[Feature] <title> 我最近也在拿官方代码finetune chatglm6b, 代码里面有history. 有多轮对话能力, 现在数据和你的数据都是单轮对话. 我打算生成一些chatglm训练集格式的多轮对话数据. 这样效果更好, 代码可以看下面. 确实有history部分.

Open zhangbo2008 opened this issue 2 years ago • 6 comments

Is your feature request related to a problem? Please describe.

def preprocess_function_eval(examples):
    inputs, targets = [], []
    for i in range(len(examples[prompt_column])):
        if examples[prompt_column][i] and examples[response_column][i]:
            query = examples[prompt_column][i]
            if history_column is None or len(examples[history_column][i]) == 0:
                prompt = query
            else:
                prompt = ""
                history = examples[history_column][i]
                for turn_idx, (old_query, response) in enumerate(history):
                    prompt += "[Round {}]\n问：{}\n答：{}\n".format(turn_idx, old_query, response)
                prompt += "[Round {}]\n问：{}\n答：".format(len(history), query)
            inputs.append(prompt)
            targets.append(examples[response_column][i])

Solutions

def preprocess_function_eval(examples):
    inputs, targets = [], []
    for i in range(len(examples[prompt_column])):
        if examples[prompt_column][i] and examples[response_column][i]:
            query = examples[prompt_column][i]
            if history_column is None or len(examples[history_column][i]) == 0:
                prompt = query
            else:
                prompt = ""
                history = examples[history_column][i]
                for turn_idx, (old_query, response) in enumerate(history):
                    prompt += "[Round {}]\n问：{}\n答：{}\n".format(turn_idx, old_query, response)
                prompt += "[Round {}]\n问：{}\n答：".format(len(history), query)
            inputs.append(prompt)
            targets.append(examples[response_column][i])

Additional context

No response

May 19 '23 09:05 zhangbo2008

mark

May 19 '23 09:05 MurrayC7

不明所以

May 19 '23 09:05 bsmarcebte733

有人试过觉得咋样

May 21 '23 09:05 zhangatao

mark

May 25 '23 15:05 ztfmars

有没有考虑过对话是奇数的？就是一问一答的形式，聊着聊着，后面不理你了，变成了奇数

Jun 01 '23 08:06 qxde01

没试过，但是确实有这种数据。我找的京东客服对话数据多伦的，经常最后我说一句客户就不回答了，挺难刷格式的因为数据不区分说话人。。最后我训的时候把奇数的删了。

| | @.*** | | @.*** |

---- 回复的原邮件 ---- | 发件人 | @.> | | 日期 | 2023年06月01日 16:06 | | 收件人 | @.> | | 抄送至 | @.>@.> | | 主题 | Re: [THUDM/ChatGLM-6B] [Feature]

我最近也在拿官方代码finetune chatglm6b, 代码里面有history. 有多轮对话能力, 现在数据和你的数据都是单轮对话. 我打算生成一些chatglm训练集格式的多轮对话数据. 这样效果更好, 代码可以看下面. 确实有history部分. (Issue #1061) | 有没有考虑过对话是奇数的？就是一问一答的形式，聊着聊着，后面不理你了，变成了奇数 — Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you authored the thread.Message ID: @.***>

Jun 01 '23 12:06 zhangbo2008

ChatGLM-6B ChatGLM-6B copied to clipboard

[Feature] <title> 我最近也在拿官方代码finetune chatglm6b, 代码里面有history. 有多轮对话能力, 现在数据和你的数据都是单轮对话. 我打算生成一些chatglm训练集格式的多轮对话数据. 这样效果更好, 代码可以看下面. 确实有history部分.

Is your feature request related to a problem? Please describe.

Solutions

Additional context

ChatGLM-6B
ChatGLM-6B copied to clipboard