
ChatGLM3 has been open-sourced. Does the author have plans to support it?

Open micrazy opened this issue 2 years ago • 16 comments

As titled.

micrazy avatar Oct 27 '23 09:10 micrazy

hope so

Pd-ch avatar Oct 27 '23 16:10 Pd-ch

Isn't this already supported?

ColorfulDick avatar Oct 30 '23 06:10 ColorfulDick

+1

ivankxt avatar Oct 31 '23 07:10 ivankxt

How can the system role and multi-role conversations be supported?

LL020202 avatar Nov 06 '23 05:11 LL020202

+1

sunzhe09 avatar Nov 07 '23 07:11 sunzhe09

It is already supported. When using it, set model.direct_query = True, then build the prompt yourself according to ChatGLM3's rules and call the model as usual:

def glm_prompt3(query: str, history=[], system_prompt='', user_token=64795, system_token=64794, assistant_token=64796, observation=64797):
    prompt = f'<FLM_FIX_TOKEN_{system_token}>\n{system_prompt}\n'
    for i, (old_query, response) in enumerate(history):
            prompt += f"<FLM_FIX_TOKEN_{user_token}>\n{old_query}\n"
            prompt += f"<FLM_FIX_TOKEN_{assistant_token}>\n{response}\n"
    prompt += f"<FLM_FIX_TOKEN_{user_token}>\n{query}\n<FLM_FIX_TOKEN_{assistant_token}>"
    return prompt
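
For reference, a minimal usage sketch of calling the function above through the fastllm Python bindings (the .flm path and the example conversation are placeholders; llm.model, direct_query and response are the same calls used later in this thread):

from fastllm_pytools import llm

# Load a converted model (placeholder path) and bypass fastllm's built-in
# prompt template so the hand-built ChatGLM3 prompt is passed through as-is.
model = llm.model("chatglm3-6b-int8.flm")
model.direct_query = True

history = [("你好", "你好,请问有什么可以帮你?")]
prompt = glm_prompt3("介绍一下fastllm", history=history, system_prompt="你是一个乐于助人的助手。")
print(model.response(prompt))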

fushengwuyu avatar Nov 09 '23 13:11 fushengwuyu

The C++ API already supports basic multi-turn chat; for system instructions, it is recommended to assemble the prompt as described above.

TylunasLi avatar Nov 09 '23 16:11 TylunasLi

It is already supported. When using it, set model.direct_query = True, then build the prompt yourself according to ChatGLM3's rules and call the model as usual:

def glm_prompt3(query: str, history=[], system_prompt='', user_token=64795, system_token=64794, assistant_token=64796, observation=64797):
    prompt = f'<FLM_FIX_TOKEN_{system_token}>\n{system_prompt}\n'
    for i, (old_query, response) in enumerate(history):
            prompt += f"<FLM_FIX_TOKEN_{user_token}>\n{old_query}\n"
            prompt += f"<FLM_FIX_TOKEN_{assistant_token}>\n{response}\n"
    prompt += f"<FLM_FIX_TOKEN_{user_token}>\n{query}\n<FLM_FIX_TOKEN_{assistant_token}>"
    return prompt

The output does not line up with the original; the results are inconsistent. I constructed a piece of test data myself: text = "<FLM_FIX_TOKEN_64795>\n1+1=\n<FLM_FIX_TOKEN_64796>\n2\n<FLM_FIX_TOKEN_64795>\n为什么\n<FLM_FIX_TOKEN_64796>". The fastllm output was "\n 1+1=2,因为1和1相加得到2。", while the original chat output was "1+1=2,这是基本的数学原理之一,被称为加法。在十进制数字系统中,数字从0到9,当我们把1和1相加时,得到2。这个结果是由两个个位数字相加得出的,其中1表示一个单位,2表示两个单位。因此,当我们在数字1+1中相加1和1时,得到2。"
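
A sketch of how this comparison could be reproduced end to end (the checkpoint path is a placeholder; I am assuming ChatGLM3's chat() accepts role/content dict history, and llm.from_hf, direct_query and response are the fastllm calls shown elsewhere in this thread):

from transformers import AutoModel, AutoTokenizer
from fastllm_pytools import llm

model_path = "THUDM/chatglm3-6b"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
hf_model = AutoModel.from_pretrained(model_path, trust_remote_code=True).half().cuda().eval()

# Reference answer from the original HF chat interface (history format assumed
# to be ChatGLM3's list of role/content dicts; adjust if your version differs).
history = [{"role": "user", "content": "1+1="}, {"role": "assistant", "content": "2"}]
ref, _ = hf_model.chat(tokenizer, "为什么", history=history)

# Same conversation through fastllm with the hand-built prompt from this thread.
fl_model = llm.from_hf(hf_model, tokenizer, dtype="float16")
fl_model.direct_query = True
text = "<FLM_FIX_TOKEN_64795>\n1+1=\n<FLM_FIX_TOKEN_64796>\n2\n<FLM_FIX_TOKEN_64795>\n为什么\n<FLM_FIX_TOKEN_64796>"
print("HF chat :", ref)
print("fastllm :", fl_model.response(text))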

Cloopen-ReLiNK avatar Nov 13 '23 11:11 Cloopen-ReLiNK

https://github.com/Arcment/chatglm3-composite-demo-modified.git
This is my modification of the official ChatGLM3 demo; I added fastllm today and the speed improved quite a bit.

Arcment avatar Nov 26 '23 14:11 Arcment

It is already supported. When using it, set model.direct_query = True, then build the prompt yourself according to ChatGLM3's rules and call the model as usual:

def glm_prompt3(query: str, history=[], system_prompt='', user_token=64795, system_token=64794, assistant_token=64796, observation=64797):
    prompt = f'<FLM_FIX_TOKEN_{system_token}>\n{system_prompt}\n'
    for i, (old_query, response) in enumerate(history):
            prompt += f"<FLM_FIX_TOKEN_{user_token}>\n{old_query}\n"
            prompt += f"<FLM_FIX_TOKEN_{assistant_token}>\n{response}\n"
    prompt += f"<FLM_FIX_TOKEN_{user_token}>\n{query}\n<FLM_FIX_TOKEN_{assistant_token}>"
    return prompt

The output does not line up with the original; the results are inconsistent. I constructed a piece of test data myself: text = "<FLM_FIX_TOKEN_64795>\n1+1=\n<FLM_FIX_TOKEN_64796>\n2\n<FLM_FIX_TOKEN_64795>\n为什么\n<FLM_FIX_TOKEN_64796>". The fastllm output was "\n 1+1=2,因为1和1相加得到2。", while the original chat output was "1+1=2,这是基本的数学原理之一,被称为加法。在十进制数字系统中,数字从0到9,当我们把1和1相加时,得到2。这个结果是由两个个位数字相加得出的,其中1表示一个单位,2表示两个单位。因此,当我们在数字1+1中相加1和1时,得到2。"

I ran into the same problem. Has it been solved?

JinXuan0604 avatar Dec 06 '23 06:12 JinXuan0604

Conversion code:

import time, torch, os
from transformers import AutoModel, AutoTokenizer
from fastllm_pytools import llm

model_path = "chatglm3-5e-4-30-2_1128_export"
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModel.from_pretrained(model_path, trust_remote_code=True).half().cuda()
model = model.eval()
prompt = "小米办公打不开怎么办"
out, _ = model.chat(tokenizer, prompt, history=[])
print(out)
new_model = llm.from_hf(model, tokenizer, dtype="float16")
torch.cuda.empty_cache()
new_model.save("model_lora.flm")

Inference code after conversion:

from fastllm_pytools import llm
from transformers import AutoModel, AutoTokenizer

def glm_prompt3(query: str, history=[], system_prompt='', user_token=64795, system_token=64794, assistant_token=64796, observation=64797):
    prompt = f'<FLM_FIX_TOKEN_{system_token}>\n{system_prompt}\n'
    for i, (old_query, response) in enumerate(history):
        prompt += f"<FLM_FIX_TOKEN_{user_token}>\n{old_query}\n"
        prompt += f"<FLM_FIX_TOKEN_{assistant_token}>\n{response}\n"
    prompt += f"<FLM_FIX_TOKEN_{user_token}>\n{query}\n<FLM_FIX_TOKEN_{assistant_token}>"
    return prompt

model = llm.model("model_lora.flm")
model.direct_query = True
query = "小米办公打不开怎么办"
prompt = glm_prompt3(query)
res = model.response(prompt)
res = res.split("\n", maxsplit=1)[1]
print(res)

After converting chatglm3-6b-base with fastllm, chat() returns an empty output, and response() cannot be aligned with the pre-conversion results.

JinXuan0604 avatar Dec 06 '23 07:12 JinXuan0604

It is already supported. When using it, set model.direct_query = True, then build the prompt yourself according to ChatGLM3's rules and call the model as usual:

def glm_prompt3(query: str, history=[], system_prompt='', user_token=64795, system_token=64794, assistant_token=64796, observation=64797):
    prompt = f'<FLM_FIX_TOKEN_{system_token}>\n{system_prompt}\n'
    for i, (old_query, response) in enumerate(history):
            prompt += f"<FLM_FIX_TOKEN_{user_token}>\n{old_query}\n"
            prompt += f"<FLM_FIX_TOKEN_{assistant_token}>\n{response}\n"
    prompt += f"<FLM_FIX_TOKEN_{user_token}>\n{query}\n<FLM_FIX_TOKEN_{assistant_token}>"
    return prompt

The prompt is written incorrectly; the version below is the correct one:

def glm_prompt(query: str, history=[], system_prompt='', user_token=64795, system_token=64794, assistant_token=64796, observation=64797):
    prompt = f''
    for i, (old_query, response) in enumerate(history):
        prompt += f"<FLM_FIX_TOKEN_{user_token}>\n{old_query}\n"
        prompt += f"<FLM_FIX_TOKEN_{assistant_token}>\n{response}\n"
    prompt += f"<FLM_FIX_TOKEN_{user_token}>\n {query}<FLM_FIX_TOKEN_{assistant_token}>"
    return prompt

JinXuan0604 avatar Dec 07 '23 09:12 JinXuan0604

Conversion code:

import time, torch, os
from transformers import AutoModel, AutoTokenizer
from fastllm_pytools import llm

model_path = "chatglm3-5e-4-30-2_1128_export"
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModel.from_pretrained(model_path, trust_remote_code=True).half().cuda()
model = model.eval()
prompt = "小米办公打不开怎么办"
out, _ = model.chat(tokenizer, prompt, history=[])
print(out)
new_model = llm.from_hf(model, tokenizer, dtype="float16")
torch.cuda.empty_cache()
new_model.save("model_lora.flm")

Inference code after conversion:

from fastllm_pytools import llm
from transformers import AutoModel, AutoTokenizer

def glm_prompt3(query: str, history=[], system_prompt='', user_token=64795, system_token=64794, assistant_token=64796, observation=64797):
    prompt = f'<FLM_FIX_TOKEN_{system_token}>\n{system_prompt}\n'
    for i, (old_query, response) in enumerate(history):
        prompt += f"<FLM_FIX_TOKEN_{user_token}>\n{old_query}\n"
        prompt += f"<FLM_FIX_TOKEN_{assistant_token}>\n{response}\n"
    prompt += f"<FLM_FIX_TOKEN_{user_token}>\n{query}\n<FLM_FIX_TOKEN_{assistant_token}>"
    return prompt

model = llm.model("model_lora.flm")
model.direct_query = True
query = "小米办公打不开怎么办"
prompt = glm_prompt3(query)
res = model.response(prompt)
res = res.split("\n", maxsplit=1)[1]
print(res)

After converting chatglm3-6b-base with fastllm, chat() returns an empty output, and response() cannot be aligned with the pre-conversion results.

I also ran into this with the base model and have not found a solution.

Arcment avatar Dec 15 '23 12:12 Arcment

It is already supported. When using it, set model.direct_query = True, then build the prompt yourself according to ChatGLM3's rules and call the model as usual:

def glm_prompt3(query: str, history=[], system_prompt='', user_token=64795, system_token=64794, assistant_token=64796, observation=64797):
    prompt = f'<FLM_FIX_TOKEN_{system_token}>\n{system_prompt}\n'
    for i, (old_query, response) in enumerate(history):
            prompt += f"<FLM_FIX_TOKEN_{user_token}>\n{old_query}\n"
            prompt += f"<FLM_FIX_TOKEN_{assistant_token}>\n{response}\n"
    prompt += f"<FLM_FIX_TOKEN_{user_token}>\n{query}\n<FLM_FIX_TOKEN_{assistant_token}>"
    return prompt

The prompt is written incorrectly; the version below is the correct one:

def glm_prompt(query: str, history=[], system_prompt='', user_token=64795, system_token=64794, assistant_token=64796, observation=64797):
    prompt = f''
    for i, (old_query, response) in enumerate(history):
        prompt += f"<FLM_FIX_TOKEN_{user_token}>\n{old_query}\n"
        prompt += f"<FLM_FIX_TOKEN_{assistant_token}>\n{response}\n"
    prompt += f"<FLM_FIX_TOKEN_{user_token}>\n {query}<FLM_FIX_TOKEN_{assistant_token}>"
    return prompt

Doesn't your version effectively leave out the system prompt?
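
If the system instruction is still needed, one option is a sketch that merges the two templates above (my own guess, not verified against the official ChatGLM3 template; exact whitespace may matter for alignment), emitting the system segment only when system_prompt is non-empty:

def glm_prompt_with_system(query: str, history=[], system_prompt='',
                           user_token=64795, system_token=64794, assistant_token=64796):
    # Keep the <|system|> segment only when a system prompt is actually provided,
    # so without one this degrades to the "corrected" template above.
    prompt = f"<FLM_FIX_TOKEN_{system_token}>\n{system_prompt}\n" if system_prompt else ""
    for old_query, response in history:
        prompt += f"<FLM_FIX_TOKEN_{user_token}>\n{old_query}\n"
        prompt += f"<FLM_FIX_TOKEN_{assistant_token}>\n{response}\n"
    prompt += f"<FLM_FIX_TOKEN_{user_token}>\n{query}\n<FLM_FIX_TOKEN_{assistant_token}>"
    return prompt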

aofengdaxia avatar Jan 18 '24 12:01 aofengdaxia

Conversion code:

import time, torch, os
from transformers import AutoModel, AutoTokenizer
from fastllm_pytools import llm

model_path = "chatglm3-5e-4-30-2_1128_export"
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModel.from_pretrained(model_path, trust_remote_code=True).half().cuda()
model = model.eval()
prompt = "小米办公打不开怎么办"
out, _ = model.chat(tokenizer, prompt, history=[])
print(out)
new_model = llm.from_hf(model, tokenizer, dtype="float16")
torch.cuda.empty_cache()
new_model.save("model_lora.flm")

Inference code after conversion:

from fastllm_pytools import llm
from transformers import AutoModel, AutoTokenizer

def glm_prompt3(query: str, history=[], system_prompt='', user_token=64795, system_token=64794, assistant_token=64796, observation=64797):
    prompt = f'<FLM_FIX_TOKEN_{system_token}>\n{system_prompt}\n'
    for i, (old_query, response) in enumerate(history):
        prompt += f"<FLM_FIX_TOKEN_{user_token}>\n{old_query}\n"
        prompt += f"<FLM_FIX_TOKEN_{assistant_token}>\n{response}\n"
    prompt += f"<FLM_FIX_TOKEN_{user_token}>\n{query}\n<FLM_FIX_TOKEN_{assistant_token}>"
    return prompt

model = llm.model("model_lora.flm")
model.direct_query = True
query = "小米办公打不开怎么办"
prompt = glm_prompt3(query)
res = model.response(prompt)
res = res.split("\n", maxsplit=1)[1]
print(res)

After converting chatglm3-6b-base with fastllm, chat() returns an empty output, and response() cannot be aligned with the pre-conversion results.

I have the same problem: after converting with fastllm, the output differs from what I get without fastllm. Did you manage to solve it?

@TylunasLi, could you take a look at how to fix this?

chenyangjun45 avatar Mar 06 '24 02:03 chenyangjun45

Conversion code:

import time, torch, os
from transformers import AutoModel, AutoTokenizer
from fastllm_pytools import llm

model_path = "chatglm3-5e-4-30-2_1128_export" 
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModel.from_pretrained(model_path, trust_remote_code=True).half().cuda()
model = model.eval()
prompt = "小米办公打不开怎么办"
out, _ = model.chat(tokenizer, prompt, history=[])
print(out) 
new_model = llm.from_hf(model, tokenizer, dtype="float16") 
torch.cuda.empty_cache() 
new_model.save("model_lora.flm")

Inference code after conversion:

from fastllm_pytools import llm 

def glm_prompt3(query: str, history=[], system_prompt='', user_token=64795, system_token=64794, assistant_token=64796, observation=64797):
    prompt = f'<FLM_FIX_TOKEN_{system_token}>\n{system_prompt}\n'
    for i, (old_query, response) in enumerate(history):
        prompt += f"<FLM_FIX_TOKEN_{user_token}>\n{old_query}\n"
        prompt += f"<FLM_FIX_TOKEN_{assistant_token}>\n{response}\n"
    prompt += f"<FLM_FIX_TOKEN_{user_token}>\n{query}\n<FLM_FIX_TOKEN_{assistant_token}>"
    return prompt

model = llm.model("model_lora.flm") 
model.direct_query = True
query = "小米办公打不开怎么办"
prompt = glm_prompt3(query)
res = model.response(prompt) 
res = res.split("\n", maxsplit=1)[1]
print(res)

After converting chatglm3-6b-base with fastllm, chat() returns an empty output, and response() cannot be aligned with the pre-conversion results.

@JinXuan0604 @Arcment @chenyangjun45

You can try comparing whether the input ids sequences before and after conversion are identical.

For a model fine-tuned from the base model, the prompt template is whatever template you used during fine-tuning; it may not be the ChatGLM3 one.

To inspect fastllm's tokenize result:

    model = llm.model("model_lora.flm") 
    model.direct_query = True
    ids = model.tokenizer_encode_string(make_prompt(query, history))
    print(ids)
    for id in ids:
        try:
            print(model.tokenizer_decode_token(id), end = ' ')
        except:
            pass
    print()
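
For the HF side of the same check, a rough sketch of how the reference ids for one turn could be obtained (the tokenizer path is a placeholder, and build_chat_input is my assumption about the helper ChatGLM3's chat() uses; adjust if your fine-tuned tokenizer differs):

from transformers import AutoTokenizer

# Placeholder path: the same HF checkpoint that was converted to .flm.
tokenizer = AutoTokenizer.from_pretrained("chatglm3-5e-4-30-2_1128_export", trust_remote_code=True)

query, history = "小米办公打不开怎么办", []
# Assumed ChatGLM3 tokenizer helper that assembles the chat prompt ids.
hf_ids = tokenizer.build_chat_input(query, history=history, role="user")["input_ids"][0].tolist()
print(hf_ids)
print(tokenizer.convert_ids_to_tokens(hf_ids))
# Diff these against the ids printed by model.tokenizer_encode_string(...) above.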

TylunasLi avatar Mar 06 '24 04:03 TylunasLi