fastllm
ChatGLM3 has been open-sourced. Does the author plan to support it?
As titled.
hope so
Isn't this already supported?
+1
How do you support multiple roles, e.g. the system role?
+1
It is already supported. Set model.direct_query = True, then build the prompt yourself following ChatGLM3's template rules and call the model as usual:
def glm_prompt3(query: str, history=[], system_prompt='', user_token=64795, system_token=64794, assistant_token=64796, observation=64797):
    prompt = f'<FLM_FIX_TOKEN_{system_token}>\n{system_prompt}\n'
    for i, (old_query, response) in enumerate(history):
        prompt += f"<FLM_FIX_TOKEN_{user_token}>\n{old_query}\n"
        prompt += f"<FLM_FIX_TOKEN_{assistant_token}>\n{response}\n"
    prompt += f"<FLM_FIX_TOKEN_{user_token}>\n{query}\n<FLM_FIX_TOKEN_{assistant_token}>"
    return prompt
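For reference, a minimal call sketch using the builder above (the .flm path is a placeholder; model.direct_query and model.response are used the same way as in the snippets later in this thread):

from fastllm_pytools import llm

model = llm.model("chatglm3-6b.flm")   # placeholder path to an already converted model
model.direct_query = True              # bypass fastllm's built-in prompt template
prompt = glm_prompt3("你好", history=[], system_prompt="You are a helpful assistant.")
print(model.response(prompt))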
The C++ API already supports basic multi-turn conversation; for system instructions, assembling the prompt as shown above is recommended.
Using the prompt builder above, the output doesn't line up with the original model; the results differ. I constructed a test input myself:
text = "<FLM_FIX_TOKEN_64795>\n1+1=\n<FLM_FIX_TOKEN_64796>\n2\n<FLM_FIX_TOKEN_64795>\n为什么\n<FLM_FIX_TOKEN_64796>"
One output is "\n 1+1=2,因为1和1相加得到2。", while the original chat() gives "1+1=2,这是基本的数学原理之一,被称为加法。在十进制数字系统中,数字从0到9,当我们把1和1相加时,得到2。这个结果是由两个个位数字相加得出的,其中1表示一个单位,2表示两个单位。因此,当我们在数字1+1中相加1和1时,得到2。"
https://github.com/Arcment/chatglm3-composite-demo-modified.git — this is based on the official ChatGLM3 demo; I added fastllm today and the speed improved quite a bit.
I've run into the same problem (output not matching the original). Has it been resolved?
Conversion code:

import time, torch, os
from transformers import AutoModel, AutoTokenizer
from fastllm_pytools import llm

model_path = "chatglm3-5e-4-30-2_1128_export"
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModel.from_pretrained(model_path, trust_remote_code=True).half().cuda()
model = model.eval()
prompt = "小米办公打不开怎么办"
out, _ = model.chat(tokenizer, prompt, history=[])
print(out)
new_model = llm.from_hf(model, tokenizer, dtype="float16")
torch.cuda.empty_cache()
new_model.save("model_lora.flm")

Inference code after conversion:

from fastllm_pytools import llm

def glm_prompt3(query: str, history=[], system_prompt='', user_token=64795, system_token=64794, assistant_token=64796, observation=64797):
    prompt = f'<FLM_FIX_TOKEN_{system_token}>\n{system_prompt}\n'
    for i, (old_query, response) in enumerate(history):
        prompt += f"<FLM_FIX_TOKEN_{user_token}>\n{old_query}\n"
        prompt += f"<FLM_FIX_TOKEN_{assistant_token}>\n{response}\n"
    prompt += f"<FLM_FIX_TOKEN_{user_token}>\n{query}\n<FLM_FIX_TOKEN_{assistant_token}>"
    return prompt

model = llm.model("model_lora.flm")
model.direct_query = True
query = "小米办公打不开怎么办"
prompt = glm_prompt3(query)
res = model.response(prompt)
res = res.split("\n", maxsplit=1)[1]
print(res)

After converting the chatglm3-6b-base model with fastllm, chat gives empty output, and response cannot be aligned with the pre-conversion results.
The prompt is built incorrectly; the version below is the correct one:

def glm_prompt(query: str, history=[], system_prompt='', user_token=64795, system_token=64794, assistant_token=64796, observation=64797):
    prompt = ''
    for i, (old_query, response) in enumerate(history):
        prompt += f"<FLM_FIX_TOKEN_{user_token}>\n{old_query}\n"
        prompt += f"<FLM_FIX_TOKEN_{assistant_token}>\n{response}\n"
    prompt += f"<FLM_FIX_TOKEN_{user_token}>\n {query}<FLM_FIX_TOKEN_{assistant_token}>"
    return prompt
I hit the base-model issue too (empty chat output after conversion) and haven't found a solution.
Doesn't that corrected prompt effectively drop the system role altogether?
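A possible middle ground (an untested sketch that merges the two builders above; the token ids are taken from the snippets in this thread) is to emit the system block only when system_prompt is non-empty:

def glm_prompt_combined(query: str, history=[], system_prompt='', user_token=64795, system_token=64794, assistant_token=64796):
    # Only emit the system block when a system prompt is actually provided.
    prompt = f"<FLM_FIX_TOKEN_{system_token}>\n{system_prompt}\n" if system_prompt else ""
    for old_query, response in history:
        prompt += f"<FLM_FIX_TOKEN_{user_token}>\n{old_query}\n"
        prompt += f"<FLM_FIX_TOKEN_{assistant_token}>\n{response}\n"
    prompt += f"<FLM_FIX_TOKEN_{user_token}>\n{query}\n<FLM_FIX_TOKEN_{assistant_token}>"
    return prompt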
I have this problem too: after switching to fastllm the output differs from the output without fastllm. Did you solve it?
@TylunasLi, could you take a look at how to fix this?
@JinXuan0604 @Arcment @chenyangjun45
You can try comparing whether the input ids sequences fed to the model before and after conversion are identical.
The prompt template of a fine-tuned base model is whatever template you used during fine-tuning; it may not be ChatGLM3's.
To inspect fastllm's tokenize result:
model = llm.model("model_lora.flm")
model.direct_query = True
ids = model.tokenizer_encode_string(make_prompt(query, history))
print(ids)
for id in ids:
    try:
        print(model.tokenizer_decode_token(id), end=' ')
    except:
        pass
print()
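For the transformers side of the comparison, a rough sketch (assuming the same model_path and query as in the conversion snippet above, and that ChatGLM3's remote tokenizer code exposes build_chat_input(), which its chat() method calls internally; adapt if your fine-tuned model uses a different template):

from transformers import AutoTokenizer

# Same tokenizer that was used before conversion.
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Ids that the original chat() path would feed the model.
hf_ids = tokenizer.build_chat_input(query, history=[], role="user")["input_ids"][0].tolist()
print(hf_ids)

# Compare with the ids printed by model.tokenizer_encode_string() above;
# if the two sequences differ, the prompt template is the first thing to check.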