text-generation-webui
Support for ChatGLM
Is ChatGLM very different from GLM? Is it possible to support GLM and GLM-130B together?
ChatGLM is a good model; in some individual Chinese-language domains it is even better than ChatGPT.
I managed to get GLM-10b to work in a very hacky way. Still missing some features as it is not very well-documented.
Giving this one a bump. ChatGLM results are pretty convincing honestly. @sgsdxzy How did you manage it?
Apparently this model is quite good.
Apparently this is a text2text model and not an autoregressive model. So it's more like FLAN than GPT-J or other currently supported models.
In my experience, for GLM you need to format the input (append '[gMASK]') and the output (it puts things between two special tokens). And the model seems to not support temperature, top_k, etc. Other than that, the adaptation is trivial. But I think one of the strong points of GLM is its ability to do infill. Maybe in textui we let users put [MASK], [sMASK] and [gMASK] as they please and just format the output? Have not looked into ChatGLM yet. Just curious: GLM-10b-Chinese has 10B parameters but ChatGLM-6b has 6B, so is ChatGLM pretrained from the ground up rather than a finetune of GLM? @OedoSoldier do you know any details?
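For illustration only, here is a rough sketch of the kind of input/output massaging described above. The special-token names are assumptions and should be checked against the actual GLM tokenizer; this is not the exact code used in the adaptation:

```python
# Sketch of the GLM input/output formatting described above.
# SOP/EOP token names are ASSUMED and must be verified against the tokenizer.
SOP = "<|startofpiece|>"  # assumed token marking the start of the generated span
EOP = "<|endofpiece|>"    # assumed token marking the end of the generated span

def format_glm_input(prompt: str) -> str:
    # GLM generates the span that fills the final [gMASK] token.
    return prompt + " [gMASK]"

def extract_glm_output(decoded: str) -> str:
    # The model puts its continuation between two special tokens; strip them.
    start = decoded.find(SOP)
    if start == -1:
        return decoded
    end = decoded.find(EOP, start + len(SOP))
    return decoded[start + len(SOP):end if end != -1 else None].strip()
```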
> Apparently this is a text2text model and not an autoregressive model. So it's more like FLAN than GPT-J or other currently supported models.
I tried mt0 and some encoder-decoder models and they won't work. So this webui will not/never support them, right?
> So this webui will not/never support them, right?
Is the issue open or closed?
I set up a very basic one-click free Colab web demo of ChatGLM in case anyone is itching to try it:
(Colab badge: press to launch the web demo)
Here's the original repo: https://github.com/THUDM/ChatGLM-6B/blob/main/README_en.md
In case anyone wonders, I also managed to make chatglm-6b work with this UI in a hacky and limited way, and stream chatting works fine (although it sometimes triggers strange answers mixing English/Chinese).
The modifications are too messy to be anywhere near a PR, but here are the tweaks:
- `pip install icetk`: fixes ImportError: This modeling file requires the following packages that were not found in your environment: icetk
- `download-model.py` also needs to download `ice_text.model`: it's an unusual extension (see the manual workaround sketched below)
- during model loading, skip the normal process and load the model with the custom code, fixing 3 issues:
  - Unrecognized configuration class <class 'transformers_modules.chatglm-6b.configuration_chatglm.ChatGLMConfig'> for this kind of AutoModel: AutoModelForCausalLM. Model type should be one of BartConfig, BertConfig, BertGenerationConfig, BigBirdConfig, BigBirdPegasusConfig, BioGptConfig, BlenderbotConfig, BlenderbotSmallConfig, BloomConfig, CamembertConfig, CodeGenConfig, CTRLConfig, Data2VecTextConfig, ElectraConfig, ErnieConfig, GitConfig, GPT2Config, GPT2Config, GPTNeoConfig, GPTNeoXConfig, GPTNeoXJapaneseConfig, GPTJConfig, LlamaConfig, MarianConfig, MBartConfig, MegaConfig, MegatronBertConfig, MvpConfig, OpenAIGPTConfig, OPTConfig, PegasusConfig, PLBartConfig, ProphetNetConfig, QDQBertConfig, ReformerConfig, RemBertConfig, RobertaConfig, RobertaPreLayerNormConfig, RoCBertConfig, RoFormerConfig, Speech2Text2Config, TransfoXLConfig, TrOCRConfig, XGLMConfig, XLMConfig, XLMProphetNetConfig, XLMRobertaConfig, XLMRobertaXLConfig, XLNetConfig, XmodConfig.
  - ValueError: Loading models/chatglm-6b requires you to execute the configuration file in that repo on your local machine. Make sure you have read the code there to avoid malicious use, then set the option `trust_remote_code=True` to remove this error.
  - RuntimeError: mixed dtype (CPU): expect input to have scalar type of BFloat16
- modify `generate_reply` in `text_generation.py` with the custom code, adapting it to `stream_chat()`
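As a manual workaround for the `ice_text.model` step (referenced in the list above), the file can be fetched directly with huggingface_hub. This is just a sketch and assumes the model folder under models/ already exists from a partial download:

```python
import shutil
from huggingface_hub import hf_hub_download

# Fetch the tokenizer file that download-model.py misses, then copy it
# next to the other model files (folder name is an assumption).
cached = hf_hub_download(repo_id="THUDM/chatglm-6b", filename="ice_text.model")
shutil.copy(cached, "models/chatglm-6b/ice_text.model")
```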
Limitation: only tested on CPU (in Docker under Linux) so far, but I assume the GPU version is easy, just following the examples. Will be trying to figure out how to make auto-devices work. Will also need to figure out how it could work with LoRA and the rest of the features.
NOTE: `trust_remote_code=True` is DANGEROUS, use with caution!
Custom messy code snippet 4:

```python
# ChatGLM
elif shared.is_ChatGLM:
    from transformers import AutoModel
    params = {}
    # params["torch_dtype"] = torch.float16
    params["trust_remote_code"] = True
    tokenizer = AutoTokenizer.from_pretrained(Path(f'models/{model_name}'), **params)
    model = AutoModel.from_pretrained(Path(f"models/{shared.model_name}"), **params).float()
    model = model.eval()
    return model, tokenizer
```
Note: `half()` doesn't work (on CPU), giving: RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'
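For GPU, a sketch following the loading example in the upstream THUDM/ChatGLM-6B README (untested inside this webui; the local path is an assumption). `half()` does work on CUDA even though it fails on CPU:

```python
from pathlib import Path
from transformers import AutoModel, AutoTokenizer

# Sketch of GPU loading per the ChatGLM-6B README, not verified in this UI.
model_path = Path("models/chatglm-6b")  # assumed local download location
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModel.from_pretrained(model_path, trust_remote_code=True).half().cuda()
model = model.eval()
```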
Custom messy code snippet 5:

```python
if shared.is_ChatGLM:
    history = []
    if not (shared.args.chat or shared.args.cai_chat):
        yield formatted_outputs(question, shared.model_name)
    for reply, history in shared.model.stream_chat(shared.tokenizer, question, history,
                                                   max_length=max_new_tokens, top_p=top_p,
                                                   temperature=temperature):
        if not (shared.args.chat or shared.args.cai_chat):
            reply = original_question + apply_extensions(reply, "output")
        updates = []
        for query, response in history:
            updates.append("User:" + query)
            updates.append("ChatGLM-6B:" + response)
        yield formatted_outputs(reply, shared.model_name)
```
Just posting to confirm the feasibility, for others to be able to search the errors, and to see if I've done something obviously wrong (for example, I keep getting weird results like the following):
Incorrect answer and mixing in off-topic English (screenshot):

Answering a Chinese question with mostly English and a few Chinese words, or even partial words (screenshot):
Maybe this is related to using the default Character prompts or the parameters preset...
I came up with an idea to support the GLM series through an API, see #627. Any suggestions?
em.....Can anyone tell me where the setting "trust_remote_code = True" is added to the file?
any progress?
https://github.com/Ph0rk0z/text-generation-webui-testing/commit/a72ef4932c7a95a3d8de0e7eb3bc778b5424aea3
Should I make a PR? Is this thing worth it?
edit: this model is freaking weird.
Added here https://github.com/oobabooga/text-generation-webui/pull/1256
This model likes to mix Chinese with English, but it appears to be quite smart.
A special prompt format has to be used, with "rounds", `问:` as the user and `答:` as the bot:
```
[Round 0]
问: What is your name?
答: I am a language model, and my name is ChatGLM.

[Round 1]
问: What does trust_remote_code=True do in the transformers library?
答: The `TrustRemoteCode` parameter in the `transformers` library specifies whether to trust remote code signing (RSA) or not. When set to `True`, the library will use RSA-based code signing to sign the input data before loading it into memory. This means that the library assumes that the remote party has performed the necessary steps for secure communication, such as generating a private key and using it to sign the message.
When set to `False`, the library will instead rely on local code signing by itself, which may be less secure due to the possibility of an attacker 冒充 (impersonating) a trusted party to perform the signature operation. In general, setting this parameter to `True` should be 谨慎使用 (used with caution), especially when communicating with potentially malicious parties.

[Round 2]
问: What is your favorite movie?
答:
```
Source: https://github.com/THUDM/ChatGLM-6B/tree/main/ptuning#%E5%AF%B9%E8%AF%9D%E6%95%B0%E6%8D%AE%E9%9B%86
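For reference, a minimal sketch of building that round-based prompt from a (question, answer) history list; the helper name is made up for illustration and this is not ChatGLM's own prompt-building code:

```python
def build_chatglm_prompt(history, query):
    """Build a ChatGLM-style prompt from prior (question, answer) rounds.

    Mirrors the round-based format shown above; illustrative only.
    """
    prompt = ""
    for i, (old_query, old_response) in enumerate(history):
        prompt += f"[Round {i}]\n问: {old_query}\n答: {old_response}\n"
    prompt += f"[Round {len(history)}]\n问: {query}\n答:"
    return prompt

history = [("What is your name?", "I am a language model, and my name is ChatGLM.")]
print(build_chatglm_prompt(history, "What is your favorite movie?"))
```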
NICE! Thanks for your work!
Should prioritize finding out exactly what the remote code execution is and replacing it with something safe.
According to the above, that is some kind of signature checking.
Maybe its requests can be blocked with the same trick that was used here https://github.com/oobabooga/text-generation-webui/commit/c58c1d89bdbd303d969b8156a3effe355138fa16. I'll try that later
I have done a more careful analysis, and the conclusion is that the ChatGLM loader does not make any HTTP requests. All it does is read the following files inside the model folder, none of which imports the `requests` library:
- configuration_chatglm.py
- modeling_chatglm.py
- tokenization_chatglm.py
The "trust remote code" flag name is misleading, as the code in question is local if you have previously downloaded the model.
Future models may abuse this feature, but ChatGLM seems to be fine.
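One way to double-check this yourself (a sketch, assuming the model has already been downloaded to models/chatglm-6b): load it with transformers forced into offline mode, so that any attempted HTTP request would raise instead of silently going through:

```python
import os

# Force offline mode BEFORE importing transformers; any network access
# during loading will then fail loudly.
os.environ["HF_HUB_OFFLINE"] = "1"
os.environ["TRANSFORMERS_OFFLINE"] = "1"

from transformers import AutoModel, AutoTokenizer

path = "models/chatglm-6b"  # assumed local download location
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True, local_files_only=True)
model = AutoModel.from_pretrained(path, trust_remote_code=True, local_files_only=True).float()
print("Loaded without any network access.")
```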
> em.....Can anyone tell me where the setting "trust_remote_code = True" is added to the file?
```
python download-model.py THUDM/chatglm-6b
python server.py --model THUDM_chatglm-6b --chat --trust-remote-code
```