
Support for ChatGLM

Open · OedoSoldier opened this issue 1 year ago · 15 comments

Description

ChatGLM-6B

A Chinese chat AI based on GLM was released by THU.

OedoSoldier avatar Mar 14 '23 11:03 OedoSoldier

Is ChatGLM very different from GLM? Is it possible to support GLM and GLM-130B together?

sgsdxzy avatar Mar 14 '23 14:03 sgsdxzy

ChatGLM is a good model; in some Chinese-language domains it is sometimes better than ChatGPT.

kakaxixx avatar Mar 16 '23 13:03 kakaxixx

I managed to get GLM-10b to work in a very hacky way. Still missing some features as it is not very well-documented.

sgsdxzy avatar Mar 16 '23 13:03 sgsdxzy

Giving this one a bump. ChatGLM results are pretty convincing honestly. @sgsdxzy How did you manage it?

0xbitches avatar Mar 18 '23 07:03 0xbitches

Apparently this model is quite good.

oobabooga avatar Mar 19 '23 23:03 oobabooga

Apparently this is a text2text model and not an autoregressive model. So it's more like FLAN than GPT-J or other currently supported models.
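
Roughly speaking, the distinction in transformers terms (the model names below are just illustrative examples, not this webui's actual loader code):

    # Encoder-decoder ("text2text") models load through AutoModelForSeq2SeqLM,
    # while the models this webui currently supports load through
    # AutoModelForCausalLM (autoregressive, GPT-style).
    from transformers import AutoModelForCausalLM, AutoModelForSeq2SeqLM

    causal_lm = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")    # GPT-J style
    seq2seq_lm = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")  # FLAN style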

oobabooga avatar Mar 19 '23 23:03 oobabooga

In my experience, for GLM you need to format the input (append '[gMASK]') and the output (it puts things between two special tokens). The model also doesn't seem to support temperature, top_k, etc. Other than that, the adaptation is trivial. But I think one of the strong points of GLM is its ability to do infilling. Maybe in the text UI we could let users place [MASK], [sMASK] and [gMASK] as they please and just format the output? I haven't looked into ChatGLM yet. Just curious: GLM-10B-Chinese has 10B parameters but ChatGLM-6B has 6B, so ChatGLM is pretrained from the ground up and not a finetune of GLM? @OedoSoldier, do you know any details?
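
For reference, here is a rough sketch of what that formatting looks like when driving GLM-10B directly through its Hugging Face remote code. The helper names (build_inputs_for_generation, eop_token_id) are taken from the THUDM/glm-10b model card and may differ between revisions, so treat this as an illustration rather than the webui's actual loader:

    # Minimal infill example for GLM via transformers' trust_remote_code path.
    # [MASK] marks a short blank to fill in; [gMASK] would request a long
    # free-form generation at the end of the prompt instead.
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    tokenizer = AutoTokenizer.from_pretrained("THUDM/glm-10b", trust_remote_code=True)
    model = AutoModelForSeq2SeqLM.from_pretrained("THUDM/glm-10b", trust_remote_code=True).half().cuda()
    model.eval()

    inputs = tokenizer("Tsinghua University is located in [MASK].", return_tensors="pt")
    inputs = tokenizer.build_inputs_for_generation(inputs, max_gen_length=64)
    inputs = inputs.to("cuda")

    # The answer comes back wrapped between special tokens, so the raw decode
    # still needs post-processing before it can be shown in the UI.
    outputs = model.generate(**inputs, max_length=256, eos_token_id=tokenizer.eop_token_id)
    print(tokenizer.decode(outputs[0].tolist()))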

sgsdxzy avatar Mar 20 '23 04:03 sgsdxzy

Apparently this is a text2text model and not an autoregressive model. So it's more like FLAN than GPT-J or other currently supported models.

I tried mt0 and some encoder-decoder models and they won't work. So this webui will not/never support them, right?

nameless0704 avatar Mar 21 '23 05:03 nameless0704

So this webui will not/never support them, right?

Is the issue open or closed?

oobabooga avatar Mar 21 '23 12:03 oobabooga

I set up a very basic one-click free colab web demo of ChatGLM in case anyone is itching to try it: Launch In Colab <-- press here to launch the web demo

Here's the original repo: https://github.com/THUDM/ChatGLM-6B/blob/main/README_en.md

MarkSchmidty avatar Mar 23 '23 01:03 MarkSchmidty

In case anyone wonders, I also managed to make chatglm-6b work with this UI in a hacky and limited way, and stream chatting works fine (although it sometimes triggers strange answers mixing English and Chinese).

The modifications are too messy to be anywhere near a PR, but here are the tweaks:

  1. pip install icetk: fixes ImportError: This modeling file requires the following packages that were not found in your environment: icetk
  2. download-model.py needs to also download ice_text.model: it's an unusual extension
  3. during model loading, skip the normal process and load the model with the custom loading code below, fixing 3 issues:
     • Unrecognized configuration class <class 'transformers_modules.chatglm-6b.configuration_chatglm.ChatGLMConfig'> for this kind of AutoModel: AutoModelForCausalLM. Model type should be one of BartConfig, BertConfig, BertGenerationConfig, BigBirdConfig, BigBirdPegasusConfig, BioGptConfig, BlenderbotConfig, BlenderbotSmallConfig, BloomConfig, CamembertConfig, CodeGenConfig, CTRLConfig, Data2VecTextConfig, ElectraConfig, ErnieConfig, GitConfig, GPT2Config, GPTNeoConfig, GPTNeoXConfig, GPTNeoXJapaneseConfig, GPTJConfig, LlamaConfig, MarianConfig, MBartConfig, MegaConfig, MegatronBertConfig, MvpConfig, OpenAIGPTConfig, OPTConfig, PegasusConfig, PLBartConfig, ProphetNetConfig, QDQBertConfig, ReformerConfig, RemBertConfig, RobertaConfig, RobertaPreLayerNormConfig, RoCBertConfig, RoFormerConfig, Speech2Text2Config, TransfoXLConfig, TrOCRConfig, XGLMConfig, XLMConfig, XLMProphetNetConfig, XLMRobertaConfig, XLMRobertaXLConfig, XLNetConfig, XmodConfig.
     • ValueError: Loading models/chatglm-6b requires you to execute the configuration file in that repo on your local machine. Make sure you have read the code there to avoid malicious use, then set the option trust_remote_code=True to remove this error.
     • RuntimeError: mixed dtype (CPU): expect input to have scalar type of BFloat16
  4. modify generate_reply in text_generation.py with the custom streaming code below, adapting it to stream_chat()

Limitation: only tested on CPU (in Docker under Linux) so far, but I assume the GPU version is easy, just following the examples. I will try to figure out how to make auto-devices work, and how this could work with LoRA and the rest of the features.

NOTE: trust_remote_code=True is DANGEROUS, use with caution!

Custom messy code snippet 4 (model loading):

    # ChatGLM branch inside the webui's model-loading function
    # (AutoTokenizer, Path and shared are already imported in that file)
    elif shared.is_ChatGLM:
        from transformers import AutoModel

        params = {}
        # params["torch_dtype"] = torch.float16
        params["trust_remote_code"] = True  # required: ChatGLM ships its own modeling code

        tokenizer = AutoTokenizer.from_pretrained(Path(f'models/{model_name}'), **params)
        # .float() keeps everything in fp32 on CPU; see the note about half() below
        model = AutoModel.from_pretrained(Path(f"models/{shared.model_name}"), **params).float()

        model = model.eval()

        return model, tokenizer

Note: half() doesn't work on CPU, giving: RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'
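
A possible way around that (untested here, just a sketch of the idea): choose the dtype based on where the model will run, since torch's CPU LayerNorm kernel is not implemented for float16. The shared.args.cpu flag below is the webui's existing --cpu option:

    # Hypothetical tweak: use fp16 only when a GPU is actually available,
    # and fall back to fp32 on CPU to avoid the LayerNormKernelImpl error.
    import torch

    if torch.cuda.is_available() and not shared.args.cpu:
        model = model.half().cuda()
    else:
        model = model.float()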

Custom messy code snippet 5 (streaming generation via stream_chat()):

    # ChatGLM branch inside generate_reply() in text_generation.py.
    # ChatGLM's remote code exposes stream_chat(), which yields (reply, history)
    # pairs as the answer is generated, so each partial reply is passed to the UI.
    if shared.is_ChatGLM:
        history = []

        if not (shared.args.chat or shared.args.cai_chat):
            yield formatted_outputs(question, shared.model_name)

        for reply, history in shared.model.stream_chat(shared.tokenizer, question, history,
                                                        max_length=max_new_tokens, top_p=top_p,
                                                        temperature=temperature):
            if not (shared.args.chat or shared.args.cai_chat):
                reply = original_question + apply_extensions(reply, "output")

            # Flatten the running history into "User:"/"ChatGLM-6B:" lines
            # (not used further in this snippet)
            updates = []
            for query, response in history:
                updates.append("User:" + query)
                updates.append("ChatGLM-6B:" + response)

            yield formatted_outputs(reply, shared.model_name)
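
For anyone who wants to sanity-check stream_chat() outside the webui first, here is a rough standalone sketch based on the ChatGLM-6B README. chat()/stream_chat() come from the model's own remote code, so their exact signatures may change between revisions:

    # Minimal standalone test of ChatGLM-6B's streaming API (CPU variant;
    # use .half().cuda() instead of .float() on a machine with a GPU).
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
    model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).float()
    model = model.eval()

    history = []
    for reply, history in model.stream_chat(tokenizer, "What is GLM?", history):
        print(reply)  # each iteration yields the full partial reply so far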

Just posting to confirm the feasibility, so others can search for these errors and see if I've done something obviously wrong. For example, I keep getting weird results like the following:

Incorrect answer and mixing off-topic English:

[screenshot]

Answering a Chinese question with mostly English and a few Chinese words (or even partial words):

[screenshot]

Maybe this is related to using the default Character prompts or the parameters preset...

utensil avatar Mar 29 '23 06:03 utensil

I came up with an idea for supporting the GLM series through an API in #627. Any suggestions?

sgsdxzy avatar Mar 29 '23 06:03 sgsdxzy

Em... can anyone tell me where the setting "trust_remote_code = True" needs to be added, and in which file?

Pythonpa avatar Apr 04 '23 02:04 Pythonpa

any progress?

binary-husky avatar Apr 10 '23 04:04 binary-husky

https://github.com/Ph0rk0z/text-generation-webui-testing/commit/a72ef4932c7a95a3d8de0e7eb3bc778b5424aea3

Should I make a PR? Is this thing worth it?

edit: this model is freaking weird.

Ph0rk0z avatar Apr 16 '23 01:04 Ph0rk0z

Added here https://github.com/oobabooga/text-generation-webui/pull/1256

This model likes to mix Chinese with English, but it appears to be quite smart.

[chat screenshot]

A special prompt format has to be used, with "rounds", 问: ("Question:") marking the user and 答: ("Answer:") marking the bot:

[Round 0]
问:
What is your name?
答:
I am a language model, and my name is ChatGLM.
[Round 1]
问:
What does trust_remote_code=True do in the transformers library?
答:
The `TrustRemoteCode` parameter in the `transformers` library specifies whether to trust remote code signing (RSA) or not. When set to `True`, the library will use RSA-based code signing to sign the input data before loading it into memory. This means that the library assumes that the remote party has performed the necessary steps for secure communication, such as generating a private key and using it to sign the message.

When set to `False`, the library will instead rely on local code signing by itself, which may be less secure due to the possibility of an attacker冒充 [impersonating] a trusted party to perform the signature operation. In general, setting this parameter to `True` should be谨慎使用 [used with caution], especially when communicating with potentially malicious parties.
[Round 2]
问:
What is your favorite movie?
答:

Source: https://github.com/THUDM/ChatGLM-6B/tree/main/ptuning#%E5%AF%B9%E8%AF%9D%E6%95%B0%E6%8D%AE%E9%9B%86
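
A small helper that builds this format from a list of (user, bot) pairs might look like the sketch below. This is just an illustration following the example above, not code from the PR:

    # Build a ChatGLM-style prompt from previous (user, bot) rounds plus a new question.
    def build_chatglm_prompt(history, new_question):
        rounds = []
        for i, (user_msg, bot_msg) in enumerate(history):
            rounds.append(f"[Round {i}]\n问:\n{user_msg}\n答:\n{bot_msg}")
        rounds.append(f"[Round {len(history)}]\n问:\n{new_question}\n答:\n")
        return "\n".join(rounds)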

oobabooga avatar Apr 16 '23 22:04 oobabooga

NICE! Thanks for your work!

OedoSoldier avatar Apr 17 '23 15:04 OedoSoldier

We should prioritize finding out exactly what that remote code does and replacing it with something safe.

mcmonkey4eva avatar Apr 18 '23 08:04 mcmonkey4eva

According to the above, that is some kind of signature checking.

Ph0rk0z avatar Apr 18 '23 12:04 Ph0rk0z

Maybe its requests can be blocked with the same trick that was used here https://github.com/oobabooga/text-generation-webui/commit/c58c1d89bdbd303d969b8156a3effe355138fa16. I'll try that later

oobabooga avatar Apr 18 '23 14:04 oobabooga

I have done a more careful analysis and the conclusion is that the ChatGLM loader does not make any HTTP requests. All it does is read the following files inside the model folder, none of which imports the requests library:

configuration_chatglm.py
modeling_chatglm.py
tokenization_chatglm.py

The "trust remote code" flag name is misleading, as the code in question is local if you have previously downloaded the model.

Future models may abuse this feature, but ChatGLM seems to be fine.
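
For anyone who wants to repeat that check on their own copy, a quick sketch (it assumes the model has already been downloaded to models/chatglm-6b):

    # Scan the model's bundled Python files for obvious network-related imports
    # before enabling trust_remote_code.
    from pathlib import Path

    for py_file in Path("models/chatglm-6b").glob("*.py"):
        hits = [line.strip() for line in py_file.read_text().splitlines()
                if any(token in line for token in ("import requests", "urllib", "socket"))]
        print(py_file.name, "->", hits if hits else "no obvious network imports")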

oobabooga avatar Apr 18 '23 19:04 oobabooga

Em... can anyone tell me where the setting "trust_remote_code = True" needs to be added, and in which file?

    python download-model.py THUDM/chatglm-6b
    python server.py --model THUDM_chatglm-6b --chat --trust-remote-code

shibing624 avatar Apr 20 '23 12:04 shibing624