
Support for ChatGLM

Open · OedoSoldier opened this issue 1 year ago · 15 comments

Description

ChatGLM-6B

A Chinese chat AI based on GLM was released by THU.

OedoSoldier avatar Mar 14 '23 11:03 OedoSoldier

Is ChatGLM very different from GLM? Is it possible to support GLM and GLM-130B together?

sgsdxzy avatar Mar 14 '23 14:03 sgsdxzy

ChatGLM is a good model; in some Chinese-language domains it is sometimes better than ChatGPT.

kakaxixx avatar Mar 16 '23 13:03 kakaxixx

I managed to get GLM-10b to work in a very hacky way. Still missing some features as it is not very well-documented.

sgsdxzy avatar Mar 16 '23 13:03 sgsdxzy

Giving this one a bump. ChatGLM results are pretty convincing honestly. @sgsdxzy How did you manage it?

0xbitches avatar Mar 18 '23 07:03 0xbitches

Apparently this model is quite good.

oobabooga avatar Mar 19 '23 23:03 oobabooga

Apparently this is a text2text model and not an autoregressive model. So it's more like FLAN than GPT-J or other currently supported models.
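
Roughly speaking, the distinction in transformers terms (the model names below are just illustrative examples, not this webui's actual loader code):

    # Encoder-decoder ("text2text") models load through AutoModelForSeq2SeqLM,
    # while the models this webui currently supports load through
    # AutoModelForCausalLM (autoregressive, GPT-style).
    from transformers import AutoModelForCausalLM, AutoModelForSeq2SeqLM

    causal_lm = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")    # GPT-J style
    seq2seq_lm = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")  # FLAN style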

oobabooga avatar Mar 19 '23 23:03 oobabooga

In my experience, for GLM you need to format the input (append '[gMASK]') and the output (it puts things between two special tokens). The model also doesn't seem to support temperature, top_k, etc. Other than that, the adaptation is trivial. But I think one of the strong points of GLM is its ability to do infilling. Maybe in the text UI we could let users place [MASK], [sMASK] and [gMASK] as they please and just format the output? I haven't looked into ChatGLM yet. Just curious: GLM-10B-Chinese has 10B parameters but ChatGLM-6B has 6B, so ChatGLM is pretrained from the ground up and not a finetune of GLM? @OedoSoldier, do you know any details?
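
For reference, here is a rough sketch of what that formatting looks like when driving GLM-10B directly through its Hugging Face remote code. The helper names (build_inputs_for_generation, eop_token_id) are taken from the THUDM/glm-10b model card and may differ between revisions, so treat this as an illustration rather than the webui's actual loader:

    # Minimal infill example for GLM via transformers' trust_remote_code path.
    # [MASK] marks a short blank to fill in; [gMASK] would request a long
    # free-form generation at the end of the prompt instead.
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    tokenizer = AutoTokenizer.from_pretrained("THUDM/glm-10b", trust_remote_code=True)
    model = AutoModelForSeq2SeqLM.from_pretrained("THUDM/glm-10b", trust_remote_code=True).half().cuda()
    model.eval()

    inputs = tokenizer("Tsinghua University is located in [MASK].", return_tensors="pt")
    inputs = tokenizer.build_inputs_for_generation(inputs, max_gen_length=64)
    inputs = inputs.to("cuda")

    # The answer comes back wrapped between special tokens, so the raw decode
    # still needs post-processing before it can be shown in the UI.
    outputs = model.generate(**inputs, max_length=256, eos_token_id=tokenizer.eop_token_id)
    print(tokenizer.decode(outputs[0].tolist()))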

sgsdxzy avatar Mar 20 '23 04:03 sgsdxzy

Apparently this is a text2text model and not an autoregressive model. So it's more like FLAN than GPT-J or other currently supported models.

I tried mt0 and some encoder-decoder models and they won't work. So this webui will not/never support them, right?

nameless0704 avatar Mar 21 '23 05:03 nameless0704

So this webui will not/never support them, right?

Is the issue open or closed?

oobabooga avatar Mar 21 '23 12:03 oobabooga

I set up a very basic one-click free colab web demo of ChatGLM in case anyone is itching to try it: Launch In Colab <-- press here to launch the web demo

Here's the original repo: https://github.com/THUDM/ChatGLM-6B/blob/main/README_en.md

MarkSchmidty avatar Mar 23 '23 01:03 MarkSchmidty

In case anyone wonders, I also managed to make chatglm-6b work with this UI in a hacky and limited way, and stream chatting works fine (although it sometimes triggers strange answers mixing English and Chinese).

The modifications are too messy to be anywhere near a PR, but here are the tweaks:

  1. pip install icetk: fixes ImportError: This modeling file requires the following packages that were not found in your environment: icetk
  2. download-model.py needs to also download ice_text.model: it's an unusual extension
  3. during model loading, skip the normal process and load the model with the custom loading code below, fixing 3 issues:
     • Unrecognized configuration class <class 'transformers_modules.chatglm-6b.configuration_chatglm.ChatGLMConfig'> for this kind of AutoModel: AutoModelForCausalLM. Model type should be one of BartConfig, BertConfig, BertGenerationConfig, BigBirdConfig, BigBirdPegasusConfig, BioGptConfig, BlenderbotConfig, BlenderbotSmallConfig, BloomConfig, CamembertConfig, CodeGenConfig, CTRLConfig, Data2VecTextConfig, ElectraConfig, ErnieConfig, GitConfig, GPT2Config, GPTNeoConfig, GPTNeoXConfig, GPTNeoXJapaneseConfig, GPTJConfig, LlamaConfig, MarianConfig, MBartConfig, MegaConfig, MegatronBertConfig, MvpConfig, OpenAIGPTConfig, OPTConfig, PegasusConfig, PLBartConfig, ProphetNetConfig, QDQBertConfig, ReformerConfig, RemBertConfig, RobertaConfig, RobertaPreLayerNormConfig, RoCBertConfig, RoFormerConfig, Speech2Text2Config, TransfoXLConfig, TrOCRConfig, XGLMConfig, XLMConfig, XLMProphetNetConfig, XLMRobertaConfig, XLMRobertaXLConfig, XLNetConfig, XmodConfig.
     • ValueError: Loading models/chatglm-6b requires you to execute the configuration file in that repo on your local machine. Make sure you have read the code there to avoid malicious use, then set the option trust_remote_code=True to remove this error.
     • RuntimeError: mixed dtype (CPU): expect input to have scalar type of BFloat16
  4. modify generate_reply in text_generation.py with the custom streaming code below, adapting it to stream_chat()

Limitation: only tested on CPU (in Docker under Linux) so far, but I assume the GPU version is easy, just following the examples. I will try to figure out how to make auto-devices work, and how this could work with LoRA and the rest of the features.

NOTE: trust_remote_code=True is DANGEROUS, use with caution!

Custom messy code snippet 4 (model loading):

    # ChatGLM branch inside the webui's model-loading function
    # (AutoTokenizer, Path and shared are already imported in that file)
    elif shared.is_ChatGLM:
        from transformers import AutoModel

        params = {}
        # params["torch_dtype"] = torch.float16
        params["trust_remote_code"] = True  # required: ChatGLM ships its own modeling code

        tokenizer = AutoTokenizer.from_pretrained(Path(f'models/{model_name}'), **params)
        # .float() keeps everything in fp32 on CPU; see the note about half() below
        model = AutoModel.from_pretrained(Path(f"models/{shared.model_name}"), **params).float()

        model = model.eval()

        return model, tokenizer

Note: half() doesn't work on CPU, giving: RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'
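
A possible way around that (untested here, just a sketch of the idea): choose the dtype based on where the model will run, since torch's CPU LayerNorm kernel is not implemented for float16. The shared.args.cpu flag below is the webui's existing --cpu option:

    # Hypothetical tweak: use fp16 only when a GPU is actually available,
    # and fall back to fp32 on CPU to avoid the LayerNormKernelImpl error.
    import torch

    if torch.cuda.is_available() and not shared.args.cpu:
        model = model.half().cuda()
    else:
        model = model.float()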

Custom messy code snippet 5 (streaming generation via stream_chat()):

    # ChatGLM branch inside generate_reply() in text_generation.py.
    # ChatGLM's remote code exposes stream_chat(), which yields (reply, history)
    # pairs as the answer is generated, so each partial reply is passed to the UI.
    if shared.is_ChatGLM:
        history = []

        if not (shared.args.chat or shared.args.cai_chat):
            yield formatted_outputs(question, shared.model_name)

        for reply, history in shared.model.stream_chat(shared.tokenizer, question, history,
                                                        max_length=max_new_tokens, top_p=top_p,
                                                        temperature=temperature):
            if not (shared.args.chat or shared.args.cai_chat):
                reply = original_question + apply_extensions(reply, "output")

            # Flatten the running history into "User:"/"ChatGLM-6B:" lines
            # (not used further in this snippet)
            updates = []
            for query, response in history:
                updates.append("User:" + query)
                updates.append("ChatGLM-6B:" + response)

            yield formatted_outputs(reply, shared.model_name)
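
For anyone who wants to sanity-check stream_chat() outside the webui first, here is a rough standalone sketch based on the ChatGLM-6B README. chat()/stream_chat() come from the model's own remote code, so their exact signatures may change between revisions:

    # Minimal standalone test of ChatGLM-6B's streaming API (CPU variant;
    # use .half().cuda() instead of .float() on a machine with a GPU).
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
    model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).float()
    model = model.eval()

    history = []
    for reply, history in model.stream_chat(tokenizer, "What is GLM?", history):
        print(reply)  # each iteration yields the full partial reply so far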

Just posting to confirm the feasibility, so others can search for these errors and see if I've done something obviously wrong. For example, I keep getting weird results like the following:

Incorrect answer and mixing off-topic English:

[screenshot]

Answering a Chinese question with mostly English and a few Chinese words (or even partial words):

[screenshot]

Maybe this is related to using the default Character prompts or the parameters preset...

utensil avatar Mar 29 '23 06:03 utensil

I came up with an idea for supporting the GLM series through an API in #627. Any suggestions?

sgsdxzy avatar Mar 29 '23 06:03 sgsdxzy

Em... can anyone tell me where the setting "trust_remote_code = True" needs to be added, and in which file?

Pythonpa avatar Apr 04 '23 02:04 Pythonpa

any progress?

binary-husky avatar Apr 10 '23 04:04 binary-husky

https://github.com/Ph0rk0z/text-generation-webui-testing/commit/a72ef4932c7a95a3d8de0e7eb3bc778b5424aea3

Should I make a PR? Is this thing worth it?

edit: this model is freaking weird.

Ph0rk0z avatar Apr 16 '23 01:04 Ph0rk0z

Added here https://github.com/oobabooga/text-generation-webui/pull/1256

This model likes to mix Chinese with English, but it appears to be quite smart.

[chat screenshot]

A special prompt format has to be used, with "rounds", 问: ("Question:") marking the user and 答: ("Answer:") marking the bot:

[Round 0]
问:
What is your name?
答:
I am a language model, and my name is ChatGLM.
[Round 1]
问:
What does trust_remote_code=True do in the transformers library?
答:
The `TrustRemoteCode` parameter in the `transformers` library specifies whether to trust remote code signing (RSA) or not. When set to `True`, the library will use RSA-based code signing to sign the input data before loading it into memory. This means that the library assumes that the remote party has performed the necessary steps for secure communication, such as generating a private key and using it to sign the message.

When set to `False`, the library will instead rely on local code signing by itself, which may be less secure due to the possibility of an attacker冒充 [impersonating] a trusted party to perform the signature operation. In general, setting this parameter to `True` should be谨慎使用 [used with caution], especially when communicating with potentially malicious parties.
[Round 2]
问:
What is your favorite movie?
答:

Source: https://github.com/THUDM/ChatGLM-6B/tree/main/ptuning#%E5%AF%B9%E8%AF%9D%E6%95%B0%E6%8D%AE%E9%9B%86
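
A small helper that builds this format from a list of (user, bot) pairs might look like the sketch below. This is just an illustration following the example above, not code from the PR:

    # Build a ChatGLM-style prompt from previous (user, bot) rounds plus a new question.
    def build_chatglm_prompt(history, new_question):
        rounds = []
        for i, (user_msg, bot_msg) in enumerate(history):
            rounds.append(f"[Round {i}]\n问:\n{user_msg}\n答:\n{bot_msg}")
        rounds.append(f"[Round {len(history)}]\n问:\n{new_question}\n答:\n")
        return "\n".join(rounds)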

oobabooga avatar Apr 16 '23 22:04 oobabooga

NICE! Thanks for your work!

OedoSoldier avatar Apr 17 '23 15:04 OedoSoldier

We should prioritize finding out exactly what that remote code does and replacing it with something safe.

mcmonkey4eva avatar Apr 18 '23 08:04 mcmonkey4eva

According to the above, that is some kind of signature checking.

Ph0rk0z avatar Apr 18 '23 12:04 Ph0rk0z

Maybe its requests can be blocked with the same trick that was used here https://github.com/oobabooga/text-generation-webui/commit/c58c1d89bdbd303d969b8156a3effe355138fa16. I'll try that later

oobabooga avatar Apr 18 '23 14:04 oobabooga

I have done a more careful analysis and the conclusion is that the ChatGLM loader does not make any HTTP requests. All it does is read the following files inside the model folder, none of which imports the requests library:

configuration_chatglm.py
modeling_chatglm.py
tokenization_chatglm.py

The "trust remote code" flag name is misleading, as the code in question is local if you have previously downloaded the model.

Future models may abuse this feature, but ChatGLM seems to be fine.
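
For anyone who wants to repeat that check on their own copy, a quick sketch (it assumes the model has already been downloaded to models/chatglm-6b):

    # Scan the model's bundled Python files for obvious network-related imports
    # before enabling trust_remote_code.
    from pathlib import Path

    for py_file in Path("models/chatglm-6b").glob("*.py"):
        hits = [line.strip() for line in py_file.read_text().splitlines()
                if any(token in line for token in ("import requests", "urllib", "socket"))]
        print(py_file.name, "->", hits if hits else "no obvious network imports")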

oobabooga avatar Apr 18 '23 19:04 oobabooga

Em... can anyone tell me where the setting "trust_remote_code = True" needs to be added, and in which file?

    python download-model.py THUDM/chatglm-6b
    python server.py --model THUDM_chatglm-6b --chat --trust-remote-code

shibing624 avatar Apr 20 '23 12:04 shibing624