ColossalAI
[FEATURE]: ChatGLM model support
Describe the feature
ChatGLM-6B is a new Chinese ChatGPT alternative. Its weights are open-sourced, and we should support tuning the model in the Chat application. I have already implemented this in my peft branch; I will open a PR for it later, together with another feature.
Hi @yynil, thank you very much for the contribution. Looking forward to your PR. Thanks.
Thanks for the great work. For the record, here are a few problems I hit while running this code; they may all be specific to my environment.
- When loading the model, .half() is needed in two places.
- During reward model training, the loss becomes nan after the first batch; the workaround is to call model.float() before backward and model.half() again before the next forward (a standard mixed-precision alternative is sketched below).
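The nan loss described above is the classic fp16 overflow/underflow problem. Instead of manually toggling model.float()/model.half(), PyTorch's built-in mixed precision is a common alternative. This is a minimal sketch with hypothetical model/optimizer/dataloader names, assuming the weights are kept in fp32 and autocast runs the forward in fp16:

```python
import torch

scaler = torch.cuda.amp.GradScaler()
for batch in dataloader:
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():   # forward pass runs in fp16
        loss = model(**batch).loss
    scaler.scale(loss).backward()     # gradients are scaled to avoid fp16 underflow
    scaler.step(optimizer)            # unscales the gradients, then steps in fp32
    scaler.update()
```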
You can check my fork, on the latest branch Add_GLMChat. I refactored the code and added GLM's own (bs, 1, seq, seq) attention mask. During training, the critic now sets use_action to False by default, and the loss curve looks much more like GLM's (it drops very quickly).
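For readers unfamiliar with this mask shape: ChatGLM-6B uses a prefix-LM mask in which prompt tokens attend bidirectionally and generated tokens attend causally. The following is a minimal sketch loosely modeled on the model's get_masks logic, not the exact code from the fork, and it assumes every sequence contains the bos token:

```python
import torch

def build_glm_attention_mask(input_ids: torch.Tensor, bos_token_id: int) -> torch.Tensor:
    """Return a boolean (bs, 1, seq, seq) mask where True marks blocked positions."""
    bs, seq_len = input_ids.shape
    mask = torch.ones(bs, seq_len, seq_len).tril()   # causal lower-triangular base
    for i in range(bs):
        # everything before the bos token is the prompt and is fully visible
        context_length = (input_ids[i] == bos_token_id).nonzero()[0].item()
        mask[i, :, :context_length] = 1
    return (mask.unsqueeze(1) < 0.5)                 # -> (bs, 1, seq, seq), True = masked
```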
Thanks a lot, I'll go try it.
I looked at the latest branch and have a question about the dataset: to prepare it, should I still run easy_dataset first, following the readme on your main branch?
Yes. Otherwise, tokenizing with sentencepiece on the fly is unacceptably slow.
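In other words, easy_dataset tokenizes the corpus once up front. This is a hedged sketch of the general idea with hypothetical file names, not the actual easy_dataset implementation:

```python
import torch
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)

# Tokenize every line once and cache the result, instead of calling the
# sentencepiece tokenizer inside __getitem__ on every epoch.
with open("sft.txt") as f:
    lines = [line.strip() for line in f if line.strip()]
encoded = [tokenizer(line, truncation=True, max_length=512)["input_ids"] for line in lines]
torch.save(encoded, "sft_tokenized.pt")  # the Dataset then loads this file
```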
When I run easy_dataset.py following the main branch readme, I get an error: AttributeError: Can't pickle local object 'EasySFTDataset.__init__.<locals>.mp_function'. Also, what should the data format of sft.txt look like?
It is just plain text, one line at a time, used to add domain knowledge, e.g. material from textbooks.
@yynil Hi, I followed your code for training ChatGLM with ColossalAI. SFT and reward model training both run fine, but when I execute 'sh train_prompts.sh', the script fails at line 160 of train_peft_prompts.py, '(actor, actor_optim), (critic, critic_optim) = strategy.prepare((actor, actor_optim), (critic, critic_optim))', with: RuntimeError: torch.cat(): expected a non-empty list of Tensors. I don't quite understand what is happening here; I'd appreciate some pointers.
That is because your actor and critic were not set to train mode, or were not loaded at all. Print the trainable parameters; the count is probably 0.
@yynil
```python
act_num, cri_num = 0, 0
for name, para in actor.named_parameters():
    if para.requires_grad:
        print(name)
        act_num += 1
for name, para in critic.named_parameters():
    if para.requires_grad:
        print(name)
        cri_num += 1
print(act_num, cri_num)
```

act_num is 0 and cri_num is 58. Both the actor and critic models are loaded; their weights look like this:

critic.model.base_model.model.transformer.layers[1].input_layernorm.weight: tensor([0.9771, 0.9907, 0.9707, ..., 0.9849, 1.0127, 1.0049], dtype=torch.float16)
actor.model.base_model.model.transformer.layers[1].input_layernorm.weight: tensor([0.9771, 0.9907, 0.9707, ..., 0.9849, 1.0127, 1.0049], dtype=torch.float16)

The actor uses a low-rank adapter, so its base model parameters should indeed be non-trainable, while all of the critic's parameters are trainable. Does LoRA need to be disabled here?
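One way to sanity-check this (not an official fix; it assumes a standard peft LoRA setup): with a correctly loaded LoRA actor, the lora_* tensors should be the trainable ones, so act_num being 0 suggests the adapter was never attached or loaded rather than that LoRA must be disabled:

```python
# names of trainable LoRA tensors in the actor; should be non-empty
lora_trainable = [name for name, para in actor.named_parameters()
                  if para.requires_grad and "lora_" in name]
print(len(lora_trainable))
```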
Same question here.
How can the error Can't pickle local object 'EasySFTDataset.__init__.<locals>.mp_function' be solved?
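For context (this is general Python behavior, not specific to this repository): multiprocessing can only pickle module-level functions, so a function defined inside EasySFTDataset.__init__ (here mp_function) cannot be sent to worker processes. A sketch of the two usual fixes, with hypothetical names:

```python
# 1) Move the worker function to module level so it is picklable.
def mp_function(line, tokenizer, max_len=512):
    return tokenizer(line, truncation=True, max_length=max_len)["input_ids"]

# 2) Or avoid the process pool entirely, e.g. use a plain map() inside
#    EasySFTDataset.__init__ (or num_workers=0 in the DataLoader).
```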
Is the (bs, 1, seq, seq) attention mask actually required? As far as I can tell, the official chatglm code does not need an attention mask.
With and without the attention mask, chatglm's outputs are completely different.
Could you explain why?
Hi, when I run train_prompts.sh, the flatten operation in the zero strategy fails because the lists of optimizable parameters in actor_optim and critic_optim are empty, and flatten cannot take an empty list.
Setting requires_grad=True on the optimizer's parameters in the train_peft_prompts.py file does not help either.
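A hedged sketch of where this blows up (hypothetical names; the repository may use a different optimizer, e.g. ColossalAI's HybridAdam): the zero strategy flattens the optimizer's parameter groups, so the optimizer must be built over a non-empty list of trainable parameters. Guarding the construction makes the real failure visible:

```python
import torch

# fail early with a clear message instead of an opaque flatten/torch.cat error
trainable = [p for p in actor.parameters() if p.requires_grad]
assert trainable, "actor has no trainable parameters; check the LoRA/peft loading"
actor_optim = torch.optim.Adam(trainable, lr=1e-5)
```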
Has this problem (act_num being 0 for the LoRA actor) been solved? How?
@yynil This error occurs because the lora models are not loaded properly, due to a version mismatch of peft:
- The lora model is saved as *.safetensors instead of *.bin.
- PeftModel.from_pretrained() now has an is_trainable parameter.
The code for loading lora models can be revised as below:

```python
if lora_path is not None:
    model = PeftModel.from_pretrained(model, lora_path, is_trainable=True)
```
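After this fix, a quick check (under the same setup as the thread above) confirms the adapter loaded as trainable:

```python
# should print a number greater than 0 once the LoRA weights are trainable
print(sum(p.numel() for p in model.parameters() if p.requires_grad))
```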