ChatGLM-6B

[Feature] How can I train directly on whole articles, as in GLM pretraining, instead of using the prompt-based approach?

Open xiongxiaochu opened this issue 1 year ago • 11 comments

Is your feature request related to a problem? Please describe.

I can see various fine-tuning methods such as P-Tuning and LoRA. If I instead want to do what pretraining does, i.e. directly mask spans of an article and train the model on them, how should I go about it?

Solutions

Could you provide a training demo, including how to prepare the training data and how to apply the masking? Many thanks.

Additional context

No response

xiongxiaochu avatar May 24 '23 09:05 xiongxiaochu

Mark. I'd also like to know whether this is possible. Re-running the whole GLM → ChatGLM RLHF pipeline from scratch would be far too expensive.

runzhi214 avatar May 25 '23 05:05 runzhi214

Same question.

aaronysl avatar May 25 '23 06:05 aaronysl

Same question.

happy-zhangbo avatar May 25 '23 07:05 happy-zhangbo

mark

Noyce765103 avatar May 25 '23 09:05 Noyce765103

Mark.

SolarKnight1 avatar May 26 '23 01:05 SolarKnight1

mark

xxentropy avatar May 26 '23 08:05 xxentropy

mark

hongyix avatar May 26 '23 09:05 hongyix

The answer is in this code: pretrain_glm.py.
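
For a concrete picture of what "masking spans of an article" means in GLM's objective, here is a rough, self-contained sketch of how a single blank-infilling training example could be built. This is an illustration of the idea only: the token names, function name, and single-span simplification are mine, and the real logic (multiple spans, [gMASK] for long generation, 2D position ids, document packing) lives in pretrain_glm.py and the GLM data pipeline.

```python
import random

# Illustrative placeholders -- the real GLM/ChatGLM tokenizers define their own
# mask and sentinel tokens ([MASK]/[sMASK]/[gMASK], <sop>, <eop>).
MASK, SOP, EOP, IGNORE = "[MASK]", "<sop>", "<eop>", -100

def make_blank_infilling_example(tokens, span_ratio=0.15, seed=0):
    """Build one simplified GLM-style blank-infilling example: cut out a
    contiguous span, replace it with [MASK] in the context, and train the
    model to regenerate the span autoregressively after <sop>."""
    rng = random.Random(seed)
    span_len = max(1, int(len(tokens) * span_ratio))
    start = rng.randrange(0, len(tokens) - span_len + 1)
    span = tokens[start:start + span_len]

    corrupted = tokens[:start] + [MASK] + tokens[start + span_len:]
    input_seq = corrupted + [SOP] + span  # what the model sees
    # targets[i] is the token that position i should predict next; context
    # positions get IGNORE so the loss covers only the masked span.
    targets = [IGNORE] * len(corrupted) + span + [EOP]
    return input_seq, targets

if __name__ == "__main__":
    toks = "the quick brown fox jumps over the lazy dog".split()
    print(make_blank_infilling_example(toks))
```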

tomcat123a avatar May 29 '23 10:05 tomcat123a

See https://github.com/THUDM/GLM/tree/main, the Pretrain section. Run the following script to pre-train the GLM-Large model:

bash scripts/ds_pretrain_nvidia.sh config/ds_block_large.sh

The script scripts/ds_pretrain_nvidia.sh launches the training program with DeepSpeed. You should change NUM_WORKERS and NUM_GPUS_PER_WORKER to the number of workers and the number of GPUs per worker. Also change HOST_FILE_PATH to the path to an OpenMPI-style hostfile. More details about the DeepSpeed launcher can be found in the DeepSpeed documentation.

The file config/ds_block_large.sh defines the hyperparameters for pretraining. Most of the arguments are fairly self-explanatory. In particular, --train-data can be multiple keywords defined in NAMED_CORPORA in data_utils/corpora.py. The hyperparameters of the optimizer are defined in the corresponding JSON file under config; its semantics follow the DeepSpeed configuration format.
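
As a purely illustrative example of that launcher setup, assuming a hypothetical two-node cluster with 8 GPUs per node (hostnames and paths below are placeholders):

```bash
# Example OpenMPI-style hostfile (hostnames and slot counts are placeholders):
cat > /path/to/hostfile.txt <<'EOF'
node1 slots=8
node2 slots=8
EOF

# Then, inside scripts/ds_pretrain_nvidia.sh, match the cluster size:
NUM_WORKERS=2               # number of nodes listed in the hostfile
NUM_GPUS_PER_WORKER=8       # GPUs available on each node
HOST_FILE_PATH=/path/to/hostfile.txt
```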

tomcat123a avatar May 29 '23 10:05 tomcat123a

Got it. One more question: if I pretrain in this way, will it affect the results of ChatGLM's instruction fine-tuning and RLHF?

xiongxiaochu avatar May 30 '23 02:05 xiongxiaochu

mark

MikeHollyWong avatar Jun 02 '23 02:06 MikeHollyWong

So would I need a GLM-6B as the base model?

lonelydancer avatar Jun 12 '23 06:06 lonelydancer

mark

qicmsg avatar Jul 07 '23 17:07 qicmsg

https://github.com/shibing624/MedicalGPT/blob/main/pretraining.py This does pretraining for ChatGLM-6B.
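
The linked script does full Trainer-based continued pretraining; as a much smaller sketch of the core data preparation it relies on (concatenate raw articles, cut them into fixed-length blocks, and use the input ids themselves as labels for the causal-LM loss), here is an illustrative snippet. It is not the MedicalGPT script: the model name, block size, and sample articles are placeholders, and build_blocks is a hypothetical helper.

```python
from transformers import AutoTokenizer

MODEL_NAME = "THUDM/chatglm-6b"   # placeholder; any HF tokenizer works here
BLOCK_SIZE = 512                  # placeholder context length

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)

def build_blocks(texts, block_size=BLOCK_SIZE):
    """Concatenate tokenized articles and split them into equal-sized blocks."""
    ids = []
    for text in texts:
        # Document boundaries / special tokens are omitted for brevity.
        ids.extend(tokenizer.encode(text, add_special_tokens=False))
    blocks = [ids[i:i + block_size] for i in range(0, len(ids), block_size)]
    blocks = [b for b in blocks if len(b) == block_size]  # drop the ragged tail
    # Labels are a copy of the input ids: standard causal-LM continued pretraining.
    return [{"input_ids": b, "labels": list(b)} for b in blocks]

if __name__ == "__main__":
    articles = ["全文 of article one ...", "全文 of article two ..."]
    print(len(build_blocks(articles)), "training blocks")
```

Dictionaries shaped like this can be fed to a standard transformers Trainer, which is roughly what the linked script automates at scale.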

tomcat123a avatar Jul 20 '23 02:07 tomcat123a

Great, we're already training with it as well, thanks!

xiongxiaochu avatar Jul 20 '23 09:07 xiongxiaochu

mark

DanteLuo avatar Aug 15 '23 06:08 DanteLuo