ChatGLM-6B

[Feature] How can I train directly on whole articles, as in GLM pretraining, instead of using the prompt-based approach?

Open xiongxiaochu opened this issue 1 year ago • 11 comments

Is your feature request related to a problem? Please describe.

I can see various fine-tuning methods such as P-Tuning and LoRA. If I instead want to do what pretraining does, i.e. directly mask spans of an article and train the model on them, how should I go about it?

Solutions

Could you provide a training demo, including how to prepare the training data and how to apply the masking? Many thanks.

Additional context

No response

xiongxiaochu avatar May 24 '23 09:05 xiongxiaochu

Mark. I'd also like to know whether this is possible. Re-running the whole GLM → ChatGLM RLHF pipeline from scratch would be far too expensive.

runzhi214 avatar May 25 '23 05:05 runzhi214

Same question.

aaronysl avatar May 25 '23 06:05 aaronysl

Same question.

happy-zhangbo avatar May 25 '23 07:05 happy-zhangbo

mark

Noyce765103 avatar May 25 '23 09:05 Noyce765103

Mark.

SolarKnight1 avatar May 26 '23 01:05 SolarKnight1

mark

xxentropy avatar May 26 '23 08:05 xxentropy

mark

hongyix avatar May 26 '23 09:05 hongyix

The answer is in this code: pretrain_glm.py.
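
For a concrete picture of what "masking spans of an article" means in GLM's objective, here is a rough, self-contained sketch of how a single blank-infilling training example could be built. This is an illustration of the idea only: the token names, function name, and single-span simplification are mine, and the real logic (multiple spans, [gMASK] for long generation, 2D position ids, document packing) lives in pretrain_glm.py and the GLM data pipeline.

```python
import random

# Illustrative placeholders -- the real GLM/ChatGLM tokenizers define their own
# mask and sentinel tokens ([MASK]/[sMASK]/[gMASK], <sop>, <eop>).
MASK, SOP, EOP, IGNORE = "[MASK]", "<sop>", "<eop>", -100

def make_blank_infilling_example(tokens, span_ratio=0.15, seed=0):
    """Build one simplified GLM-style blank-infilling example: cut out a
    contiguous span, replace it with [MASK] in the context, and train the
    model to regenerate the span autoregressively after <sop>."""
    rng = random.Random(seed)
    span_len = max(1, int(len(tokens) * span_ratio))
    start = rng.randrange(0, len(tokens) - span_len + 1)
    span = tokens[start:start + span_len]

    corrupted = tokens[:start] + [MASK] + tokens[start + span_len:]
    input_seq = corrupted + [SOP] + span  # what the model sees
    # targets[i] is the token that position i should predict next; context
    # positions get IGNORE so the loss covers only the masked span.
    targets = [IGNORE] * len(corrupted) + span + [EOP]
    return input_seq, targets

if __name__ == "__main__":
    toks = "the quick brown fox jumps over the lazy dog".split()
    print(make_blank_infilling_example(toks))
```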

tomcat123a avatar May 29 '23 10:05 tomcat123a

See https://github.com/THUDM/GLM/tree/main, the Pretrain section. Run the following script to pre-train the GLM-Large model:

bash scripts/ds_pretrain_nvidia.sh config/ds_block_large.sh

The script scripts/ds_pretrain_nvidia.sh launches the training program with DeepSpeed. You should change NUM_WORKERS and NUM_GPUS_PER_WORKER to the number of workers and the number of GPUs per worker. Also change HOST_FILE_PATH to the path to an OpenMPI-style hostfile. More details about the DeepSpeed launcher can be found in the DeepSpeed documentation.

The file config/ds_block_large.sh defines the hyperparameters for pretraining. Most of the arguments are fairly self-explanatory. In particular, --train-data can be multiple keywords defined in NAMED_CORPORA in data_utils/corpora.py. The hyperparameters of the optimizer are defined in the corresponding JSON file under config; its semantics follow the DeepSpeed configuration format.
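
As a purely illustrative example of that launcher setup, assuming a hypothetical two-node cluster with 8 GPUs per node (hostnames and paths below are placeholders):

```bash
# Example OpenMPI-style hostfile (hostnames and slot counts are placeholders):
cat > /path/to/hostfile.txt <<'EOF'
node1 slots=8
node2 slots=8
EOF

# Then, inside scripts/ds_pretrain_nvidia.sh, match the cluster size:
NUM_WORKERS=2               # number of nodes listed in the hostfile
NUM_GPUS_PER_WORKER=8       # GPUs available on each node
HOST_FILE_PATH=/path/to/hostfile.txt
```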

tomcat123a avatar May 29 '23 10:05 tomcat123a

Got it. One more question: if I pretrain in this way, will it affect the results of ChatGLM's instruction fine-tuning and RLHF?

xiongxiaochu avatar May 30 '23 02:05 xiongxiaochu

mark

MikeHollyWong avatar Jun 02 '23 02:06 MikeHollyWong

So would I need a GLM-6B as the base model?

lonelydancer avatar Jun 12 '23 06:06 lonelydancer

mark

qicmsg avatar Jul 07 '23 17:07 qicmsg

https://github.com/shibing624/MedicalGPT/blob/main/pretraining.py This does pretraining for ChatGLM-6B.
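
The linked script does full Trainer-based continued pretraining; as a much smaller sketch of the core data preparation it relies on (concatenate raw articles, cut them into fixed-length blocks, and use the input ids themselves as labels for the causal-LM loss), here is an illustrative snippet. It is not the MedicalGPT script: the model name, block size, and sample articles are placeholders, and build_blocks is a hypothetical helper.

```python
from transformers import AutoTokenizer

MODEL_NAME = "THUDM/chatglm-6b"   # placeholder; any HF tokenizer works here
BLOCK_SIZE = 512                  # placeholder context length

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)

def build_blocks(texts, block_size=BLOCK_SIZE):
    """Concatenate tokenized articles and split them into equal-sized blocks."""
    ids = []
    for text in texts:
        # Document boundaries / special tokens are omitted for brevity.
        ids.extend(tokenizer.encode(text, add_special_tokens=False))
    blocks = [ids[i:i + block_size] for i in range(0, len(ids), block_size)]
    blocks = [b for b in blocks if len(b) == block_size]  # drop the ragged tail
    # Labels are a copy of the input ids: standard causal-LM continued pretraining.
    return [{"input_ids": b, "labels": list(b)} for b in blocks]

if __name__ == "__main__":
    articles = ["全文 of article one ...", "全文 of article two ..."]
    print(len(build_blocks(articles)), "training blocks")
```

Dictionaries shaped like this can be fed to a standard transformers Trainer, which is roughly what the linked script automates at scale.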

tomcat123a avatar Jul 20 '23 02:07 tomcat123a

Great, we're already training with it as well, thanks!

xiongxiaochu avatar Jul 20 '23 09:07 xiongxiaochu

mark

DanteLuo avatar Aug 15 '23 06:08 DanteLuo