ChatGLM-6B
[Feature] How do I train directly on a whole article, the way GLM does in pretraining, instead of via prompts?
Is your feature request related to a problem? Please describe.
At the moment I can find various fine-tuning approaches such as P-Tuning and LoRA. If I instead want to train the model the way pretraining does, by directly masking spans of an article, how should I go about it?
Solutions
Could you provide a training demo, including how to prepare the training data, how to apply the masking, and so on? Many thanks.
Additional context
No response
Mark. I'd also like to know whether this is feasible. Going through the whole GLM-to-ChatGLM RLHF pipeline from scratch would be far too expensive.
Same question.
Same question.
mark
cy
mark
mark
The answer is in this code: pretrain_glm.py
See the Pretrain section of https://github.com/THUDM/GLM/tree/main:

Run the following script to pre-train the GLM-Large model:

bash scripts/ds_pretrain_nvidia.sh config/ds_block_large.sh

The script scripts/ds_pretrain_nvidia.sh launches the training program with DeepSpeed. You should change NUM_WORKERS and NUM_GPUS_PER_WORKER to the number of workers and the number of GPUs per worker, and change HOST_FILE_PATH to the path of an OpenMPI-style hostfile. More details about the DeepSpeed launcher can be found here.

The file config/ds_block_large.sh defines the hyperparameters for pretraining. Most of the arguments are fairly self-explanatory. In particular, --train-data can be multiple keywords defined in NAMED_CORPORA in data_utils/corpora.py. The optimizer hyperparameters are defined in the corresponding JSON file under config; the semantics of that JSON file can be found here.
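For the "how to mask spans of an article" part of the question, here is a minimal, self-contained Python sketch of GLM's blank-infilling objective. It is an illustration only, not the actual logic in pretrain_glm.py: the special-token ids are placeholders, the span sampling is simplified, and the 2-D position ids and attention mask that GLM really uses are omitted.

```python
import numpy as np

# Placeholder special-token ids for illustration only; in practice take them
# from the tokenizer (mask / "sop" / "eop" tokens), not hard-coded constants.
MASK_ID, SOP_ID, EOP_ID = 130000, 130004, 130005
IGNORE_INDEX = -100  # positions excluded from the loss


def glm_span_corruption(tokens, mask_ratio=0.15, poisson_lambda=3.0, seed=0):
    """Illustrative GLM-style blank infilling for one tokenized document.

    Part A is the document with every sampled span collapsed into one [MASK];
    Part B is the spans themselves, generated autoregressively after a start
    token.  Labels here are already shifted by one position, and only Part B
    contributes to the loss.
    """
    rng = np.random.default_rng(seed)
    n = len(tokens)
    budget = max(1, int(n * mask_ratio))

    # Sample non-overlapping spans until roughly mask_ratio of the tokens
    # are covered (span lengths drawn from a Poisson distribution).
    spans, covered = [], set()
    for _ in range(10 * n):            # hard cap so the sketch always terminates
        if budget <= 0:
            break
        length = max(1, int(rng.poisson(poisson_lambda)))
        start = int(rng.integers(0, n))
        span = range(start, min(start + length, n))
        if any(i in covered for i in span):
            continue
        covered.update(span)
        spans.append((span.start, span.stop))
        budget -= len(span)
    spans.sort()

    # Part A: replace each sampled span with a single [MASK] token.
    part_a, cursor = [], 0
    for start, stop in spans:
        part_a += tokens[cursor:start] + [MASK_ID]
        cursor = stop
    part_a += tokens[cursor:]

    # Part B: "[sop] span" as input, "span [eop]" as the (pre-shifted) labels.
    part_b_in, part_b_labels = [], []
    for start, stop in spans:
        part_b_in += [SOP_ID] + tokens[start:stop]
        part_b_labels += tokens[start:stop] + [EOP_ID]

    input_ids = part_a + part_b_in
    labels = [IGNORE_INDEX] * len(part_a) + part_b_labels
    return input_ids, labels


if __name__ == "__main__":
    doc = list(range(1, 41))           # stand-in for tokenizer(text)["input_ids"]
    ids, labels = glm_span_corruption(doc)
    print(len(ids), len(labels))       # the two sequences have equal length
```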
Got it. One more question: if I pretrain this way, will it affect the results of ChatGLM's instruction fine-tuning and reinforcement learning?
mark
So do I need a GLM-6B as the base model?
mark
https://github.com/shibing624/MedicalGPT/blob/main/pretraining.py This is pretraining for ChatGLM-6B.
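For reference, continued pretraining of that kind boils down to the standard causal-language-modeling recipe sketched below. This is a rough sketch, not MedicalGPT's actual pretraining.py: the model name, corpus path, block size, and hyperparameters are placeholders, and it is shown with a generic Hugging Face causal LM, whereas the real script adds ChatGLM-specific loading (trust_remote_code=True) and optional LoRA/DeepSpeed to fit 6B parameters in memory.

```python
# Rough sketch of continued pretraining on raw text with the HF Trainer.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

MODEL = "bigscience/bloom-560m"    # placeholder; swap in your base model
BLOCK = 512                        # training sequence length (assumption)

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)

# One document per line in a plain-text file (hypothetical path).
raw = load_dataset("text", data_files={"train": "corpus.txt"})["train"]


def tokenize(batch):
    return tokenizer(batch["text"])


def group(batch):
    # Concatenate all documents and cut into fixed-length blocks so nothing
    # is lost to padding; the collator below copies input_ids into labels.
    ids = sum(batch["input_ids"], [])
    total = (len(ids) // BLOCK) * BLOCK
    return {"input_ids": [ids[i:i + BLOCK] for i in range(0, total, BLOCK)]}


train = raw.map(tokenize, batched=True, remove_columns=raw.column_names)
train = train.map(group, batched=True, remove_columns=train.column_names)

args = TrainingArguments(
    output_dir="out-pretrain",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    num_train_epochs=1,
    learning_rate=2e-5,
    logging_steps=10,
    save_steps=500,
)

Trainer(
    model=model,
    args=args,
    train_dataset=train,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
).train()
```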
Great, we're training with this as well. Thanks!
mark