tomcat123a

bytedance.com Beijing,China Yilun Zhang <[email protected]>

Results: 26 comments of tomcat123a

How to do pre-training

Pretrain: run the following script to pre-train the GLM-Large model: `bash scripts/ds_pretrain_nvidia.sh config/ds_block_large.sh`. The script [scripts/ds_pretrain_nvidia.sh](https://github.com/THUDM/GLM/blob/main/scripts/ds_pretrain_nvidia.sh) launches the training program with DeepSpeed. You should change `NUM_WORKERS` and `NUM_GPUS_PER_WORKER` to the...

How to do pre-training

https://github.com/shibing624/MedicalGPT/blob/main/pretraining.py This is the pre-training script for chatglm6b.

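How to do pre-training

https://github.com/shibing624/MedicalGPT See this project: pre-training, instruction fine-tuning, RM (reward model) training, and PPO are all available ready-made.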

FastAPI streaming interface

Yes, there is one: use WebSocket.
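
A minimal sketch of what a WebSocket streaming endpoint in FastAPI could look like. Everything named here is an assumption for illustration: `fake_token_stream` is a hypothetical stand-in for the model's streaming generation call, and the route `/ws/generate` and the `[DONE]` end marker are invented; none of them come from the ChatGLM repository.

```python
import asyncio
from typing import AsyncIterator


async def fake_token_stream(prompt: str) -> AsyncIterator[str]:
    """Hypothetical stand-in for a model's streaming generation call."""
    for word in f"echo: {prompt}".split():
        await asyncio.sleep(0)  # hand control back to the event loop between pieces
        yield word


def create_app():
    # Imported here so the generator above can be used without FastAPI installed.
    from fastapi import FastAPI, WebSocket, WebSocketDisconnect

    app = FastAPI()

    @app.websocket("/ws/generate")  # hypothetical route name
    async def generate(websocket: WebSocket):
        await websocket.accept()
        try:
            while True:
                # One prompt per incoming message; reply piece by piece.
                prompt = await websocket.receive_text()
                async for piece in fake_token_stream(prompt):
                    await websocket.send_text(piece)
                await websocket.send_text("[DONE]")  # assumed end-of-reply marker
        except WebSocketDisconnect:
            pass  # client closed the connection

    return app
```

To try it, one option is to expose `app = create_app()` in a module and serve it with an ASGI server such as uvicorn; the client then sends a prompt as a text message and reads messages until it sees the end marker.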

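Unable to generate an article with a specified word count?

When generating, set the `min_new_tokens` and `max_new_tokens` parameters; the output length is then controlled by the number of tokens, not the number of words or characters.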
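
Because these limits count tokens rather than characters, a target length has to be converted first. The helper below is a rough sketch of that conversion; `length_kwargs`, the default `chars_per_token` ratio, and the `tolerance` band are assumptions made for illustration, and the ratio should be measured with the model's own tokenizer.

```python
import math


def length_kwargs(target_chars: int, chars_per_token: float = 1.5,
                  tolerance: float = 0.1) -> dict:
    """Convert a target length in characters into token limits.

    chars_per_token is an assumed average, not a property of any model.
    """
    if target_chars <= 0 or chars_per_token <= 0 or not 0 <= tolerance < 1:
        raise ValueError("target_chars and chars_per_token must be positive, "
                         "and tolerance must be in [0, 1)")
    target_tokens = target_chars / chars_per_token
    # Lower bound: at least one token, rounded down.
    min_new = max(1, int(target_tokens * (1 - tolerance)))
    # Upper bound: rounded up, and never below the lower bound.
    max_new = max(min_new, math.ceil(target_tokens * (1 + tolerance)))
    return {"min_new_tokens": min_new, "max_new_tokens": max_new}
```

The returned dictionary can be unpacked into the generation call alongside the other arguments. The result is still only approximate: the model may stop anywhere between the two bounds.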

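[Help] <Is there publicly available, complete training code for chatglm-6b?>

https://github.com/shibing624/MedicalGPT See this project: pre-training, instruction fine-tuning, RM (reward model) training, and PPO are all available ready-made.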

[Feature] Asking about the steps for training on a domain corpus

The pre-training code is available. See [pretrain_glm.py](https://github.com/THUDM/GLM/blob/main/pretrain_glm.py) in https://github.com/THUDM/GLM.

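[Feature] Asking about the steps for training on a domain corpus

https://github.com/shibing624/MedicalGPT See this project: pre-training, instruction fine-tuning, RM (reward model) training, and PPO are all available ready-made.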

ValueError: weight is on the meta device, we need a `value` to put in on 0.

When loading across multiple GPUs, use the multi-GPU loading code from the chatglm6b GitHub repository.

When I try multi-GPU training with DeepSpeed, I get the error: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0!

Don't use DeepSpeed. Use Hugging Face's accelerate instead, or pass `device_map="auto"` when loading the model.
