jamestch
I plan to do incremental pretraining on roughly 20 GB of domain data (about 9B tokens). Are there any recommended settings for hyperparameters such as learning_rate, max_seq_length, total_steps, save_checkpoint_steps, and so on? Following [训练中文LLaMA大规模语言模型](https://zhuanlan.zhihu.com/p/612752963), my launch command is:

```
deepspeed pretrain.py --deepspeed --deepspeed_config models/deepspeed_config.json \
    --pretrained_model_path models/llama-7b.bin \
    --dataset_path dataset.pt \
    --spm_model_path $LLaMA_7B_FOLDER/tokenizer.model \
    --config_path models/llama/7b_config.json \
    --output_model_path models/output_model.bin \
    --world_size...
```
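For what it's worth, `total_steps` for one pass over the corpus follows directly from the token count, sequence length, and global batch size, and for continued pretraining a peak learning rate well below the from-scratch value (on the order of 1e-4 or lower) is a common rule of thumb. A minimal back-of-the-envelope sketch; every concrete number here is an illustrative assumption, not a tested recommendation:

```python
# Rough schedule for one epoch over ~9B tokens.
# All concrete numbers below are illustrative assumptions, not tested values.

total_tokens = 9_000_000_000   # ~9B tokens of domain data
max_seq_length = 2048          # LLaMA's pretraining context length
micro_batch = 8                # per-GPU batch size (assumption)
grad_accum = 16                # gradient accumulation steps (assumption)
num_gpus = 2                   # e.g. 2x A100 80G

tokens_per_step = max_seq_length * micro_batch * grad_accum * num_gpus
total_steps = total_tokens // tokens_per_step        # ~17k steps with these numbers
save_checkpoint_steps = max(total_steps // 20, 1)    # ~20 checkpoints per epoch

print(total_steps, save_checkpoint_steps)
```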
I cloned the latest TencentPretrain code and ran a DeepSpeed ZeRO-3 pretraining test on 2x A100 80G GPUs with the following script (reference: [TencentPretrain 使用 DeepSpeed ZeRO-3 流水线并行训练](https://zhuanlan.zhihu.com/p/621102715)):

```
CUDA_VISIBLE_DEVICES=6,7 deepspeed pretrain.py --deepspeed --deepspeed_config models/deepspeed_config.json \
    --pretrained_model_path models/llama-13b.bin \
    --dataset_path dataset.pt \
    --spm_model_path /_path_to_llama_/tokenizer.model \
    --config_path models/llama/13b_config.json \
    --output_model_path models/output_model.llama_13.bin...
```
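For reference, a minimal ZeRO-3 `models/deepspeed_config.json` could look like the following, written here as a Python dict for convenience. Every value is an assumption to adapt to your hardware, not the exact configuration used in the run above:

```python
import json

# Minimal DeepSpeed ZeRO-3 config sketch; all values are assumptions to adapt.
ds_config = {
    "train_micro_batch_size_per_gpu": 8,  # must match the trainer's batch size
    "gradient_accumulation_steps": 16,
    "fp16": {"enabled": True},            # bf16 is also an option on A100
    "zero_optimization": {
        "stage": 3,                       # partition params, grads, and optimizer states
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
}

with open("models/deepspeed_config.json", "w") as f:
    json.dump(ds_config, f, indent=2)
```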
How can I convert a TencentPretrain LLaMA model to the Hugging Face format? Could you share the script? Thanks a lot!
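Not an official answer, but the conversion is essentially a key-renaming pass over the state dict. Below is a minimal sketch: the Hugging Face key names are the standard LLaMA ones, while the TencentPretrain key names (`embedding.word.embedding.weight`, `encoder.transformer.{i}.self_attn.linear_layers.*`, etc.) are assumptions from the UER-py/TencentPretrain naming convention. Print your checkpoint's own keys first and adjust the mapping, and if the repo already ships a converter under `scripts/`, prefer that.

```python
import torch

# Sketch of a TencentPretrain LLaMA -> Hugging Face state-dict conversion.
# CAUTION: all TencentPretrain key names below are assumptions; inspect
# torch.load(tp_path).keys() and fix the mapping before trusting the output.

def convert_tp_llama_to_hf(tp_path, hf_path, num_layers=32):
    tp = torch.load(tp_path, map_location="cpu")
    hf = {}

    hf["model.embed_tokens.weight"] = tp["embedding.word.embedding.weight"]  # assumed key
    hf["model.norm.weight"] = tp["encoder.layer_norm.weight"]                # assumed key
    hf["lm_head.weight"] = tp["target.lm.output_layer.weight"]               # assumed key

    for i in range(num_layers):
        src, dst = f"encoder.transformer.{i}", f"model.layers.{i}"
        # Attention: q/k/v assumed to be packed as linear_layers.0/1/2.
        hf[f"{dst}.self_attn.q_proj.weight"] = tp[f"{src}.self_attn.linear_layers.0.weight"]
        hf[f"{dst}.self_attn.k_proj.weight"] = tp[f"{src}.self_attn.linear_layers.1.weight"]
        hf[f"{dst}.self_attn.v_proj.weight"] = tp[f"{src}.self_attn.linear_layers.2.weight"]
        hf[f"{dst}.self_attn.o_proj.weight"] = tp[f"{src}.self_attn.final_linear.weight"]
        # Gated MLP: the TencentPretrain-side names are assumptions.
        hf[f"{dst}.mlp.gate_proj.weight"] = tp[f"{src}.feed_forward.linear_gate.weight"]
        hf[f"{dst}.mlp.up_proj.weight"] = tp[f"{src}.feed_forward.linear_1.weight"]
        hf[f"{dst}.mlp.down_proj.weight"] = tp[f"{src}.feed_forward.linear_2.weight"]
        hf[f"{dst}.input_layernorm.weight"] = tp[f"{src}.layer_norm_1.weight"]
        hf[f"{dst}.post_attention_layernorm.weight"] = tp[f"{src}.layer_norm_2.weight"]

    # NOTE: Hugging Face's LLaMA rotary implementation may additionally need
    # the well-known q/k permutation (see transformers' convert_llama_weights_to_hf.py).
    torch.save(hf, hf_path)

convert_tp_llama_to_hf("models/output_model.bin", "pytorch_model.bin", num_layers=32)
```

You would still need a matching `config.json` and tokenizer files next to `pytorch_model.bin` before `from_pretrained` can load the result.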