jamestch
I plan to do incremental pretraining on roughly 20 GB of domain data (about 9B tokens). Are there any recommended settings for hyperparameters such as learning_rate, max_seq_length, total_steps, save_checkpoint_steps, and so on? Following [训练中文LLaMA大规模语言模型](https://zhuanlan.zhihu.com/p/612752963), my launch command is:

```
deepspeed pretrain.py --deepspeed --deepspeed_config models/deepspeed_config.json \
    --pretrained_model_path models/llama-7b.bin \
    --dataset_path dataset.pt \
    --spm_model_path $LLaMA_7B_FOLDER/tokenizer.model \
    --config_path models/llama/7b_config.json \
    --output_model_path models/output_model.bin \
    --world_size...
```
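For what it's worth, `total_steps` for one pass over the corpus follows directly from the token count, sequence length, and global batch size, and for continued pretraining a peak learning rate well below the from-scratch value (on the order of 1e-4 or lower) is a common rule of thumb. A minimal back-of-the-envelope sketch; every concrete number here is an illustrative assumption, not a tested recommendation:

```python
# Rough schedule for one epoch over ~9B tokens.
# All concrete numbers below are illustrative assumptions, not tested values.

total_tokens = 9_000_000_000   # ~9B tokens of domain data
max_seq_length = 2048          # LLaMA's pretraining context length
micro_batch = 8                # per-GPU batch size (assumption)
grad_accum = 16                # gradient accumulation steps (assumption)
num_gpus = 2                   # e.g. 2x A100 80G

tokens_per_step = max_seq_length * micro_batch * grad_accum * num_gpus
total_steps = total_tokens // tokens_per_step        # ~17k steps with these numbers
save_checkpoint_steps = max(total_steps // 20, 1)    # ~20 checkpoints per epoch

print(total_steps, save_checkpoint_steps)
```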
I cloned the latest TencentPretrain code and ran a DeepSpeed ZeRO-3 pretraining test on 2x A100 80G GPUs with the following script (reference: [TencentPretrain 使用 DeepSpeed ZeRO-3 流水线并行训练](https://zhuanlan.zhihu.com/p/621102715)):

```
CUDA_VISIBLE_DEVICES=6,7 deepspeed pretrain.py --deepspeed --deepspeed_config models/deepspeed_config.json \
    --pretrained_model_path models/llama-13b.bin \
    --dataset_path dataset.pt \
    --spm_model_path /_path_to_llama_/tokenizer.model \
    --config_path models/llama/13b_config.json \
    --output_model_path models/output_model.llama_13.bin...
```
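For reference, a minimal ZeRO-3 `models/deepspeed_config.json` could look like the following, written here as a Python dict for convenience. Every value is an assumption to adapt to your hardware, not the exact configuration used in the run above:

```python
import json

# Minimal DeepSpeed ZeRO-3 config sketch; all values are assumptions to adapt.
ds_config = {
    "train_micro_batch_size_per_gpu": 8,  # must match the trainer's batch size
    "gradient_accumulation_steps": 16,
    "fp16": {"enabled": True},            # bf16 is also an option on A100
    "zero_optimization": {
        "stage": 3,                       # partition params, grads, and optimizer states
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
}

with open("models/deepspeed_config.json", "w") as f:
    json.dump(ds_config, f, indent=2)
```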
How can I convert a TencentPretrain LLaMA model to the Hugging Face format? Could you share the script? Thanks a lot!
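Not an official answer, but the conversion is essentially a key-renaming pass over the state dict. Below is a minimal sketch: the Hugging Face key names are the standard LLaMA ones, while the TencentPretrain key names (`embedding.word.embedding.weight`, `encoder.transformer.{i}.self_attn.linear_layers.*`, etc.) are assumptions from the UER-py/TencentPretrain naming convention. Print your checkpoint's own keys first and adjust the mapping, and if the repo already ships a converter under `scripts/`, prefer that.

```python
import torch

# Sketch of a TencentPretrain LLaMA -> Hugging Face state-dict conversion.
# CAUTION: all TencentPretrain key names below are assumptions; inspect
# torch.load(tp_path).keys() and fix the mapping before trusting the output.

def convert_tp_llama_to_hf(tp_path, hf_path, num_layers=32):
    tp = torch.load(tp_path, map_location="cpu")
    hf = {}

    hf["model.embed_tokens.weight"] = tp["embedding.word.embedding.weight"]  # assumed key
    hf["model.norm.weight"] = tp["encoder.layer_norm.weight"]                # assumed key
    hf["lm_head.weight"] = tp["target.lm.output_layer.weight"]               # assumed key

    for i in range(num_layers):
        src, dst = f"encoder.transformer.{i}", f"model.layers.{i}"
        # Attention: q/k/v assumed to be packed as linear_layers.0/1/2.
        hf[f"{dst}.self_attn.q_proj.weight"] = tp[f"{src}.self_attn.linear_layers.0.weight"]
        hf[f"{dst}.self_attn.k_proj.weight"] = tp[f"{src}.self_attn.linear_layers.1.weight"]
        hf[f"{dst}.self_attn.v_proj.weight"] = tp[f"{src}.self_attn.linear_layers.2.weight"]
        hf[f"{dst}.self_attn.o_proj.weight"] = tp[f"{src}.self_attn.final_linear.weight"]
        # Gated MLP: the TencentPretrain-side names are assumptions.
        hf[f"{dst}.mlp.gate_proj.weight"] = tp[f"{src}.feed_forward.linear_gate.weight"]
        hf[f"{dst}.mlp.up_proj.weight"] = tp[f"{src}.feed_forward.linear_1.weight"]
        hf[f"{dst}.mlp.down_proj.weight"] = tp[f"{src}.feed_forward.linear_2.weight"]
        hf[f"{dst}.input_layernorm.weight"] = tp[f"{src}.layer_norm_1.weight"]
        hf[f"{dst}.post_attention_layernorm.weight"] = tp[f"{src}.layer_norm_2.weight"]

    # NOTE: Hugging Face's LLaMA rotary implementation may additionally need
    # the well-known q/k permutation (see transformers' convert_llama_weights_to_hf.py).
    torch.save(hf, hf_path)

convert_tp_llama_to_hf("models/output_model.bin", "pytorch_model.bin", num_layers=32)
```

You would still need a matching `config.json` and tokenizer files next to `pytorch_model.bin` before `from_pretrained` can load the result.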