Linly
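DeepSpeed ZeRO-3 pretraining
I cloned the latest TencentPretrain code with git and tested DeepSpeed ZeRO-3 pretraining on 2 x A100 80G GPUs. The launch script is as follows (reference: "Pipeline-parallel training in TencentPretrain with DeepSpeed ZeRO-3"):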
CUDA_VISIBLE_DEVICES=6,7 deepspeed pretrain.py --deepspeed --deepspeed_config models/deepspeed_config.json \
    --pretrained_model_path models/llama-13b.bin \
    --dataset_path dataset.pt --spm_model_path /path_to_llama/tokenizer.model \
    --config_path models/llama/13b_config.json \
    --output_model_path models/output_model.llama_13.bin \
    --world_size 2 --data_processor lm --batch_size 2 --enable_zero3
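Training runs normally without ZeRO-3. With ZeRO-3 enabled, it fails with the following error:
I changed the DeepSpeed config file to deepspeed_zero3_config.json, and that seems to have fixed the problem: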
CUDA_VISIBLE_DEVICES=6,7 deepspeed pretrain.py --deepspeed --deepspeed_config models/deepspeed_zero3_config.json \
    --pretrained_model_path models/llama-13b.bin \
    --dataset_path dataset.pt --spm_model_path /path_to_llama/tokenizer.model \
    --config_path models/llama/13b_config.json \
    --output_model_path models/output_model.llama_13.bin \
    --world_size 2 --data_processor lm --batch_size 2 --enable_zero3
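For anyone wondering what a ZeRO-3 config generally looks like, here is a minimal sketch built as a Python dict and dumped to JSON. It is not the contents of models/deepspeed_zero3_config.json, which this thread does not show. The key names are standard DeepSpeed config keys, but every value here (the micro batch size, fp16, and especially the CPU offload settings) is an assumption chosen for illustration.

```python
import json

# Illustrative ZeRO-3 config. All values are assumptions, not the repo's file.
zero3_config = {
    "train_micro_batch_size_per_gpu": 2,   # assumed to mirror --batch_size 2
    "gradient_accumulation_steps": 1,      # assumed
    "fp16": {"enabled": True},             # assumed mixed-precision training
    "zero_optimization": {
        "stage": 3,                        # partition params, grads and optimizer states
        # Offloading to CPU is optional; it trades GPU memory for host RAM.
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
        "offload_param": {"device": "cpu", "pin_memory": True},
    },
}

config_json = json.dumps(zero3_config, indent=2)
print(config_json)
```

If the real config does offload to CPU, most of the memory pressure moves from the GPUs to host RAM, which matters for the memory question below.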
Quick question: how much host RAM does this usually need? I have a bit over 100 GB here and it keeps running out of memory. RuntimeError: [enforce fail at CPUAllocator.cpp:71] . DefaultCPUAllocator: can't allocate memory: you tried to allocate 4052879360 bytes. Error code 12 (Cannot allocate memory)
On my side it used 150 GB of RAM. I'm not sure why the CPU usage is so high as well.
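As a rough sanity check on these numbers, here is a back-of-the-envelope sketch using the usual accounting for mixed-precision Adam: 2 bytes per parameter for fp16 weights, 2 for fp16 gradients, and 12 for the fp32 master weights plus the two Adam moments. It assumes a nominal 13 billion parameters and that the optimizer is Adam. Whether these states end up in host RAM depends on whether the config offloads to CPU, which the thread does not confirm. Activations, the dataset, and the checkpoint copy each process loads are not counted.

```python
# Rough model-state memory for mixed-precision Adam training.
# NUM_PARAMS is a nominal figure for "13b" and is an assumption.
NUM_PARAMS = 13_000_000_000

BYTES_FP16_PARAM = 2   # fp16 weights
BYTES_FP16_GRAD = 2    # fp16 gradients
BYTES_OPTIMIZER = 12   # fp32 master weights + Adam momentum + Adam variance


def model_state_gb(num_params):
    """Return model-state sizes in decimal GB (1 GB = 1e9 bytes)."""
    sizes = {
        "fp16_params": num_params * BYTES_FP16_PARAM / 1e9,
        "fp16_grads": num_params * BYTES_FP16_GRAD / 1e9,
        "optimizer_states": num_params * BYTES_OPTIMIZER / 1e9,
    }
    sizes["total"] = sum(sizes.values())
    return sizes


estimate = model_state_gb(NUM_PARAMS)
for name, gb in estimate.items():
    print(f"{name}: {gb:.0f} GB")

# Size of the single allocation that failed in the error message above.
failed_alloc_gib = 4052879360 / 2**30
print(f"failed allocation: {failed_alloc_gib:.2f} GiB")
```

Under these assumptions the optimizer states alone come to about 156 GB and the full model states to about 208 GB, so a machine with a bit over 100 GB of RAM could plausibly run out if the optimizer is offloaded to CPU, and the 150 GB observed above is in the same range. With CPU offload the optimizer step also runs on the CPU, which would be one possible reason for the high CPU usage.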