llama-recipes
Llama 2 distributed training on AWS SageMaker
Hi, I am planning to fine-tune Llama as a managed SageMaker training job distributed across multiple devices/nodes. SageMaker provides its own data-parallel and model-parallel distributed training libraries. Since SageMaker already takes care of distributed training, do I need to keep the current FSDP implementation in the llama fine-tuning script, or should I remove it? A sketch of one possible setup follows below.
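For reference, here is a minimal sketch of one of the options being asked about: keeping the recipes' FSDP code unchanged and using SageMaker only as the multi-node launcher via its `torch_distributed` (torchrun) distribution mode, rather than switching to the SageMaker model-parallel library. The entry point name, source directory, role ARN, instance type, framework versions, hyperparameters, and S3 URI below are illustrative assumptions, not a confirmed configuration.

```python
# Sketch only: launch the existing FSDP fine-tuning script as a SageMaker
# training job. SageMaker's torch_distributed launcher starts the process
# group across nodes; the script's own FSDP wrapping does the sharding.
from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    entry_point="finetuning.py",           # assumed name of the recipes' fine-tuning script
    source_dir="path/to/llama-recipes",    # hypothetical local path to the recipes code
    role="arn:aws:iam::123456789012:role/SageMakerExecutionRole",  # placeholder role ARN
    instance_count=2,                      # multiple nodes; FSDP shards across all ranks
    instance_type="ml.p4d.24xlarge",       # assumed GPU instance type
    framework_version="2.1",               # assumed PyTorch version
    py_version="py310",
    # torchrun-style launcher provided by SageMaker; no smdistributed library needed
    distribution={"torch_distributed": {"enabled": True}},
    hyperparameters={
        "enable_fsdp": True,               # keep the script's FSDP code path (assumed flag)
        "model_name": "meta-llama/Llama-2-7b-hf",
        "batch_size_training": 4,
    },
)

# Placeholder S3 URI for the training data channel
estimator.fit({"training": "s3://my-bucket/llama-finetune-data/"})
```

The design choice this sketch illustrates: SageMaker's data-parallel/model-parallel libraries and the script's FSDP implementation are alternative sharding strategies, so you would typically pick one; if FSDP is kept, SageMaker's role reduces to provisioning the cluster and launching the ranks.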