llama-recipes

Llama 2 distributed training on AWS SageMaker

Open premanand09 opened this issue 1 year ago • 4 comments

Hi, I am going to run distributed training of Llama on AWS SageMaker as a managed training job across multiple devices/nodes. SageMaker provides both data-parallel and model-parallel distributed training. Since SageMaker already takes care of distributed training, do I need to keep the current FSDP implementation in the Llama fine-tuning script, or should I remove it?

premanand09 · Aug 20 '23 12:08