starcoder zero3 DPO starcoder OOM

zero3 DPO starcoder OOM

Open oo0-0-0oo opened this issue 1 year ago • 0 comments


When I use DPO to train the 7B starcoder model, I hit an OOM error. I am using 16 A100 GPUs with ZeRO-3, TRL, and transformers. The OOM happens when the code reaches `AutoModelForCausalLM.from_pretrained`. qwencoder does not have this problem. Is there anything special in the model's structure that makes it unsuitable for DPO (Direct Preference Optimization)? The code is:

    from transformers import AutoModelForCausalLM, AutoTokenizer, HfArgumentParser

    # DPOTrainingArguments, MyDPOTrainer, and down_file are not shown here.
    parser = HfArgumentParser(DPOTrainingArguments)
    args = parser.parse_args_into_dataclasses()[0]
    down_file(args.model_path, args.pretrained_model)

    # The OOM is raised during these from_pretrained calls.
    model = AutoModelForCausalLM.from_pretrained(args.model_path)
    model_ref = AutoModelForCausalLM.from_pretrained(args.model_path)
    tokenizer = AutoTokenizer.from_pretrained(args.model_path)

    dpo_trainer = MyDPOTrainer(
        model,
        model_ref,
        args=args,
        ....
    )

    dpo_trainer.train()
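For a rough sense of scale, here is a back-of-envelope estimate of the memory needed just to load the policy and reference models. It is only a sketch under stated assumptions: the parameter count is taken as exactly 7e9, the 16 GPUs are assumed to be split across 2 nodes with one process per GPU, and every process is assumed to materialize both full models in fp32 (the default dtype when `torch_dtype` is not passed). The helper `load_footprint_gib` is hypothetical and not part of any library. Whether each process really holds full copies depends on whether ZeRO-3 sharded initialization is active at load time, so treat this as a worst case, not a diagnosis.

```python
# Worst-case memory estimate for loading a policy and a reference model.
# All numbers below are illustrative assumptions, not measurements.

def load_footprint_gib(n_params: float, bytes_per_param: int,
                       n_copies: int, n_procs: int) -> float:
    """GiB needed if every process materializes every full model copy."""
    return n_params * bytes_per_param * n_copies * n_procs / 2**30

N_PARAMS = 7e9        # assumed: "7B" taken as exactly 7e9 parameters
N_COPIES = 2          # policy model + reference model
PROCS_PER_NODE = 8    # assumed: 16 GPUs over 2 nodes, one process per GPU

fp32_per_proc = load_footprint_gib(N_PARAMS, 4, N_COPIES, 1)
fp32_per_node = load_footprint_gib(N_PARAMS, 4, N_COPIES, PROCS_PER_NODE)
bf16_per_node = load_footprint_gib(N_PARAMS, 2, N_COPIES, PROCS_PER_NODE)

print(f"fp32, one process:   {fp32_per_proc:.1f} GiB")
print(f"fp32, whole node:    {fp32_per_node:.1f} GiB")
print(f"bf16, whole node:    {bf16_per_node:.1f} GiB")
```

If the per-node figure is close to the host RAM, or the per-process figure exceeds a single GPU's memory, unsharded loading alone could explain an OOM at `from_pretrained`. This estimate does not by itself explain why another model of similar size loads fine.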

Jun 26 '24 03:06 oo0-0-0oo
