alignment-handbook
Does QLoRA DPO training support a reference model?
Hello! Thanks for your awesome work! I ran into an issue when running DPO with QLoRA. I noticed this setting:
if model_args.use_peft is True:
    ref_model = None
    ref_model_kwargs = None
I also noticed that use_peft is set to true only in config_qlora.yaml. This means that if we use QLoRA for DPO training, no reference model is passed at all.
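To make the question concrete, here is a minimal sketch of how I read that branch. The `ModelArguments` dataclass and the `select_reference` function are hypothetical stand-ins I wrote for illustration, not names from the repository, and the non-PEFT default (reusing the policy checkpoint as the reference) is my assumption about the surrounding code rather than something I have confirmed.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ModelArguments:
    # Hypothetical stand-in for the script's model_args; only the fields used here.
    model_name_or_path: str
    use_peft: bool = False


def select_reference(
    model_args: ModelArguments, model_kwargs: dict
) -> Tuple[Optional[str], Optional[dict]]:
    # Assumption: without PEFT, the reference defaults to the policy checkpoint.
    ref_model: Optional[str] = model_args.model_name_or_path
    ref_model_kwargs: Optional[dict] = model_kwargs
    if model_args.use_peft is True:
        # The branch quoted above: no separate reference model is passed on.
        ref_model = None
        ref_model_kwargs = None
    return ref_model, ref_model_kwargs


# With use_peft enabled (as in config_qlora.yaml), both values come back as None.
qlora_ref = select_reference(ModelArguments("some/checkpoint", use_peft=True), {})
full_ref = select_reference(ModelArguments("some/checkpoint"), {})
```

What I cannot tell from this alone is how the trainer handles a `None` reference downstream, for example whether it skips the reference entirely or derives one some other way.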
I wonder whether this code supports QLoRA training with a reference model. Thanks!