alignment-handbook

Does QLoRA DPO training support a reference model?

Open · Harry-mic opened this issue 1 year ago · 0 comments

Hello! Thanks for your awesome work! I ran into an issue when running DPO with QLoRA. I noticed the following setting:

```python
if model_args.use_peft is True:
    ref_model = None
    ref_model_kwargs = None
```

I also noticed that `use_peft` is set to `true` only in `config_qlora.yaml`. This seems to mean that DPO training with QLoRA does not use a reference model at all.
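To make the question concrete, here is a toy sketch of the DPO loss. It assumes (I have not confirmed this in the code) that when `ref_model` is `None` and the model is a PEFT model, the trainer uses the frozen base weights with the adapter disabled as an implicit reference. `ToyPeftPolicy` and `dpo_loss` are hypothetical names, and `disable_adapter` here only mimics the idea of PEFT's context manager of the same name on a fake model with hand-set log-probabilities.

```python
import math
from contextlib import contextmanager


class ToyPeftPolicy:
    """Hypothetical stand-in for a frozen base model plus a LoRA-style adapter.

    A response's log-prob is base_logp + adapter_delta while the adapter is
    enabled, and just base_logp while it is disabled.
    """

    def __init__(self, base_logps, adapter_deltas):
        self.base_logps = base_logps          # response -> log-prob under frozen base
        self.adapter_deltas = adapter_deltas  # response -> shift learned by the adapter
        self.adapter_enabled = True

    @contextmanager
    def disable_adapter(self):
        # Temporarily turn the adapter off, restoring it even on error.
        self.adapter_enabled = False
        try:
            yield
        finally:
            self.adapter_enabled = True

    def logp(self, response):
        delta = self.adapter_deltas.get(response, 0.0) if self.adapter_enabled else 0.0
        return self.base_logps[response] + delta


def dpo_loss(policy, chosen, rejected, beta=0.1, ref_model=None):
    """-log sigmoid(beta * ((pi_c - ref_c) - (pi_r - ref_r))) for one pair."""
    pi_c, pi_r = policy.logp(chosen), policy.logp(rejected)
    if ref_model is None:
        # Assumed PEFT behaviour: the base model with the adapter off is the reference.
        with policy.disable_adapter():
            ref_c, ref_r = policy.logp(chosen), policy.logp(rejected)
    else:
        # Separate, explicitly loaded reference model.
        ref_c, ref_r = ref_model.logp(chosen), ref_model.logp(rejected)
    margin = beta * ((pi_c - ref_c) - (pi_r - ref_r))
    # -log sigmoid(m) == log(1 + exp(-m))
    return math.log1p(math.exp(-margin))
```

If this assumption holds, `ref_model = None` would not mean "no reference at all": an explicit reference model equal to the base weights and the implicit adapter-off reference give the same loss in this sketch.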
Does this code support QLoRA training with a reference model? Thanks!

Harry-mic, Jan 15 '24 09:01