Eric Mitchell

21 comments by Eric Mitchell

Thanks for raising this. A few requests to help debug:
- What is beta?
- Can you share roughly the x axis values (in tokens and number of sequences) for...

1. Changing `fsdp_policy_dtype` when not using the `FSDPTrainer` will have no effect. The reference wandb run used the command: `train.py model=pythia28 datasets=[hh] loss=sft exp_name=pythia28_hh_sft_bf16 gradient_accumulation_steps=2 batch_size=64 n_epochs=1 eval_batch_size=32...`

Did you do an SFT stage on your chosen responses before running DPO? TL;DR: this behavior is not unexpected. As you've pointed out, DPO optimizes for the reward...

Multi-node training is something we're planning to start looking into very soon (in the next week). Unfortunately our cluster is down for maintenance for the next ~5 days, so we...

Sorry for the slow progress on this; the last few weeks have been much busier than expected. I don't have a clear timeline for multi-node at this point, unfortunately. I...

To start, it looks like your checkpoint's parameter names include a wrapper prefix, `base_model.model.`, in front of each parameter name, so PyTorch can't find the parameters it needs. I assume...
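One common workaround for this kind of mismatch is to strip the wrapper prefix from the checkpoint's keys before calling `load_state_dict`. A minimal sketch, assuming the `base_model.model.` prefix described above (the function name is illustrative, not from the repo):

```python
# Sketch: remove a wrapper prefix (e.g. added by a PEFT/LoRA wrapper)
# from every parameter name in a state dict so the keys match the
# unwrapped base model. The prefix value is an assumption based on
# the error described above.
def strip_prefix(state_dict, prefix="base_model.model."):
    """Return a new state dict with `prefix` removed from matching keys."""
    return {
        (k[len(prefix):] if k.startswith(prefix) else k): v
        for k, v in state_dict.items()
    }
```

You would then pass the result to `model.load_state_dict(...)` as usual.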

Sounds like you worked the lora part out! For loading the new checkpoint, the issue is that you need to load `torch.load(model_archive_name)['state']`, since the archived parameters are in the `'state'`...
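Concretely, loading the archive's `'state'` entry might look like the following sketch (the function and argument names are illustrative; only the `'state'` key layout comes from the comment above):

```python
import torch

def load_archived_state(model, model_archive_name):
    """Load model weights from an archive whose 'state' key holds the
    state dict, per the checkpoint layout described above."""
    archive = torch.load(model_archive_name, map_location="cpu")
    # The archive is a dict; the parameters live under 'state', so
    # passing the whole archive to load_state_dict would fail.
    model.load_state_dict(archive["state"])
    return model
```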

Are you passing an argument for `n_examples` or `n_epochs`? Can you check in `preference_datasets.py`, when you create the dataset, that there are actually 80k preference pairs? Not sure why...

@chansurgeplus sorry for the delay here; I got behind with ICML last week. One thing to be wary of is that prompts are left-padded, since the current DPO code _only_ uses...
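To illustrate what left-padding means for a batch of prompts, here is a minimal sketch (the helper name and pad id are illustrative assumptions, not code from the repo):

```python
# Sketch: left-pad a batch of token-id sequences to a common length.
# Pad tokens go BEFORE the prompt tokens, so every prompt ends at the
# same position and generation can continue from the right edge.
def left_pad(sequences, pad_id=0):
    """Prepend pad_id to each sequence up to the batch max length."""
    max_len = max(len(s) for s in sequences)
    return [[pad_id] * (max_len - len(s)) + s for s in sequences]
```

With a Hugging Face tokenizer, the equivalent is setting `tokenizer.padding_side = "left"` before calling the tokenizer with `padding=True`.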

The QA_model.ckpt is a pre-trained zsRE QA model from de Cao et al. (Editing Factual Knowledge in Language Models). I didn't mean to include it here. I think we can upload...