Eric Mitchell

21 comments by Eric Mitchell

Thanks for raising this. A few requests to help debug:
- What is beta?
- Can you share roughly the x axis values (in tokens and number of sequences) for...

1. Changing `fsdp_policy_dtype` when not using the `FSDPTrainer` will have no effect. The reference wandb run used the command: `train.py model=pythia28 datasets=[hh] loss=sft exp_name=pythia28_hh_sft_bf16 gradient_accumulation_steps=2 batch_size=64 n_epochs=1 eval_batch_size=32...`

Did you do an SFT stage on your chosen responses before running DPO? TL;DR: this behavior is not unexpected. As you've pointed out, DPO optimizes for the reward...

Multi-node training is something we're planning to start looking into very soon (in the next week). Unfortunately our cluster is down for maintenance for the next ~5 days, so we...

Sorry for the slow progress on this; the last few weeks have been much busier than expected. I don't have a clear timeline for multi-node at this point, unfortunately. I...

To start, it looks like your checkpoint's parameter names include a wrapper prefix, `base_model.model.`, in front of each parameter name, so PyTorch can't find the parameters it needs. I assume...
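One common workaround for this kind of mismatch is to strip the wrapper prefix from the checkpoint's keys before calling `load_state_dict`. A minimal sketch, assuming the `base_model.model.` prefix described above (the function name is illustrative, not from the repo):

```python
# Sketch: remove a wrapper prefix (e.g. added by a PEFT/LoRA wrapper)
# from every parameter name in a state dict so the keys match the
# unwrapped base model. The prefix value is an assumption based on
# the error described above.
def strip_prefix(state_dict, prefix="base_model.model."):
    """Return a new state dict with `prefix` removed from matching keys."""
    return {
        (k[len(prefix):] if k.startswith(prefix) else k): v
        for k, v in state_dict.items()
    }
```

You would then pass the result to `model.load_state_dict(...)` as usual.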

Sounds like you worked the lora part out! For loading the new checkpoint, the issue is that you need to load `torch.load(model_archive_name)['state']`, since the archived parameters are in the `'state'`...
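Concretely, loading the archive's `'state'` entry might look like the following sketch (the function and argument names are illustrative; only the `'state'` key layout comes from the comment above):

```python
import torch

def load_archived_state(model, model_archive_name):
    """Load model weights from an archive whose 'state' key holds the
    state dict, per the checkpoint layout described above."""
    archive = torch.load(model_archive_name, map_location="cpu")
    # The archive is a dict; the parameters live under 'state', so
    # passing the whole archive to load_state_dict would fail.
    model.load_state_dict(archive["state"])
    return model
```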

Are you passing an argument for `n_examples` or `n_epochs`? Can you check in `preference_datasets.py`, when you create the dataset, that there are actually 80k preference pairs? Not sure why...

@chansurgeplus sorry for the delay here; I got behind with ICML last week. One thing to be wary of is that prompts are left-padded, since the current DPO code _only_ uses...
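To illustrate what left-padding means for a batch of prompts, here is a minimal sketch (the helper name and pad id are illustrative assumptions, not code from the repo):

```python
# Sketch: left-pad a batch of token-id sequences to a common length.
# Pad tokens go BEFORE the prompt tokens, so every prompt ends at the
# same position and generation can continue from the right edge.
def left_pad(sequences, pad_id=0):
    """Prepend pad_id to each sequence up to the batch max length."""
    max_len = max(len(s) for s in sequences)
    return [[pad_id] * (max_len - len(s)) + s for s in sequences]
```

With a Hugging Face tokenizer, the equivalent is setting `tokenizer.padding_side = "left"` before calling the tokenizer with `padding=True`.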

The QA_model.ckpt is a pre-trained zsRE QA model from de Cao et al. (Editing Factual Knowledge in Language Models). I didn't mean to include it here. I think we can upload...