Gopal Trital

15 comments by Gopal Trital

@eric-mitchell I finished DPO, but when merging policy.pt I got the following error: ![image](https://github.com/eric-mitchell/direct-preference-optimization/assets/51286679/a0b3a1d0-e89b-4559-982e-2a36ffe90ce8) ![image](https://github.com/eric-mitchell/direct-preference-optimization/assets/51286679/5ae047a1-8d69-465b-ab6c-48ff4544bd1d) ![image](https://github.com/eric-mitchell/direct-preference-optimization/assets/51286679/db82b384-b7ce-4eb9-a0fa-2c21344488d4) Am I doing something wrong here? **Another issue:** - I have around 80 k...

> Sounds like you worked the lora part out! For loading the new checkpoint, the issue is that you need to load `torch.load(model_archive_name)['state']`, since the archived parameters are in the...
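For reference, a minimal sketch of the loading step described in the quote above, assuming a Hugging Face causal LM; the `'state'` key is taken from the maintainer's comment, while the base model name and checkpoint path here are placeholders:

```python
import torch
import transformers

# Placeholder names; substitute your own base model and checkpoint path.
base_model_name = "EleutherAI/pythia-2.8b"
model_archive_name = "policy.pt"

model = transformers.AutoModelForCausalLM.from_pretrained(base_model_name)

# The archive stores the parameters under the 'state' key,
# so load that sub-dict rather than the whole archive.
state_dict = torch.load(model_archive_name, map_location="cpu")["state"]
model.load_state_dict(state_dict)
```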

I'm using the same code as get_hh(), as my data has the same structure. When it initially downloads the data from Hugging Face, it shows the same number of training and test...
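For context, a hedged sketch of what a custom loader modeled on get_hh() might look like; the `responses`/`pairs`/`sft_target` layout follows the structure get_hh() builds in preference_datasets.py, while the dataset name and field names here are placeholders:

```python
from collections import defaultdict
import datasets

def get_my_data(split: str, cache_dir: str = None):
    """Load a custom preference dataset in the same format as get_hh()."""
    # Placeholder dataset/field names; adapt to your own data.
    dataset = datasets.load_dataset("my_org/my_prefs", split=split, cache_dir=cache_dir)

    data = defaultdict(lambda: {"responses": [], "pairs": [], "sft_target": None})
    for row in dataset:
        prompt = row["prompt"]
        chosen, rejected = row["chosen"], row["rejected"]
        n = len(data[prompt]["responses"])
        data[prompt]["responses"].extend([chosen, rejected])
        # Each pair is (index of preferred response, index of dispreferred response).
        data[prompt]["pairs"].append((n, n + 1))
        data[prompt]["sft_target"] = chosen
    return data
```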


@eric-mitchell I have figured out where the number of training examples gets reduced, in the following section of preference_datasets.py: ![image](https://github.com/eric-mitchell/direct-preference-optimization/assets/51286679/88e47adc-d481-4fee-92d2-e3b8c40cbe3d) You can see that I have 349 examples in the training dataset, but...
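A likely cause worth checking (an assumption on my part, not confirmed in the thread): because the loader keys the data by prompt, rows that share a prompt are merged into one entry, so the number of keys can be much smaller than the number of raw rows. A minimal illustration:

```python
from collections import defaultdict

rows = [
    {"prompt": "Hi", "chosen": "A", "rejected": "B"},
    {"prompt": "Hi", "chosen": "C", "rejected": "D"},  # same prompt as above
    {"prompt": "Bye", "chosen": "E", "rejected": "F"},
]

data = defaultdict(lambda: {"responses": [], "pairs": []})
for row in rows:
    n = len(data[row["prompt"]]["responses"])
    data[row["prompt"]]["responses"].extend([row["chosen"], row["rejected"]])
    data[row["prompt"]]["pairs"].append((n, n + 1))

print(len(rows), "rows ->", len(data), "prompts")  # 3 rows -> 2 prompts
```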