fispresent comments


Results: 3 comments by fispresent

A serious bug in DPO implementation.

> You mean the mean also takes the IGNORE_ID positions into account, right? Yes, we will update it later; chosen_logps should be the mean value over the chosen tokens only.

Yes.
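The point agreed on here is that chosen_logps should average only over real chosen tokens, not over IGNORE_ID positions. A minimal PyTorch sketch of the difference follows. It assumes IGNORE_ID = -100 (PyTorch's default `ignore_index`). The helper `mean_logps` is a hypothetical name, and the per-token log-prob values are made up for illustration.

```python
import torch

IGNORE_ID = -100  # assumption: the common PyTorch ignore_index value

def mean_logps(token_logps: torch.Tensor, targets: torch.Tensor,
               ignore_id: int = IGNORE_ID) -> torch.Tensor:
    """Average per-token log-probs over positions whose target is not ignore_id."""
    mask = (targets != ignore_id).to(token_logps.dtype)
    # Zero out ignored positions, then divide by the number of real tokens only.
    return (token_logps * mask).sum(dim=-1) / mask.sum(dim=-1).clamp(min=1)

# One sequence: two real tokens followed by two padded (ignored) positions.
# The values at the padded positions are arbitrary; they should not matter.
token_logps = torch.tensor([[-1.0, -3.0, -0.5, -0.5]])
targets = torch.tensor([[5, 7, IGNORE_ID, IGNORE_ID]])

naive = token_logps.mean(dim=-1)          # averages over all 4 positions, padding included
fixed = mean_logps(token_logps, targets)  # averages over the 2 real tokens only
print(naive.item(), fixed.item())  # -1.25 -2.0
```

The naive mean is pulled toward whatever values happen to sit at the padded positions and is diluted by their count. The masked mean depends only on the real tokens.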

A serious bug in DPO implementation.

> Is this right?
>
>     chosen_lm_mask = (chosen_lm_target != IGNORE_ID)
>     rejected_lm_mask = (rejected_lm_target != IGNORE_ID)
>
>     chosen_logps = torch.gather(
>         chosen_logits.log_softmax(dim=-1), dim=2,
>         index=chosen_lm_target.masked_fill(~chosen_lm_mask, 0).unsqueeze(-1)
>     ).squeeze(-1)
>
>     rejected_logps = torch.gather(
>         rejected_logits.log_softmax(dim=-1), dim=2,...
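Below is a minimal sketch of how the quoted gather might be completed into per-sequence log-probs and then fed into a DPO loss. It rests on several assumptions. The targets are already aligned with the logits, meaning any causal shift happens beforehand. IGNORE_ID is taken to be -100. `sequence_logps`, `dpo_loss` and `beta = 0.1` are hypothetical names and values, not the project's actual code. Following the thread above, the sketch averages over non-ignored tokens.

```python
import torch
import torch.nn.functional as F

IGNORE_ID = -100  # assumption: same ignore value as the quoted code's IGNORE_ID

def sequence_logps(logits: torch.Tensor, targets: torch.Tensor,
                   ignore_id: int = IGNORE_ID) -> torch.Tensor:
    """Mean log-prob of the target tokens per sequence, skipping ignore_id positions.

    logits: [batch, seq_len, vocab]; targets: [batch, seq_len], assumed already
    aligned with logits.
    """
    mask = targets != ignore_id
    # ignore_id is not a valid vocab index, so replace it with 0 before gathering;
    # those positions are removed by the mask below.
    safe_targets = targets.masked_fill(~mask, 0)
    token_logps = torch.gather(
        logits.log_softmax(dim=-1), dim=2, index=safe_targets.unsqueeze(-1)
    ).squeeze(-1)
    mask = mask.to(token_logps.dtype)
    return (token_logps * mask).sum(dim=-1) / mask.sum(dim=-1).clamp(min=1)

def dpo_loss(policy_chosen: torch.Tensor, policy_rejected: torch.Tensor,
             ref_chosen: torch.Tensor, ref_rejected: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """DPO objective: -log sigmoid(beta * (policy margin - reference margin))."""
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    return -F.logsigmoid(beta * margin).mean()

# Toy usage with random data (shapes are arbitrary).
torch.manual_seed(0)
B, T, V = 2, 5, 11  # batch size, sequence length, vocabulary size
targets = torch.randint(0, V, (B, T))
targets[:, -2:] = IGNORE_ID  # treat the last two positions as padding
logits = torch.randn(B, T, V)

logps = sequence_logps(logits, targets)

# Logits at ignored positions must not affect the result.
perturbed = logits.clone()
perturbed[:, -2:, :] = 50.0 * torch.randn(B, 2, V)
print(torch.allclose(logps, sequence_logps(perturbed, targets)))  # True

# Toy DPO call: policy prefers chosen by 1 nat more than the reference does.
loss = dpo_loss(logps, logps - 1.0, logps, logps)
print(loss.item())
```

The `masked_fill(..., 0)` step only stops `torch.gather` from failing on the out-of-range index -100. The multiplication by the mask is what actually removes padded positions from the average. Some DPO implementations sum token log-probs instead of averaging them. Replacing the division with a plain masked sum would give that variant.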

A serious bug in DPO implementation.

> Did you find any other errors in the DPO fine-tuning code?

No. This is excellent work.