> You mean the mean also takes the IGNORE_ID part into account, right? Yes, we will update it later; chosen_logps should be the mean value over the chosen tokens only.

Yes.
> Is this right?
>
> chosen_lm_mask = (chosen_lm_target != IGNORE_ID)
>
> rejected_lm_mask = (rejected_lm_target != IGNORE_ID)
>
> chosen_logps = torch.gather( chosen_logits.log_softmax(dim=-1), dim=2, index=chosen_lm_target.masked_fill(~chosen_lm_mask, 0).unsqueeze(-1) ).squeeze(-1)
>
> rejected_logps = torch.gather( rejected_logits.log_softmax(dim=-1), dim=2,...
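
For reference, here is a minimal sketch of how the per-sequence mean over only the non-ignored tokens could be computed once the per-token log-probabilities are gathered. It is not the repository's actual code: the helper name `masked_mean_logps` is hypothetical, and `IGNORE_ID = -1` is an assumption, since the thread does not state its value.

```python
import torch

IGNORE_ID = -1  # assumed padding label; the thread does not give the real value


def masked_mean_logps(logits: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Mean log-probability per sequence, averaged over non-ignored tokens only.

    logits: (batch, seq_len, vocab)
    target: (batch, seq_len), padded with IGNORE_ID
    (hypothetical helper, for illustration)
    """
    mask = target != IGNORE_ID
    # Swap IGNORE_ID for a valid index (0) so gather does not index out of range.
    safe_target = target.masked_fill(~mask, 0)
    token_logps = torch.gather(
        logits.log_softmax(dim=-1), dim=2, index=safe_target.unsqueeze(-1)
    ).squeeze(-1)
    # Zero out the padded positions, then divide by the count of real tokens.
    token_logps = token_logps * mask
    return token_logps.sum(dim=-1) / mask.sum(dim=-1).clamp(min=1)
```

The same helper would apply to both the chosen and the rejected branch, e.g. `chosen_logps = masked_mean_logps(chosen_logits, chosen_lm_target)`. The `clamp(min=1)` only guards against a fully padded sequence and does not change the result otherwise.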
> Did you find any other errors in the DPO fine-tuning code?

No. This is an excellent job.