Yanggan Gu issues





Is `reward_fn` equivalent to `log_softmax`?

I noticed that the `scores` in `reward_fn` are actually equal to `logits_i - logsumexp(logits)`. I think this expression could be computed directly with `log_softmax`. Is there a reason not to use `log_softmax`? https://github.com/microsoft/LMOps/blob/5fbf5bcd6e6760fa95aaaf945fb5d9cb033135f6/minillm/minillm/reward.py#L33
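The identity behind the question is log(softmax(x))_i = x_i - logsumexp(x). The sketch below checks it numerically in plain Python, without PyTorch. The function names are hypothetical and only stand in for the expression quoted above; it assumes `logits` is a single vector of scores over the vocabulary and `token_id` picks the entry for the generated token.

```python
import math
from typing import List


def logsumexp(xs: List[float]) -> float:
    """Numerically stable log(sum(exp(x))) using the max-shift trick."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def scores_as_in_issue(logits: List[float], token_id: int) -> float:
    """Hypothetical helper: the quoted expression logits_i - logsumexp(logits)."""
    return logits[token_id] - logsumexp(logits)


def log_of_softmax(logits: List[float]) -> List[float]:
    """log(softmax(x)) computed literally from its definition.

    This naive form can overflow for large logits; it is only a reference.
    """
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [math.log(e / total) for e in exps]


if __name__ == "__main__":
    logits = [2.0, -1.0, 0.5, 3.0]
    reference = log_of_softmax(logits)
    for i in range(len(logits)):
        # Both forms agree up to floating-point rounding.
        assert math.isclose(scores_as_in_issue(logits, i), reference[i], rel_tol=1e-9)
    print("logits_i - logsumexp(logits) matches log(softmax(logits))_i")
```

In PyTorch the fused equivalent would be `torch.log_softmax(logits, dim=-1)` (also available as `torch.nn.functional.log_softmax`) followed by selecting the generated token's entry, for example with `torch.gather`. The issue does not say whether the explicit form in `reward_fn` is kept deliberately.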

Question about `apply_chat_template` in examples

When I looked at the examples, I found that the DPO example script applies `apply_chat_template` to `chosen` and `rejected` but not to `prompt`. https://github.com/huggingface/trl/blob/d1ed730ab8281b1b0c78d7d61bc0f6603a9ce958/examples/scripts/dpo.py#L150-L152 It also seems that `chosen`...