Is `reward_fn` equivalent to `log_softmax`?
I noticed that the `scores` in `reward_fn` is actually equal to `logits_i - logsumexp(logits)`. I think this expression can be calculated directly by `log_softmax`. Why not use `log_softmax`?
https://github.com/microsoft/LMOps/blob/5fbf5bcd6e6760fa95aaaf945fb5d9cb033135f6/minillm/minillm/reward.py#L33
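
To make the question concrete, here is a minimal pure-Python sketch of the identity I have in mind. It is not the code from `reward.py`: the helper names `logsumexp`, `score_via_logsumexp`, and `log_softmax` below are hypothetical stand-ins written only for illustration, and the sketch ignores any masking or normalization the real `reward_fn` may apply.

```python
import math


def logsumexp(logits):
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    return m + math.log(sum(math.exp(x - m) for x in logits))


def score_via_logsumexp(logits, i):
    # The form described above: logits_i - logsumexp(logits).
    return logits[i] - logsumexp(logits)


def log_softmax(logits):
    # Take the log of an explicitly computed softmax, so the
    # comparison with the logsumexp form is not trivially true.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [math.log(e / total) for e in exps]


if __name__ == "__main__":
    logits = [2.0, -1.0, 0.5, 3.0]  # made-up example values
    for i in range(len(logits)):
        a = score_via_logsumexp(logits, i)
        b = log_softmax(logits)[i]
        print(i, a, b, math.isclose(a, b, abs_tol=1e-12))
```

If the two forms really do agree in `reward_fn`, I would expect the PyTorch counterpart to be something like `torch.nn.functional.log_softmax(logits, dim=-1)` followed by a gather on the selected token ids, but I may be missing a reason the explicit form is preferred.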