
Is reward_fn equal to log_softmax?

Open EganGu opened this issue 9 months ago • 0 comments

I noticed that the scores in reward_fn are actually equal to logits_i - logsumexp(logits). I think this expression could be computed directly with log_softmax. Why not use log_softmax?

https://github.com/microsoft/LMOps/blob/5fbf5bcd6e6760fa95aaaf945fb5d9cb033135f6/minillm/minillm/reward.py#L33
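To make the claim concrete, here is a minimal pure-Python sketch of the identity log_softmax(logits)_i = logits_i - logsumexp(logits). It is not the code from reward.py; the helper names and the sample logits are hypothetical and chosen only for illustration.

```python
import math

def logsumexp(logits):
    # Numerically stable log(sum(exp(x))) via the max-shift trick.
    m = max(logits)
    return m + math.log(sum(math.exp(x - m) for x in logits))

def scores_via_logsumexp(logits):
    # The form described in this issue: logits_i - logsumexp(logits).
    lse = logsumexp(logits)
    return [x - lse for x in logits]

def log_softmax(logits):
    # Reference form: take the log of the softmax probabilities.
    total = sum(math.exp(x) for x in logits)
    return [math.log(math.exp(x) / total) for x in logits]

# Hypothetical logits for a 4-token vocabulary.
logits = [2.0, -1.0, 0.5, 3.0]
a = scores_via_logsumexp(logits)
b = log_softmax(logits)

# Both forms should agree up to floating-point error.
identity_holds = all(math.isclose(x, y, rel_tol=1e-9, abs_tol=1e-12) for x, y in zip(a, b))
print(identity_holds)
```

In PyTorch the corresponding calls would be torch.logsumexp and torch.log_softmax. Whether the latter is a drop-in replacement at the linked line depends on details of reward.py that are not shown in this issue.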

EganGu · May 14 '24 07:05