FasterTransformer
Possible Bug in Context Likelihood
When calculating the log likelihood of the token at position i, we should consider the logits at step i-1; also, the log likelihood of the starting token is undefined (it can be set to zero). https://github.com/NVIDIA/FasterTransformer/blob/main/src/fastertransformer/kernels/logprob_kernels.cu#L71 The way we fixed it:
- Shift the logits pointer back by vocab_size_padded (verified for the batch-first case):
logits += step_offset + batch_offset - vocab_size_padded;
- Add a step > 0 check in the loop (to avoid computing log probs for the 0th token).
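The two-part fix above can be sketched on the CPU as follows. This is a minimal Python sketch of the intended math, not the FT kernel itself; the function name `context_log_probs` is hypothetical, and a plain log-sum-exp stands in for the kernel's reduction.

```python
import math

def context_log_probs(logits, ids):
    """Per-token log likelihoods for a context (hypothetical CPU sketch).

    logits[t] are the model outputs after reading ids[t], so the
    probability of ids[t] must come from logits[t-1] -- the shift the
    fix applies via `- vocab_size_padded`. The 0th token has no
    preceding logits, so its log prob is defined as zero.
    """
    out = [0.0]  # step 0: undefined, set to zero (mirrors the step > 0 check)
    for t in range(1, len(ids)):
        row = logits[t - 1]  # shifted logits step, mirrors the pointer fix
        m = max(row)
        lse = m + math.log(sum(math.exp(x - m) for x in row))  # log-sum-exp
        out.append(row[ids[t]] - lse)  # log softmax evaluated at the target token
    return out
```

For N input tokens this yields N-1 meaningful values plus the zero placeholder at step 0, matching the description below.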
Hi bharatv007,
I guess you're thinking that the first token (at step 0) should be ignored when computing the log probability of contexts, and that the token ids should be shifted by 1 according to description 1 (Shift the ...). However, FT doesn't make any assumption about the input tokens, and that's why FT hasn't skipped the first token in the computation.
Please correct me if I misunderstood the question, and could you explain some details?
If the input is N tokens, there will be N-1 log likelihood values, as explained below.
The log likelihood of a token is defined as the log of the probability of predicting that token given all the previous tokens. So for the first (0th) token it is undefined, as there are no previous tokens. Since we are getting the probability of predicting the token at time t, we need the logits from step t-1 (the current kernel computes the probabilities on the incorrect logits step). Let me know if this is clear enough.
If I understand correctly, the kernel would work if the provided ids were output tokens instead of input tokens at step t. However, I see it is called with input tokens here. I think we could either change log_probs_kernel as bharatv007 suggested, or shift input_ids by one element (because the input of step t+1 is the same as the output of step t here).
Thank you for the comment and discussion. As you say, this is a bug, and we have fixed it in the latest release.
Closing this issue because it is inactive. Feel free to re-open it if you still have any problem.