
Possible Bug in Context Likelihood

Open bharatv007 opened this issue 2 years ago • 3 comments

When calculating the log likelihood of the token at position i, we should use the logits from step i-1. Also, the log likelihood of the starting token is undefined (it can be set to zero). https://github.com/NVIDIA/FasterTransformer/blob/main/src/fastertransformer/kernels/logprob_kernels.cu#L71 This is how we fixed it:

  1. Shift the logits pointer back by vocab_size_padded (verified for the batch-first case): logits += step_offset + batch_offset - vocab_size_padded;
  2. Add a step > 0 check in the loop (to avoid computing a log prob for the 0th token).
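The two changes above can be sketched outside of CUDA. This is a hypothetical NumPy version (not the actual `log_probs_kernel`), where shifting the logits read back one step plays the role of the `- vocab_size_padded` pointer adjustment, and the `step > 0` guard skips the undefined first token:

```python
import numpy as np

def context_log_probs(logits, token_ids):
    """logits: (seq_len, vocab_size) raw scores; token_ids: (seq_len,) ints."""
    # numerically stable log-softmax over the vocab dimension
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    out = np.zeros(len(token_ids))  # position 0 stays 0 (undefined)
    for step in range(1, len(token_ids)):  # the "step > 0" guard (point 2)
        # point 1: read the logits one step back (the i-1 shift)
        out[step] = log_probs[step - 1, token_ids[step]]
    return out
```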

bharatv007 avatar Aug 05 '22 14:08 bharatv007

Hi bharatv007

I guess you're suggesting that the first token (at step 0) should be ignored when computing the log probability of contexts, and that the token ids should be shifted by 1 according to your description 1 (Shift the ...). However, FT makes no assumptions about the input tokens, which is why FT doesn't skip the first token in the computation.

Please correct me if I've misunderstood the question, and could you explain in more detail?

jaedeok-nvidia avatar Aug 08 '22 03:08 jaedeok-nvidia

If the input is N tokens, the output should be N-1 log likelihoods, as explained below. The log likelihood of a token is defined as the log of the probability of predicting that token given all the previous tokens. So for the first (0th) token it is undefined, as there are no previous tokens. Since we are computing the probability of predicting the token at time t, we need the logits from step t-1 (the current kernel computes the probabilities from the wrong logits step). Let me know if this is clear enough.
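A toy illustration of this count, assuming a model where `logits[t]` scores the token at position t+1 (this indexing convention is the point of the bug report): for N = 3 input tokens, only log P(t1 | t0) and log P(t2 | t0, t1) are defined, giving N-1 = 2 values.

```python
import numpy as np

def log_softmax(x):
    x = x - x.max()
    return x - np.log(np.exp(x).sum())

rng = np.random.default_rng(0)
logits = rng.normal(size=(3, 4))  # logits[t] scores the token at step t+1
tokens = [2, 0, 3]                # N = 3 input tokens

# log P(tokens[i] | tokens[:i]) is read from the logits produced at step i-1
ll = [log_softmax(logits[i - 1])[tokens[i]] for i in range(1, len(tokens))]
print(len(ll))  # N-1 defined log likelihoods
```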

bharatv007 avatar Aug 09 '22 18:08 bharatv007

If I understand correctly, the kernel would be correct if the provided ids were output tokens rather than input tokens at step t. However, I see it's called with input tokens here. I think we could either change log_probs_kernel as bharatv007 suggested, or shift the input_ids by one element (because the input of step t+1 is the same as the output of step t here).
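The two fixes are equivalent, which a small NumPy sketch (hypothetical helper names, not FT code) can check: shifting the logits read back one step gives the same values as leaving the kernel unchanged and shifting the ids forward one element.

```python
import numpy as np

def log_softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=-1, keepdims=True))

# Fix A (change the kernel): the token at step i reads logits[i - 1]
def fix_shift_logits(logits, ids):
    lp = log_softmax(logits)
    return [lp[i - 1, ids[i]] for i in range(1, len(ids))]

# Fix B (change the input): score ids[1:] against logits[:-1] unchanged,
# since the input of step t+1 is the output of step t
def fix_shift_ids(logits, ids):
    lp = log_softmax(logits)
    shifted = ids[1:]
    return [lp[t, shifted[t]] for t in range(len(shifted))]
```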

dongluw avatar Aug 09 '22 23:08 dongluw

Thank you for the comment and discussion. As you say, this is a bug, and we have fixed it in the latest release.

byshiue avatar Aug 16 '22 03:08 byshiue

Closing this bug because it is inactive. Feel free to re-open this issue if you still have any problems.

byshiue avatar Sep 08 '22 07:09 byshiue