[Bug]: Sampling fails when temperature is 0
This line in generator.py yields infinite logits when temperature is set to 0: https://github.com/turboderp/exllama/blob/c16cf49c3f19e887da31d671a713619c8626484e/generator.py#L106C1-L106C30
Debugger result: [screenshot omitted]
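For reference, a minimal standalone sketch of the failure mode (not exllama's actual code path, just the same arithmetic):

```python
import torch

logits = torch.tensor([2.0, 1.0, -1.0])
temperature = 0.0

# Dividing by temperature = 0 produces +/-inf logits...
scaled = logits / temperature          # tensor([inf, inf, -inf])

# ...and softmax over them yields NaN probabilities, so a subsequent
# torch.multinomial() call raises a RuntimeError about inf/nan values.
probs = torch.softmax(scaled, dim=-1)  # tensor([nan, nan, nan])
print(scaled, probs)
```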
Temperature = 0 is an invalid argument the way temperature is defined here. I don't know if other implementations treat this as a special case or not, but the only sensible interpretation I can think of is that temperature = 0 should be equivalent to top-k = 1.
A robust "fix" would also have to account for numerical instability as the temperature approaches zero, not just at exactly zero. What's the desired behavior here? Should any temperature below some small threshold simply trigger greedy sampling?
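Roughly what I have in mind, as a sketch (the threshold value and the helper function are hypothetical, not exllama's actual API):

```python
import torch

EPS = 1e-4  # assumed threshold, open to discussion

def sample_token(logits: torch.Tensor, temperature: float) -> int:
    """Threshold-based fallback: greedy decoding below EPS,
    ordinary temperature sampling otherwise."""
    if temperature < EPS:
        # Equivalent to top-k = 1: just take the most likely token.
        return int(torch.argmax(logits, dim=-1))
    probs = torch.softmax(logits / temperature, dim=-1)
    return int(torch.multinomial(probs, num_samples=1))
```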
I believe temp = 0 should be equivalent to greedy decoding (as you mentioned, the same as top_k = 1). I really like your suggestion of picking a small threshold that takes numerical instability into account 😄
But what's the typical behavior in other implementations? If I'm overriding undefined behavior arbitrarily anyway, I'd want to be as unsurprising as possible.
Good question.
It seems that HF decides on greedy decoding by other means (its separate do_sample flag) and doesn't even look at the temperature setting.
vLLM, however, does something similar to your suggestion: it compares the temperature against a small constant and replaces it with 1.0 if it falls below that constant.
llama-cpp-python, however, only checks for exact equality with 0, in which case it does greedy decoding.
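Side by side, the two checks look roughly like this (names and the constant are illustrative, not taken from either codebase):

```python
_EPS = 1e-5  # illustrative; not necessarily the constant vLLM uses

def is_greedy_llama_cpp_style(temperature: float) -> bool:
    """llama-cpp-python style: greedy decoding only on exact zero."""
    return temperature == 0.0

def effective_temperature_vllm_style(temperature: float) -> float:
    """vLLM style: a temperature below a small constant is swapped
    for 1.0, which keeps the later division safe."""
    return 1.0 if temperature < _EPS else temperature
```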