[Bug]: Sampling fails when temperature is 0
This line in generator.py yields infinite logits when temperature is set to 0: https://github.com/turboderp/exllama/blob/c16cf49c3f19e887da31d671a713619c8626484e/generator.py#L106C1-L106C30
Debugger result: [screenshot omitted]
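For reference, a minimal standalone sketch of the failure mode (not exllama's actual code path, just the same arithmetic):

```python
import torch

logits = torch.tensor([2.0, 1.0, -1.0])
temperature = 0.0

# Dividing by temperature = 0 produces +/-inf logits...
scaled = logits / temperature          # tensor([inf, inf, -inf])

# ...and softmax over them yields NaN probabilities, so a subsequent
# torch.multinomial() call raises a RuntimeError about inf/nan values.
probs = torch.softmax(scaled, dim=-1)  # tensor([nan, nan, nan])
print(scaled, probs)
```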
Temperature = 0 is an invalid argument the way temperature is defined here. I don't know if other implementations treat this as a special case or not, but the only sensible interpretation I can think of is that temperature = 0 should be equivalent to top-k = 1.
A robust "fix" would also have to account for numerical instability as the temperature approaches zero, not just at exactly zero. What's the desired behavior here? Should any temperature below some small threshold simply trigger greedy sampling?
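Roughly what I have in mind, as a sketch (the threshold value and the helper function are hypothetical, not exllama's actual API):

```python
import torch

EPS = 1e-4  # assumed threshold, open to discussion

def sample_token(logits: torch.Tensor, temperature: float) -> int:
    """Threshold-based fallback: greedy decoding below EPS,
    ordinary temperature sampling otherwise."""
    if temperature < EPS:
        # Equivalent to top-k = 1: just take the most likely token.
        return int(torch.argmax(logits, dim=-1))
    probs = torch.softmax(logits / temperature, dim=-1)
    return int(torch.multinomial(probs, num_samples=1))
```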
I believe temp = 0 should be equivalent to greedy decoding (as you mentioned, the same as top_k = 1). I really like your suggestion of picking a small threshold that takes numerical instability into account 😄
But what's the typical behavior in other implementations? If I'm overriding undefined behavior arbitrarily anyway, I'd want to be as unsurprising as possible.
Good question.
It seems that HF decides on greedy decoding by other means (its separate do_sample flag) and doesn't even look at the temperature setting.
vLLM, however, does something similar to your suggestion: it compares the temperature against a small constant and replaces it with 1.0 if it falls below that constant.
llama-cpp-python, however, only checks for exact equality with 0, in which case it does greedy decoding.
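Side by side, the two checks look roughly like this (names and the constant are illustrative, not taken from either codebase):

```python
_EPS = 1e-5  # illustrative; not necessarily the constant vLLM uses

def is_greedy_llama_cpp_style(temperature: float) -> bool:
    """llama-cpp-python style: greedy decoding only on exact zero."""
    return temperature == 0.0

def effective_temperature_vllm_style(temperature: float) -> float:
    """vLLM style: a temperature below a small constant is swapped
    for 1.0, which keeps the later division safe."""
    return 1.0 if temperature < _EPS else temperature
```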