
setting do_sample=True in GenerationConfig generates errors

Open saurabhkumar opened this issue 1 year ago • 3 comments

When I set do_sample=True in GenerationConfig, I get the error:

in lib/python3.8/site-packages/transformers/generation/utils.py", line 3187, in beam_sample
    next_tokens = torch.multinomial(probs, num_samples=2 * num_beams)
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0

I tried changing values of temperature, top_p, top_k and num_beams but this does not seem to solve it.

Not using do_sample works, but then the output is fixed (even when changing temperature). If I provide only two parameters in GenerationConfig, a float temperature and do_sample=True, it also works. Is there a conflict when all the parameters are specified together? (I am using a GPU, if that is of any significance.)

Could someone suggest what might be wrong?

saurabhkumar avatar Apr 28 '23 15:04 saurabhkumar
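For context on the error itself: torch.multinomial raises this RuntimeError when the probability vector it is given contains inf, nan, or a negative entry. A minimal stdlib sketch of that precondition (the function name is illustrative, not the transformers API):

```python
import math

def is_valid_prob_vector(probs):
    """Mimic the precondition torch.multinomial enforces: every entry
    must be finite and non-negative, and at least one entry must be
    positive so there is something to sample."""
    return (all(math.isfinite(p) and p >= 0.0 for p in probs)
            and any(p > 0.0 for p in probs))

print(is_valid_prob_vector([0.7, 0.2, 0.1]))           # True
print(is_valid_prob_vector([float("nan"), 0.5, 0.5]))  # False
```

So the question becomes: which combination of generation parameters drives the sampling probabilities to nan or inf?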

I have the same problem too 😭.

MurphyJUAN avatar May 01 '23 21:05 MurphyJUAN

This error may be caused by beam_scores increasing linearly with length. Refer to beam_sample throws a nan error on long generations.

I think you can set num_beams=1 or set do_sample=False directly.

flow3rdown avatar May 06 '23 06:05 flow3rdown
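A rough stdlib illustration of the failure mode described above, assuming beam scores are accumulated log-probabilities (the numbers are made up for the demo): as generation length grows, the score grows linearly in magnitude, exponentiating it underflows to zero, and normalising all-zero weights produces nan, which multinomial then rejects.

```python
import math

per_token_logprob = -5.0              # a plausible per-token log-probability
beam_score = per_token_logprob * 300  # grows linearly with length: -1500.0
weight = math.exp(beam_score)         # underflows to exactly 0.0
print(weight)                         # 0.0

# If every beam's weight underflows, normalising gives 0/0, i.e. nan:
weights = [weight, weight]
total = sum(weights)
probs = [w / total if total > 0 else float("nan") for w in weights]
print(probs)                          # [nan, nan]
```

With num_beams=1 there is no beam score to accumulate, which is why that setting sidesteps the problem.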

I've found that it's related to the temperature parameter: if you set temperature to a high value (e.g., the default value of 1.0), the error goes away.

tonyzhao6 avatar May 09 '23 14:05 tonyzhao6
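A stdlib sketch of why a very low temperature can trigger the error (this illustrates the arithmetic only, not the actual transformers implementation, which uses a numerically shifted softmax): logits are divided by temperature before the softmax, so a small temperature inflates them until exp() overflows.

```python
import math

def naive_softmax(logits, temperature):
    # Unshifted softmax sketch: dividing by a small temperature can
    # push exp() past the float range, producing inf and then nan
    # once the inf is divided by an inf total.
    exps = []
    for logit in logits:
        try:
            exps.append(math.exp(logit / temperature))
        except OverflowError:
            exps.append(float("inf"))
    total = sum(exps)
    return [e / total for e in exps]

print(naive_softmax([40.0, 0.0], temperature=1.0))   # finite probabilities
print(naive_softmax([40.0, 0.0], temperature=0.05))  # 40/0.05 = 800 -> inf -> nan
```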

> This error may be caused by beam_scores increasing linearly with length. Refer to beam_sample throws a nan error on long generations.
>
> I think you can set num_beams=1 or set do_sample=False directly.

Setting these parameters returns text, but unfortunately it is gibberish, even with a high temperature value; see https://github.com/huggingface/transformers/issues/22914#issuecomment-1562034753

Edit: it's a CUDA 11.8 issue with multi-GPU setups and bitsandbytes. Downgrade bitsandbytes to 0.31.8 and CUDA to 11.6; see the referenced issue.

Daryl149 avatar May 25 '23 11:05 Daryl149
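The workaround in the last comment could be pinned roughly like this (the bitsandbytes version is taken from the comment above; the CUDA package name is an assumption that depends on your distribution and driver setup, so verify both against the referenced issue before applying):

```shell
# Pin bitsandbytes to the version suggested in the comment above
pip install bitsandbytes==0.31.8

# Downgrading CUDA is done via the system toolkit installer, e.g. on
# Ubuntu with NVIDIA's apt repository (package name is illustrative):
sudo apt-get install cuda-toolkit-11-6
```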