OpenChatKit

RuntimeError: probability tensor contains either `inf`, `nan` or element < 0

Open believeland23 opened this issue 1 year ago • 2 comments

My environment: GPU: V100-32G, torch: 1.13.1+cu116, python: 3.7.13

I load the model in Int8:

    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("togethercomputer/GPT-NeoXT-Chat-Base-20B")
    model = AutoModelForCausalLM.from_pretrained("togethercomputer/GPT-NeoXT-Chat-Base-20B", device_map="auto", load_in_8bit=True)

When I run `model.generate`, the following error occurs at `next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)`: RuntimeError: probability tensor contains either `inf`, `nan` or element < 0

I printed the values and found that, starting from `next_token_scores = logits_processor(input_ids, next_token_logits)`, all elements of the tensor are `nan`.
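To illustrate why `nan` scores end up as the sampling error above, here is a minimal sketch in plain Python (no torch, so it is only an analogy for what happens on the GPU). The helper names `softmax` and `is_valid_probs` are hypothetical and not part of transformers or torch. It shows that a single `nan` logit is enough to turn every probability into `nan`, which is the condition a sampling step like `torch.multinomial` rejects.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats (illustrative helper)."""
    # Use the largest finite value as the shift so math.exp cannot overflow.
    finite = [x for x in logits if math.isfinite(x)]
    shift = max(finite) if finite else 0.0
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)  # becomes nan as soon as one term is nan
    return [e / total for e in exps]


def is_valid_probs(probs):
    """True if no entry is inf, nan, or negative (the condition named in the error)."""
    return all(math.isfinite(p) and p >= 0 for p in probs)


good = softmax([1.0, 2.0, 3.0])
bad = softmax([1.0, float("nan"), 2.0])

print(is_valid_probs(good))  # True
print(is_valid_probs(bad))   # False: every entry is nan, not only the second
```

A check of this kind, placed directly after the logits are computed, narrows down whether the `nan` values come from the model's forward pass or from a later processing step.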


believeland23, Apr 04 '23 08:04

I have the same problem. The inputs are all valid (not nan, not inf, and > 0), but `probs` seems to be all nan:

(Pdb) probs[0,-100:]
tensor([nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
        nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
        nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
        nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
        nan, nan, nan, nan], device='cuda:0', dtype=torch.float16)

DavidFarago, May 24 '23 09:05

This problem seems to be caused by an incorrect configuration in the "Converting weights to Hugging Face format" step.

`--n-stages` can be determined from the checkpoint folder; in my environment it is `--n-stages 2`.

`--n-layer-per-stage` must match the model configuration: `--n-layer-per-stage` = number of model layers / `--n-stages`. In my environment this is `--n-layer-per-stage 16` (model: https://huggingface.co/EleutherAI/pythia-6.9b-deduped).
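The arithmetic above can be sketched as a small check. The function name `layers_per_stage` is hypothetical and not part of the conversion script. The layer count of 32 is an assumption for the linked model, consistent with the reported 2 stages of 16 layers; read the real value from the model's configuration.

```python
def layers_per_stage(n_layers: int, n_stages: int) -> int:
    """Return the value to pass as --n-layer-per-stage (illustrative helper)."""
    if n_layers <= 0 or n_stages <= 0:
        raise ValueError("n_layers and n_stages must be positive")
    if n_layers % n_stages != 0:
        # An uneven split means one of the two numbers is wrong.
        raise ValueError(
            f"{n_layers} layers cannot be split evenly across {n_stages} stages"
        )
    return n_layers // n_stages


# Assumed values: 32 layers in the model, 2 stages in the checkpoint folder.
print(layers_per_stage(32, 2))  # 16
```

If the division is not exact, either the stage count read from the checkpoint folder or the layer count taken from the model configuration is probably wrong, and converting with those values may produce a broken checkpoint.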

ChengYen-Tang, Aug 07 '23 05:08