litgpt

RuntimeError: probability tensor contains either inf, nan or element < 0

Open · sajjadriaj opened this issue 2 years ago · 4 comments

I was able to fine-tune before, but after the recent update I am getting this error:

RuntimeError: probability tensor contains either inf, nan or element < 0

The same error happens when I try to run inference with a previously fine-tuned model.

sajjadriaj, Jun 23 '23
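A short pure-Python sketch of what this error means, assuming (this is not confirmed in the thread) that the generate script samples the next token with torch.multinomial, which rejects probability vectors containing inf, NaN, or negative entries. If even one logit has overflowed to +inf, a max-subtracted softmax computes inf - inf = NaN, and every probability becomes NaN. The check_probs function is a hypothetical stand-in that mimics PyTorch's validation and error message; it is not PyTorch's code.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability, as typical softmax implementations do.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def check_probs(probs):
    # Hypothetical stand-in for the check torch.multinomial performs before sampling.
    for p in probs:
        if math.isinf(p) or math.isnan(p) or p < 0:
            raise RuntimeError("probability tensor contains either inf, nan or element < 0")
    return probs

finite = softmax([1.0, 2.0, 3.0])                # well-formed distribution
overflowed = softmax([1.0, float("inf"), 3.0])   # one logit overflowed to +inf
```

Here check_probs(finite) passes, while every entry of overflowed is NaN (inf - inf inside the softmax, then 0.0 / NaN), so check_probs(overflowed) raises the same kind of RuntimeError reported above.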

Did you pull the latest changes? What script did you run, what arguments did you pass? Did you make any changes to the script?

carmocca, Jun 24 '23

Sorry for not providing the details. I pulled the latest changes and ran the adapter_v2.py script without modifying it, apart from changing the number of epochs. Fine-tuning starts, but after optimizer.step() the loss becomes NaN. Reducing the learning rate did not help. My dataset is very small.

Also, I am training on 4 V100s, which do not support bfloat16, so I am running in 16-bit precision.

sajjadriaj, Jun 26 '23
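A minimal sketch of how 16-bit arithmetic can turn a loss into NaN: IEEE 754 half precision (fp16) has a largest finite value of 65504, so any intermediate result above that becomes inf, and arithmetic such as inf - inf then yields NaN, which spreads through every later computation. The sketch emulates fp16 rounding with Python's struct "e" format; the to_fp16 helper and the numbers are illustrative assumptions, not taken from the training script.

```python
import math
import struct

FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value

def to_fp16(x: float) -> float:
    """Round x to half precision, returning +/-inf on overflow as a float-to-half cast does.

    struct raises OverflowError for out-of-range values, so that case is mapped to inf here.
    """
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)

a = to_fp16(300.0)        # exactly representable in fp16
sq = to_fp16(a * a)       # 90000 > FP16_MAX, so this overflows to inf
diff = to_fp16(sq - sq)   # inf - inf is NaN
```

In float32 the same product is an ordinary finite number; only the narrow fp16 range turns it into inf and then NaN.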

I am running in 16 bit.

--precision 16-mixed or --precision 16-true?

carmocca, Jun 30 '23
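The distinction behind this question, sketched under the assumption that --precision follows Lightning Fabric's usual meaning: 16-mixed keeps the weights in fp32 and runs selected operations in fp16, while 16-true stores the weights themselves in fp16. One visible consequence is that small weight updates can vanish when the weights live in fp16. The loop below applies the same hypothetical update of 1e-4 per step to a weight held in fp16 and to one held in fp32, both emulated with struct.

```python
import struct

def to_fp16(x: float) -> float:
    # Round to IEEE 754 half precision (the values used here stay well inside its range).
    return struct.unpack("<e", struct.pack("<e", x))[0]

def to_fp32(x: float) -> float:
    # Round to IEEE 754 single precision.
    return struct.unpack("<f", struct.pack("<f", x))[0]

update = 1e-4   # hypothetical lr * grad for one optimizer step
w_fp16 = 1.0    # weight stored in fp16, as with 16-true
w_fp32 = 1.0    # weight stored in fp32, as kept under 16-mixed

for _ in range(100):
    w_fp16 = to_fp16(w_fp16 - update)
    w_fp32 = to_fp32(w_fp32 - update)
```

After the loop, w_fp16 is still exactly 1.0, because 0.9999 rounds back to 1.0 in fp16 (the spacing just below 1.0 is about 4.9e-4), while w_fp32 ends up close to 0.99.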

16-true. Training runs fine, and I can use the generate script. However, I want to experiment with multiple prompts and would really like to try the chat/interactive mode. Since there is no script for using a fine-tuned model in chat mode, I tried loading the adapter into the model in the chat script, but it throws an error saying the probability tensor contains inf or NaN values. I also tried making the generate script interactive, but the same thing happens :(

sajjadriaj, Jun 30 '23

--precision bf16-true will most likely fix it.

carmocca, Aug 14 '23
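Why bf16 helps with overflow, as a sketch: bfloat16 keeps float32's 8 exponent bits, so its range extends to roughly 3.4e38, whereas fp16 stops at 65504. It pays for this with only 7 explicit mantissa bits. The to_bf16 helper below is a hypothetical emulation that rounds a float32 encoding to its top 16 bits; a value like 90000, which exceeds the fp16 maximum, stays finite here and is merely rounded to 90112.

```python
import math
import struct

def to_bf16(x: float) -> float:
    """Round x to the nearest bfloat16 value (ties to even).

    Hypothetical helper: packs x as float32, rounds away the low 16 bits, and unpacks.
    NaN/inf inputs and values beyond the float32 range are not handled.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Adding 0x7FFF plus the lowest kept bit implements round-half-to-even.
    bits += 0x7FFF + ((bits >> 16) & 1)
    bits = (bits >> 16) << 16
    return struct.unpack("<f", struct.pack("<I", bits))[0]

big = to_bf16(300.0 * 300.0)   # 90000 exceeds the fp16 maximum but is finite in bf16
huge = to_bf16(1e38)           # still finite in bf16
coarse = to_bf16(1.001)        # the price is precision: this rounds to 1.0
```

The trade-off is visible in the last line: bf16 avoids the overflow that produces inf and NaN, but it resolves far fewer digits than fp16 near 1.0.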