RuntimeError: probability tensor contains either inf, nan or element < 0
I was able to fine-tune before, but after the recent update I am getting this error:
RuntimeError: probability tensor contains either inf, nan or element < 0
The same error happens when I try to run inference with a previously fine-tuned model.
Did you pull the latest changes? What script did you run, what arguments did you pass? Did you make any changes to the script?
Sorry for not providing the details. I pulled the latest changes and ran the adapter_v2.py script. I did not change the script; I only changed the number of epochs. The fine-tuning runs, but after optimizer.step() the loss becomes nan. I tried reducing the learning rate but it does not help. My dataset is very small.
Also, I am training on 4 V100s, so there is no bfloat16 support. I am running in 16 bit.
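One way to narrow down where the nan first appears is to check the loss and gradients for non-finite values before calling optimizer.step(). The following is a minimal sketch in plain PyTorch, not code from adapter_v2.py: the tiny nn.Linear model, the poisoned input, and the helper name nonfinite_grad_names are all hypothetical and only illustrate the check.

```python
import torch
from torch import nn


def nonfinite_grad_names(model: nn.Module) -> list[str]:
    """Return the names of parameters whose gradient contains inf or nan."""
    return [
        name
        for name, param in model.named_parameters()
        if param.grad is not None and not torch.isfinite(param.grad).all()
    ]


torch.manual_seed(0)
model = nn.Linear(4, 2)  # hypothetical stand-in for the real model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(8, 4)
x[0, 0] = float("inf")  # deliberately poison one input to trigger the problem

loss = model(x).pow(2).mean()
loss.backward()

bad = nonfinite_grad_names(model)
if torch.isfinite(loss) and not bad:
    optimizer.step()  # only update the weights when everything is finite
else:
    print(f"skipping step: loss={loss.item()}, non-finite grads in {bad}")
optimizer.zero_grad()
```

If the check fires on the very first batch, the problem is more likely in the forward pass or the data than in the learning rate.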
I am running in 16 bit.
--precision 16-mixed or --precision 16-true?
16-true. The training runs fine and I am able to use the generate script. However, I want to experiment with multiple prompts and would really like to try the chat/interactive mode. Since there is no script for trying the fine-tuned model in chat mode, I tried adding the adapter to the model in the chat script, but it throws an error saying that the probability tensor contains inf or nan values. I tried to make the generate script interactive as well, but the same thing happens :(
bf16-true will most likely fix it
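For context, here is a small sketch of why the precision setting can matter. It assumes the failure comes from float16 overflow, which the thread does not confirm. float16 tops out at 65504, so larger values become inf on conversion, softmax over such logits yields nan, and torch.multinomial then raises the error above. bfloat16 keeps roughly the float32 exponent range. The example runs on CPU; the helper sample_next_token and the logit values are hypothetical.

```python
import torch


def sample_next_token(logits: torch.Tensor) -> torch.Tensor:
    """Sample one token index from a 1-D logits tensor."""
    probs = torch.softmax(logits.float(), dim=-1)
    return torch.multinomial(probs, num_samples=1)


# Hypothetical logits; 70000 exceeds the float16 maximum of 65504.
big = torch.tensor([70000.0, 1.0, 2.0])
fp16_logits = big.to(torch.float16)   # first entry overflows to inf
bf16_logits = big.to(torch.bfloat16)  # stays finite (with reduced precision)

fp16_has_inf = bool(torch.isinf(fp16_logits).any())
bf16_has_inf = bool(torch.isinf(bf16_logits).any())

try:
    sample_next_token(fp16_logits)
    fp16_error = None
except RuntimeError as err:
    fp16_error = str(err)  # "probability tensor contains either inf, nan ..."

bf16_token = sample_next_token(bf16_logits)

print("float16 has inf:", fp16_has_inf, "| error:", fp16_error)
print("bfloat16 has inf:", bf16_has_inf, "| sampled index:", bf16_token.item())
```

Whether bf16-true is usable depends on the hardware. As noted earlier in the thread, V100 GPUs have no bfloat16 support.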