
Input contains NaN.

Open Alack1 opened this issue 2 years ago • 7 comments

Following your README step by step, using the dataset directly from your preprocessed ml-1m file, why does it show the error "Input contains NaN"?

Alack1 avatar Jan 21 '24 13:01 Alack1

Which model? What is the learning rate?

zyang1580 avatar Feb 21 '24 07:02 zyang1580

If the learning rate is too high, you may need to reduce it.

zyang1580 avatar Feb 21 '24 07:02 zyang1580

I met the same problem when using the Llama2-based Vicuna model (vicuna-7b-v1.5). My learning-rate settings are as follows:

lr_sched: "linear_warmup_cosine_lr"
init_lr: 1e-4
min_lr: 8e-5
warmup_lr: 1e-5

However, when I use Llama1-based Vicunas (v1.1 and v1.3), it runs successfully. Are there any settings in the code that work only for Llama1 and are incompatible with Llama2?
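For reference, a "linear_warmup_cosine_lr" schedule with the values quoted above behaves roughly like this (a minimal sketch, not CoLLM's actual scheduler code; the function name and the step counts in the comments are made up for illustration):

```python
import math

def lr_at(step, warmup_steps, total_steps,
          init_lr=1e-4, min_lr=8e-5, warmup_lr=1e-5):
    """Hypothetical linear-warmup + cosine-decay schedule."""
    if step < warmup_steps:
        # linear warmup: warmup_lr -> init_lr over warmup_steps
        return warmup_lr + (init_lr - warmup_lr) * step / warmup_steps
    # cosine decay: init_lr -> min_lr over the remaining steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (init_lr - min_lr) * (1 + math.cos(math.pi * progress))

# e.g. with 100 warmup steps and 1000 total steps:
# lr_at(0, 100, 1000)    -> 1e-5  (start of warmup)
# lr_at(100, 100, 1000)  -> 1e-4  (peak, end of warmup)
# lr_at(1000, 100, 1000) -> 8e-5  (floor, end of training)
```

Note that min_lr=8e-5 keeps the learning rate fairly high for the entire run, which is one knob to lower if NaNs appear late in training.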

> Which model? What is the learning rate?

XiyuChangSJTU avatar May 11 '24 08:05 XiyuChangSJTU

I haven't experimented with Llama 2 yet, so I'm unsure of the potential reasons. We might need to adjust the code or settings from Llama 1 to make them compatible with Llama 2.

Have you successfully resolved the issue?

zyang1580 avatar Jun 09 '24 07:06 zyang1580

> I haven't experimented with Llama 2 yet, so I'm unsure of the potential reasons. We might need to adjust the code or settings from Llama 1 to make them compatible with Llama 2.
>
> Have you successfully resolved the issue?

Thanks for your reply. I resolved this problem by changing the padding side of the tokenizer to "right" (as specified in Vicuna's config files) and adjusting the other corresponding code.

I think the padding side is a common pitfall when tuning LLMs: I ran into a similar NaN problem when tuning other LLMs (which should use right padding) with left padding.
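To illustrate why the padding side can produce NaNs, here is a minimal, self-contained sketch (the label-masking logic is hypothetical, not CoLLM's actual code):

```python
# With left padding, the real tokens are shifted to the end of the sequence.
# Any downstream code that assumes a prompt-first layout can then mask the
# wrong positions; if every real label gets masked, the mean loss becomes
# 0/0, which surfaces as NaN.

PAD = 0

def pad(tokens, length, side):
    pads = [PAD] * (length - len(tokens))
    return pads + tokens if side == "left" else tokens + pads

seq = [5, 6, 7]
right = pad(seq, 6, "right")  # real tokens first: [5, 6, 7, 0, 0, 0]
left = pad(seq, 6, "left")    # real tokens last:  [0, 0, 0, 5, 6, 7]

# Hypothetical masking code that assumes the first len(seq) positions
# hold the real tokens (-100 = "ignore this position in the loss"):
labels_right = [t if i < len(seq) else -100 for i, t in enumerate(right)]
labels_left = [t if i < len(seq) else -100 for i, t in enumerate(left)]
# With left padding, all real tokens end up masked -> empty loss -> NaN.
```

With HuggingFace tokenizers, the corresponding switch is `tokenizer.padding_side = "right"`.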

XiyuChangSJTU avatar Jun 09 '24 08:06 XiyuChangSJTU

@XiyuChangSJTU Hi, I have also met the same problem with vicuna-v1.5. Could you please share exactly how you resolved it? Just by changing the padding side of the tokenizer to "right"?

scvready123 avatar Sep 15 '24 11:09 scvready123

> I haven't experimented with Llama 2 yet, so I'm unsure of the potential reasons. We might need to adjust the code or settings from Llama 1 to make them compatible with Llama 2. Have you successfully resolved the issue?
>
> Thanks for your reply. I resolved this problem by changing the padding side of the tokenizer to "right" (as specified in Vicuna's config files) and adjusting the other corresponding code.
>
> I think the padding side is a common pitfall when tuning LLMs: I ran into a similar NaN problem when tuning other LLMs (which should use right padding) with left padding.

Hi, I changed the padding direction, but now the training speed is very slow.

scvready123 avatar Sep 27 '24 10:09 scvready123