Results: 13 comments of alphanlp

It's very interesting: when I use softmax as proposed in the paper, the loss does not go down.
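
A common cause of a stalled loss in this situation (assuming a PyTorch training loop, which is my guess here) is applying an explicit softmax before `nn.CrossEntropyLoss`, which already applies log-softmax internally; the doubled softmax flattens the logits and gradients barely move. A minimal sketch of the pitfall:

```python
import torch
import torch.nn as nn

logits = torch.randn(4, 10, requires_grad=True)  # (batch, num_classes)
targets = torch.randint(0, 10, (4,))
criterion = nn.CrossEntropyLoss()

# Pitfall: CrossEntropyLoss expects raw logits; an extra softmax
# squashes them into [0, 1] and the loss barely decreases.
loss_stuck = criterion(torch.softmax(logits, dim=-1), targets)

# Fix: pass the raw logits directly.
loss_ok = criterion(logits, targets)
```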

> Why does QLoRA's loss decrease more slowly? I have the same question.

Has anyone solved this problem?

Is there a shared proxy anyone can offer? I don't have an overseas server.

bert.embeddings.word_embeddings.weight: found shape torch.Size([21128, 768]) in the checkpoint and torch.Size([30522, 768]) in the model instantiated
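
For reference, 21128 is the vocabulary size of `bert-base-chinese` and 30522 that of `bert-base-uncased`, so the checkpoint and the instantiated model point at different BERT variants. A minimal sketch of the two usual fixes, assuming the Hugging Face `transformers` API (the local checkpoint path below is hypothetical):

```python
from transformers import AutoConfig, AutoModel

# Fix 1: instantiate the architecture that actually matches the
# checkpoint (21128 = bert-base-chinese vocab size).
model = AutoModel.from_pretrained("bert-base-chinese")

# Fix 2: if mixing a 21128-vocab checkpoint with a 30522-vocab config
# is deliberate, skip the strict shape check; the mismatched embedding
# matrix is then freshly initialized instead of loaded.
config = AutoConfig.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained(
    "path/to/checkpoint",  # hypothetical local checkpoint path
    config=config,
    ignore_mismatched_sizes=True,
)
```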

I hit the same error when using LLaMA as the actor with ZeRO stage 3.