alphanlp
It's very interesting: when I use softmax as proposed in the paper, the loss does not go down.

> Why is QLoRA's loss slower to decrease? I ran into the same question.

Has anyone solved this problem?

Me too. Have you solved the problem?
Is there a shared proxy? I don't have an overseas server.
bert.embeddings.word_embeddings.weight: found shape torch.Size([21128, 768]) in the checkpoint and torch.Size([30522, 768]) in the model instantiated
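This mismatch usually means the config used to instantiate the model does not match the checkpoint: 21128 is the Chinese BERT vocabulary size and 30522 is the English bert-base-uncased vocabulary size. A minimal sketch of one way to keep them consistent, assuming the checkpoint directory is local (the path below is hypothetical):

```python
from transformers import AutoConfig, AutoModel

ckpt = "path/to/checkpoint"                    # hypothetical checkpoint path
config = AutoConfig.from_pretrained(ckpt)      # vocab_size is read from the same checkpoint
model = AutoModel.from_pretrained(ckpt, config=config)

# If the size mismatch is intentional (e.g. the tokenizer was replaced), the
# mismatched weights can be randomly re-initialized instead of raising:
# model = AutoModel.from_pretrained(ckpt, ignore_mismatched_sizes=True)
```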
Same error when using LLaMA as the actor with ZeRO stage = 3.
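One thing worth checking under ZeRO-3 is whether the model is initialized under partitioning before the checkpoint is loaded. A minimal sketch, assuming a `ds_config.json` with `"zero_optimization": {"stage": 3}` and a recent transformers version (older versions import `HfDeepSpeedConfig` from `transformers.deepspeed`); the config path and model id are hypothetical:

```python
import json
from transformers import AutoModelForCausalLM
from transformers.integrations import HfDeepSpeedConfig

with open("ds_config.json") as f:        # hypothetical DeepSpeed config path
    ds_config = json.load(f)

# Must be constructed (and kept alive) before from_pretrained so that
# transformers initializes the weights under ZeRO-3 partitioning (zero.Init).
dschf = HfDeepSpeedConfig(ds_config)

actor = AutoModelForCausalLM.from_pretrained("huggyllama/llama-7b")  # hypothetical model id
# actor can then be passed to deepspeed.initialize(model=actor, config=ds_config) as usual.
```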