chenboheng


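Isn't this obvious? The open-source weights are only the pretrained model weights. A good chat model goes through many more steps after pretraining, such as instruction fine-tuning and PPO, so how could you expect good question answering from the pretrained model alone?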

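The error is obvious from the log: "using world size: 1 and model-parallel size: 8". The checkpoint you are loading holds only a 1/8 model-parallel shard of the weights, but because you run with a world size of 1, the model you actually build has the full dimensions, so the load naturally fails: "size mismatch for transformer.word_embeddings.weight: copying a param with shape torch.Size([18816, 12288]) from checkpoint, the shape in current model is torch.Size([150528, 12288])". The checkpoint's embedding has 18816 rows, exactly 150528 / 8, i.e. only 1/8 of the weights.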
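To make the shape arithmetic concrete, here is a minimal pure-Python sketch. It assumes the embedding is split row-wise (along the vocabulary dimension) into 8 equal model-parallel shards, and that with a single process the model is built unsplit, as the error message shows. The helper names (shard_shape, check_load, merged_shape) are hypothetical, and check_load only imitates the shape check PyTorch does in load_state_dict. It is not the real API.

```python
# Sizes taken from the error message above.
FULL_VOCAB = 150528
HIDDEN = 12288
MP_SIZE = 8  # model-parallel size the checkpoint was saved with

def shard_shape(full_rows, hidden, mp_size):
    """Shape of one shard when the embedding is split row-wise into mp_size parts (assumed layout)."""
    if full_rows % mp_size != 0:
        raise ValueError("vocab size must be divisible by the model-parallel size")
    return (full_rows // mp_size, hidden)

def check_load(ckpt_shape, model_shape, name):
    """Simplified imitation of the shape check done when loading a state dict.

    Returns an error string on mismatch, or None when the shapes agree.
    """
    if ckpt_shape != model_shape:
        return (f"size mismatch for {name}: copying a param with shape {ckpt_shape} "
                f"from checkpoint, the shape in current model is {model_shape}")
    return None

def merged_shape(shard_shapes):
    """Shape obtained by concatenating row-wise shards back together."""
    hidden = shard_shapes[0][1]
    assert all(s[1] == hidden for s in shard_shapes), "shards must share the hidden size"
    return (sum(s[0] for s in shard_shapes), hidden)

name = "transformer.word_embeddings.weight"
ckpt = shard_shape(FULL_VOCAB, HIDDEN, MP_SIZE)  # one 1/8 shard: (18816, 12288)

# World size 1: the model is built unsplit, so the shard does not fit.
model_ws1 = (FULL_VOCAB, HIDDEN)
print(check_load(ckpt, model_ws1, name))

# World size 8 (= model-parallel size): each rank builds its own 1/8 partition.
model_ws8 = shard_shape(FULL_VOCAB, HIDDEN, MP_SIZE)
print(check_load(ckpt, model_ws8, name))  # None: shapes agree

# Concatenating all 8 shards would restore the full embedding shape.
print(merged_shape([ckpt] * MP_SIZE))
```

In other words, either run with as many processes as the checkpoint's model-parallel size, or merge the shards into one full checkpoint before loading with a single process. How to do either depends on the specific codebase's launch scripts and checkpoint format.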