mx8435

Results: 5 issues by mx8435

Hi, why does the `_BertWordModel` class use the vocab of the training and test data to rebuild and adjust the BERT model's embedding? What advantage does this have over using the original BERT directly? I only see that the training vocabulary shrinks — what benefit does that bring?
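The trimming asked about above can be sketched as follows. This is a minimal illustration with made-up names and toy data, not the actual `_BertWordModel` code: the idea is to keep only the embedding rows for tokens that occur in the train/test vocab and remap token ids, which shrinks the embedding table (and any tied output softmax) at the cost of discarding tokens unseen in the task data.

```python
import numpy as np

def trim_embeddings(embeddings, full_vocab, task_vocab):
    """Slice a (V, d) embedding matrix down to the task vocabulary.

    embeddings: np.ndarray of shape (V, d), the pretrained table
    full_vocab: dict mapping token -> original row index
    task_vocab: iterable of tokens actually seen in train/test data
    Returns the smaller matrix and the new token -> index mapping.
    """
    kept = [t for t in task_vocab if t in full_vocab]
    rows = [full_vocab[t] for t in kept]
    new_vocab = {t: i for i, t in enumerate(kept)}
    return embeddings[rows], new_vocab

# Toy example: a 5-token "pretrained" vocab; the task uses only 3 tokens.
emb = np.arange(20, dtype=float).reshape(5, 4)  # (V=5, d=4)
full = {"[PAD]": 0, "the": 1, "cat": 2, "dog": 3, "sat": 4}
small_emb, small_vocab = trim_embeddings(emb, full, ["the", "cat", "sat"])
print(small_emb.shape)     # (3, 4): a smaller embedding table
print(small_vocab["sat"])  # 2: remapped token id
```

The pretrained row values are preserved, only the rows for out-of-task tokens are dropped, so downstream fine-tuning starts from the same pretrained vectors with a smaller memory footprint.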

Hi, the __zero-shot__ performance on BoolQ in the LLaMA paper is 76.5, while llm-foundry reaches only 62.16 (zero-shot) when following `tasks.yaml`. Is the result in the blog few-shot? How about...

Hi, is the __GLM model__ in chatglm-6b pretrained on this repo with just a few modifications (such as adding RoPE embeddings and using a new tokenizer), or is it based on the GLM-130B repo, which...

Is the base model a purely pretrained model, or has it also been trained on supervised data?

Hi, great job. Did you run an ablation study comparing the performance of MLA and MHA in a __dense__ model? Thanks.