mx8435

Results: 5 issues by mx8435

Hi, why does the `_BertWordModel` class use the vocab of the training and test data to rebuild and adjust the BERT model's embedding? What advantage does this have over using the original BERT directly? I only see that the training vocabulary shrinks — what benefit does that bring?
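The trimming asked about above can be sketched as follows. This is a minimal illustration with made-up names and toy data, not the actual `_BertWordModel` code: the idea is to keep only the embedding rows for tokens that occur in the train/test vocab and remap token ids, which shrinks the embedding table (and any tied output softmax) at the cost of discarding tokens unseen in the task data.

```python
import numpy as np

def trim_embeddings(embeddings, full_vocab, task_vocab):
    """Slice a (V, d) embedding matrix down to the task vocabulary.

    embeddings: np.ndarray of shape (V, d), the pretrained table
    full_vocab: dict mapping token -> original row index
    task_vocab: iterable of tokens actually seen in train/test data
    Returns the smaller matrix and the new token -> index mapping.
    """
    kept = [t for t in task_vocab if t in full_vocab]
    rows = [full_vocab[t] for t in kept]
    new_vocab = {t: i for i, t in enumerate(kept)}
    return embeddings[rows], new_vocab

# Toy example: a 5-token "pretrained" vocab; the task uses only 3 tokens.
emb = np.arange(20, dtype=float).reshape(5, 4)  # (V=5, d=4)
full = {"[PAD]": 0, "the": 1, "cat": 2, "dog": 3, "sat": 4}
small_emb, small_vocab = trim_embeddings(emb, full, ["the", "cat", "sat"])
print(small_emb.shape)     # (3, 4): a smaller embedding table
print(small_vocab["sat"])  # 2: remapped token id
```

The pretrained row values are preserved, only the rows for out-of-task tokens are dropped, so downstream fine-tuning starts from the same pretrained vectors with a smaller memory footprint.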

Hi, the __zero-shot__ performance on BoolQ in the LLaMA paper is 76.5, while llm-foundry reaches only 62.16 (zero-shot) when following `tasks.yaml`. Is the result in the blog few-shot? How about...

Hi, is the __GLM model__ in chatglm-6b pretrained on this repo with just a few modifications (such as adding RoPE embeddings and using a new tokenizer), or is it based on the GLM-130B repo, which...

Is the base model a purely pretrained model, or has it also been trained on supervised data?

Hi, great job. Did you run an ablation study comparing the performance of MLA and MHA in a __dense__ model? Thanks.