mx8435 comments

Results: 5 comments of mx8435

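Question about _BertWordModel

> A smaller vocab is only one of the benefits; another advantage is that preprocessing becomes more convenient at call time. If you simply use the native BertModel, there is no real difference from transformers.

A smaller vocab should not improve training results, should it? I looked at the code: the advantage of this class is that it can train the characters in the training data that would otherwise be unk.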

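Question about _BertWordModel

I compared _BertWordModel with the original BERT and found that it performs better (F1 +0.3%). Looking at the code, this is mainly because tokens that are not in the BERT vocabulary but are in the training-data vocab get trained, for example Chinese quotation marks. Is this the advantage of _BertWordModel? @yhcc @xuyige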
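To make the point about out-of-vocabulary tokens concrete, here is a minimal, framework-free sketch. It is not fastNLP's actual code: `extend_vocab`, `encode`, and the toy vocabularies are hypothetical names and data chosen only for illustration. It shows how tokens missing from the BERT vocabulary collapse onto a single `[UNK]` id, and how giving them their own ids lets each one get its own trainable embedding row.

```python
# Illustrative only: hypothetical helpers, not the fastNLP implementation.

def extend_vocab(bert_vocab, train_tokens):
    """Keep the BERT ids and append ids for tokens BERT does not know."""
    token_to_id = dict(bert_vocab)  # copy so the original map is untouched
    next_id = max(token_to_id.values()) + 1
    added = []
    for tok in train_tokens:
        if tok not in token_to_id:
            token_to_id[tok] = next_id
            added.append(tok)
            next_id += 1
    return token_to_id, added


def encode(tokens, vocab, unk_token="[UNK]"):
    """Map tokens to ids, falling back to the unk id for unknown tokens."""
    return [vocab.get(tok, vocab[unk_token]) for tok in tokens]


# Toy BERT vocabulary (assumed) that lacks the Chinese quotation marks.
bert_vocab = {"[PAD]": 0, "[UNK]": 1, "我": 2, "说": 3}
train_tokens = ["我", "说", "“", "”", "“"]

token_to_id, added = extend_vocab(bert_vocab, train_tokens)

# With the plain BERT vocab, both quotation marks share the [UNK] id.
print(encode(train_tokens, bert_vocab))
# With the extended vocab, each quotation mark has its own id.
print(encode(train_tokens, token_to_id))
```

Whether the extra rows account for the observed F1 gain is exactly the question raised in the comment; the sketch only shows the mechanism.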

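In the first pre-training stage, do the embeddings of the original LLaMA vocabulary need to be frozen?

> Does training the whole vocabulary affect the original embeddings? Would freezing the original LLaMA embeddings work better? Have you run this experiment, or how do you see this question? Thanks for your answer.

@ymcui Could you explain why first-stage training is not recommended? Where can I find the code for first-stage training?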
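For readers unfamiliar with what "freezing the original embeddings" would mean in practice, the following is a toy sketch under stated assumptions: plain Python lists stand in for the embedding matrix, the first `num_frozen` rows stand in for the original LLaMA vocabulary, and `sgd_step` is a hypothetical helper. It is not the project's training code. In a framework such as PyTorch, the same effect is usually obtained by zeroing the gradient for those rows before the optimizer step.

```python
# Illustrative only: a hand-written SGD step on a toy embedding table.

def sgd_step(embedding, grads, lr, num_frozen):
    """Update rows in place, skipping the first `num_frozen` rows."""
    for row_idx, (row, grad_row) in enumerate(zip(embedding, grads)):
        if row_idx < num_frozen:
            continue  # original-vocabulary row: keep the pre-trained values
        for col in range(len(row)):
            row[col] -= lr * grad_row[col]
    return embedding


# Three "original" rows plus two newly added rows, embedding dim 2 (toy numbers).
embedding = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [0.0, 0.0]]
grads = [[1.0, 1.0] for _ in embedding]
before = [row[:] for row in embedding]

sgd_step(embedding, grads, lr=0.5, num_frozen=3)

print(embedding[:3] == before[:3])  # frozen rows are unchanged
print(embedding[3:])                # only the new rows moved
```

The sketch says nothing about which choice trains better; that remains the open question in the thread.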

Reproduce result of Boolq on LLaMA-7B

@bmosaicml Here is the `yamls/hf_eval.yaml` I used; I ran `WORLD_SIZE=8 composer eval.py yamls/hf_eval.yaml` to evaluate.

```yaml
max_seq_len: 2048
seed: 1
model_name_or_path: LLaMA-7B_hf/

# Tokenizer
tokenizer:
  name: ${model_name_or_path}
  kwargs:
    model_max_length: ${max_seq_len}

model:...
```

Reproduce result of Boolq on LLaMA-7B

@bmosaicml The datasets that reproduced __successfully__ are listed below:

metrics/piqa/5-shot/InContextLearningMultipleChoiceAccuracy: 0.800000011920929
metrics/lambada_openai/0-shot/InContextLearningLMAccuracy: 0.7379844784736633
winogrande/0-shot/InContextLearningMultipleChoiceAccuracy: 0.7005
copa/0-shot/InContextLearningMultipleChoiceAccuracy: 0.7788

The datasets that __failed__ to reproduce are listed below:

arc_easy/0-shot/InContextLearningMultipleChoiceAccuracy: 0.4242
arc_challenge/0-shot/InContextLearningMultipleChoiceAccuracy: 0.3579931855201721
copa/0-shot/InContextLearningMultipleChoiceAccuracy:...