ChineseBert
About pre-training
Is there open-source code for the pre-training?
Same question here.
Could you release the pre-training code? I am having difficulty masking the pinyin ids together with the original token ids. With best regards, Yunpeng Tai
We followed Hugging Face's pre-training scripts; you can easily replace the original BertModel with our GlyceBertModel. Here is the link: https://github.com/huggingface/transformers/tree/master/examples/pytorch/language-modeling
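For concreteness, the swap might look roughly like the sketch below. This is a minimal sketch under assumptions, not the authors' exact script: it assumes `GlyceBertForMaskedLM` is importable from this repo's `models/modeling_glycebert.py`, that it mirrors `BertForMaskedLM`'s `from_pretrained` interface, and that `"ChineseBERT-base"` stands in for a local checkpoint directory.

```python
# Minimal sketch of the model swap inside Hugging Face's run_mlm.py.
# Assumptions: GlyceBertForMaskedLM lives in models/modeling_glycebert.py
# of this repo and mirrors BertForMaskedLM's from_pretrained interface;
# "ChineseBERT-base" is a placeholder for a local checkpoint directory.
from models.modeling_glycebert import GlyceBertForMaskedLM

checkpoint = "ChineseBERT-base"

# In run_mlm.py, replace
#     model = AutoModelForMaskedLM.from_pretrained(model_args.model_name_or_path, ...)
# with the glyph + pinyin model:
model = GlyceBertForMaskedLM.from_pretrained(checkpoint)
```

The rest of the script (Trainer setup, data loading) can stay as-is, except that each batch must also carry whatever extra inputs (e.g. pinyin ids) the model's forward expects.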
I've been trying for quite a while, and it is not that easy. Please open-source the pre-training code.
Is there open-source code for the pre-training?
May I ask, have you managed to get the pre-training process running end to end? I'm trying to pre-train on my own data and have been stuck on pitfalls for a long time.
Our released model was produced by exactly this pre-training. What problems do you run into when running the language-model pre-training linked above? You can post a screenshot of the error.
In the run_mlm.py pre-training, should I replace

tokenizer = AutoTokenizer.from_pretrained(model_args.model_name_or_path, **tokenizer_kwargs)

and

model = AutoModelForMaskedLM.from_pretrained(
    model_args.model_name_or_path,
    from_tf=bool(".ckpt" in model_args.model_name_or_path),
    config=config,
    cache_dir=model_args.cache_dir,
    revision=model_args.model_revision,
    use_auth_token=True if model_args.use_auth_token else None,
)

with

tokenizer = BertMaskDataset(vocab_file, config_path)

and

model = GlyceBertForMaskedLM.from_pretrained(model_args.model_name_or_path)

respectively? Thanks!
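Regarding the tokenizer half of that swap: judging by its name, `BertMaskDataset` is a dataset class rather than a tokenizer, so it probably cannot replace `AutoTokenizer` directly. One plausible route for the masking difficulty raised earlier in this thread is to keep the standard MLM masking on `input_ids` and blank the pinyin ids at the masked positions, so the model cannot recover a masked character from its pinyin. The sketch below is an assumption-laden illustration, not the authors' recipe: it assumes each example already carries a fixed-length `pinyin_ids` tensor of shape `(seq_len, 8)` (the 8-ids-per-character layout is an assumption here) and that pinyin id 0 means "no pinyin".

```python
# Hedged sketch: reuse Hugging Face's MLM collator for input_ids/labels,
# then zero the pinyin ids wherever a token was masked.
# Assumptions: examples are pre-padded to one max_length; pinyin_ids has
# shape (seq_len, 8) per example; pinyin id 0 means "no pinyin".
import torch
from transformers import BertTokenizerFast, DataCollatorForLanguageModeling


class GlyphPinyinMlmCollator(DataCollatorForLanguageModeling):
    def __call__(self, examples):
        # Pull pinyin_ids out first so the base collator only sees the
        # keys it knows how to pad and mask.
        pinyin_ids = torch.stack([e.pop("pinyin_ids") for e in examples])
        batch = super().__call__(examples)  # masks input_ids, builds labels
        # labels == -100 marks positions NOT selected for MLM.
        masked = batch["labels"] != -100  # (batch, seq_len)
        pinyin_ids[masked] = 0            # blank pinyin of masked tokens
        batch["pinyin_ids"] = pinyin_ids
        return batch


# "ChineseBERT-base" is a placeholder for a local checkpoint directory.
tokenizer = BertTokenizerFast.from_pretrained("ChineseBERT-base")
collator = GlyphPinyinMlmCollator(tokenizer=tokenizer, mlm_probability=0.15)
```

A collator like this can then be handed to `Trainer` in run_mlm.py through its `data_collator` argument.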