
About pre-training

Open laikaiting opened this issue 3 years ago • 7 comments

Is there open-sourced code for pre-training?

laikaiting · Oct 14 '21

Same question.

Dioxideme · Oct 20 '21

Could you release the pre-training code? I am having difficulty masking the pinyin ids together with the original token ids. With best regards, Yunpeng Tai
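(A hedged sketch of the joint-masking concern raised here, assuming PyTorch tensors and ChineseBERT's eight-slot pinyin encoding per token; all function and variable names are illustrative, not from the repo:)

    import torch

    def mask_for_mlm(input_ids, pinyin_ids, mask_token_id, mlm_prob=0.15):
        """Mask token ids for MLM and blank the aligned pinyin ids.

        input_ids: (batch, seq_len); pinyin_ids: (batch, seq_len, 8),
        following ChineseBERT's eight-slot pinyin convention (assumed here).
        Simplified: no 80/10/10 replacement split, no special-token protection.
        """
        labels = input_ids.clone()
        masked = torch.bernoulli(torch.full(input_ids.shape, mlm_prob)).bool()
        labels[~masked] = -100             # compute loss only on masked positions
        input_ids = input_ids.clone()
        input_ids[masked] = mask_token_id  # hide the character itself
        pinyin_ids = pinyin_ids.clone()
        pinyin_ids[masked] = 0             # also hide its pinyin, or it leaks the answer
        return input_ids, pinyin_ids, labels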

sherlcok314159 · Oct 20 '21

We followed Hugging Face's pre-training scripts; you can easily replace the original BertModel with our GlyceBertModel. Here is the link: https://github.com/huggingface/transformers/tree/master/examples/pytorch/language-modeling
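(For readers following along, a minimal sketch of that swap inside run_mlm.py. The import path follows the ChineseBert repo layout but is an assumption, not a verified API:)

    # In run_mlm.py, replace the AutoModelForMaskedLM load with the
    # ChineseBert MLM head. The module path is assumed from the repo layout.
    from models.modeling_glycebert import GlyceBertForMaskedLM

    model = GlyceBertForMaskedLM.from_pretrained(model_args.model_name_or_path)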

zijunsun · Oct 21 '21

We followed Hugging Face's pre-training scripts; you can easily replace the original BertModel with our GlyceBertModel. Here is the link: https://github.com/huggingface/transformers/tree/master/examples/pytorch/language-modeling

I have been trying for a long time, and it is not that easy. Please open-source the pre-training code.

jw8023wh · Nov 30 '21

Is there open-sourced code for pre-training?

Did you manage to get the pre-training process to run? I am trying to pre-train on my own data and have been hitting pitfalls for a long time.

jw8023wh · Dec 01 '21

Our released model was itself produced by this pre-training. What problems do you hit when you run the language-model pre-training above? You can post a screenshot of the error.

zijunsun · Jan 11 '22

Our released model was itself produced by this pre-training. What problems do you hit when you run the language-model pre-training above? You can post a screenshot of the error.

During pre-training with run_mlm.py, should

tokenizer = AutoTokenizer.from_pretrained(model_args.model_name_or_path, **tokenizer_kwargs)

model = AutoModelForMaskedLM.from_pretrained(
    model_args.model_name_or_path,
    from_tf=bool(".ckpt" in model_args.model_name_or_path),
    config=config,
    cache_dir=model_args.cache_dir,
    revision=model_args.model_revision,
    use_auth_token=True if model_args.use_auth_token else None,
)

be replaced, respectively, with

tokenizer = BertMaskDataset(vocab_file, config_path)

model = GlyceBertForMaskedLM.from_pretrained(model_args.model_name_or_path)

? Thank you!
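(A hedged sketch of the direction this swap might take. The GlyceBert forward pass also consumes pinyin_ids, so the data pipeline needs a pinyin-aware dataset, not just a tokenizer swap. The BertDataset import path and constructor below are assumptions based on the repo layout, not a verified recipe; note also that the stock DataCollatorForLanguageModeling only masks input_ids, so the pinyin channel would still need the joint masking sketched earlier in this thread:)

    # All import paths and signatures below are assumptions, not a verified recipe.
    from datasets.bert_dataset import BertDataset                 # assumed repo path
    from models.modeling_glycebert import GlyceBertForMaskedLM    # assumed repo path

    # Assumed: BertDataset reads vocab.txt and the pinyin config from the
    # pretrained checkpoint directory and yields input_ids plus pinyin_ids.
    tokenizer = BertDataset(model_args.model_name_or_path)
    model = GlyceBertForMaskedLM.from_pretrained(model_args.model_name_or_path)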

cxyccc · Aug 03 '22