ChineseBert: How to further pre-train with my own data

Open cxyccc opened this issue 2 years ago • 2 comments

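Hello! Do you have the code for pre-training the model? I tried to use run_mlm.py[https://github.com/huggingface/transformers/tree/main/examples/pytorch/language-modeling/run_mlm.py] for further pre-training, but the tokenizer that script calls differs from the tokenizer in your model (BertMaskDataset), and I ran into many problems after swapping it in. I hope you can help. Thank you!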

Jul 26 '22 08:07 cxyccc

Hello, do you have the file [方正古隶繁体.ttf24.npy]? It can no longer be downloaded. Could you send me a copy?

Aug 19 '22 08:08 yanghh2000

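> Hello! Do you have the code for pre-training the model? I tried to use run_mlm.py[https://github.com/huggingface/transformers/tree/main/examples/pytorch/language-modeling/run_mlm.py] for further pre-training, but the tokenizer that script calls differs from the tokenizer in your model (BertMaskDataset), and I ran into many problems after swapping it in. I hope you can help. Thank you!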

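Has your problem been resolved? I have been trying this recently too and still have not gotten it to run. Would you be open to comparing notes?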

Nov 03 '22 12:11 Nonponder
