Alternatively, drop the jieba segmentation step. And the pretrained model filename shouldn't still be the WWM one, should it? The chinese_L-12_H-768_A-12/vocab.txt file shouldn't contain multi-character words anymore; or is that file also the one used to train WWM?
Come on, the preprocessing for training and prediction should be consistent. That should be obvious even to someone who has never used BERT.
Your WWM really can be pretrained on one's own corpus, and at the word level, which is great. That's why I wanted to check whether bert4keras supports it too.
Line 373: https://github.com/bojone/bert4keras/blob/master/pretraining/data_utils.py#373
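For readers following along, this is roughly the idea under discussion: jieba decides the word boundaries, and every piece belonging to the same word is masked together. A minimal sketch only; the function name, tokenizer interface, and per-word masking rate are my assumptions, not the code at the line linked above:

```python
# Minimal whole-word-masking sketch (NOT the bert4keras code at the line
# linked above). jieba supplies word boundaries; all pieces of a chosen
# word are masked together, so masking never splits a word.
import random
import jieba

def whole_word_mask(text, tokenizer, word_mask_prob=0.15):
    """Return (tokens, mask_flags); the tokenizer is assumed to expose a
    BERT-style `tokenize` method (for Chinese, usually per-character)."""
    tokens, mask_flags = [], []
    for word in jieba.cut(text):
        pieces = tokenizer.tokenize(word)
        masked = random.random() < word_mask_prob  # decide per word, not per piece
        tokens.extend(pieces)
        mask_flags.extend([masked] * len(pieces))  # whole word or nothing
    return tokens, mask_flags
```

The consistency point above then follows directly: whatever tokenizer produces `pieces` here must be the same one used at prediction time.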
If you know, you know... Unbelievable.
@bojone Fine, then.
Thanks @patil-suraj. If I only run `nlp_torch = pipeline("feature-extraction", onnx=False)`, there is no problem, but running `nlp_onnx = pipeline("feature-extraction", onnx=True)` produces an error.
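For reference, a self-contained repro sketch; I am assuming `pipeline` here comes from the onnx_transformers package, so adjust the import if yours differs:

```python
# Minimal repro sketch; the import path is an assumption.
from onnx_transformers import pipeline

# PyTorch backend: this call works fine.
nlp_torch = pipeline("feature-extraction", onnx=False)
print(len(nlp_torch("hello world")[0]))

# ONNX backend: this is the call that raises the error on Win10.
nlp_onnx = pipeline("feature-extraction", onnx=True)
print(len(nlp_onnx("hello world")[0]))
```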
My operating system environment is Win10 Professional. My env info:

>pip list
Package           Version
----------------- ---------
certifi           2020.6.20
chardet           3.0.4
click             7.1.2
coloredlogs       14.0
cycler            0.10.0
dataclasses       0.7
filelock          3.0.12
...
@patil-suraj I don't know if I'm right, but I suspect the reason is that the distilbert-base-cased model has not been downloaded. Under macOS, it will automatically download the distilbert-base-cased file, (I...
On Win10, errors occur even when the local model is used.
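A possible sanity check, sketched under the assumption that this fork keeps transformers' `model=` argument: point the pipeline at an explicit local directory so no download is attempted. The path below is hypothetical.

```python
# Hedged workaround sketch: load from an already-downloaded local copy of
# distilbert-base-cased so no network fetch happens. Assumes the fork
# accepts transformers' `model=` argument; the directory is hypothetical.
from onnx_transformers import pipeline

LOCAL_MODEL_DIR = r"C:\models\distilbert-base-cased"  # hypothetical path
nlp = pipeline("feature-extraction", model=LOCAL_MODEL_DIR, onnx=True)
print(len(nlp("hello world")[0]))
```

If this still fails with a local directory, that would confirm the problem is not the download step itself.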