Alternatively, drop the jieba segmentation step. And the pretrained model filename shouldn't still be the WWM one, should it? The chinese_L-12_H-768_A-12/vocab.txt file shouldn't contain multi-character words anymore; or is that file also the one used to train WWM?
Come on, the preprocessing for training and prediction should be consistent. That should be obvious even to someone who has never used BERT.
Your WWM really can be pretrained on one's own corpus, and at the word level, which is great. That's why I wanted to check whether bert4keras supports it too.
Line 373: https://github.com/bojone/bert4keras/blob/master/pretraining/data_utils.py#373
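For readers following along, this is roughly the idea under discussion: jieba decides the word boundaries, and every piece belonging to the same word is masked together. A minimal sketch only; the function name, tokenizer interface, and per-word masking rate are my assumptions, not the code at the line linked above:

```python
# Minimal whole-word-masking sketch (NOT the bert4keras code at the line
# linked above). jieba supplies word boundaries; all pieces of a chosen
# word are masked together, so masking never splits a word.
import random
import jieba

def whole_word_mask(text, tokenizer, word_mask_prob=0.15):
    """Return (tokens, mask_flags); the tokenizer is assumed to expose a
    BERT-style `tokenize` method (for Chinese, usually per-character)."""
    tokens, mask_flags = [], []
    for word in jieba.cut(text):
        pieces = tokenizer.tokenize(word)
        masked = random.random() < word_mask_prob  # decide per word, not per piece
        tokens.extend(pieces)
        mask_flags.extend([masked] * len(pieces))  # whole word or nothing
    return tokens, mask_flags
```

The consistency point above then follows directly: whatever tokenizer produces `pieces` here must be the same one used at prediction time.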
If you know, you know... Unbelievable.
@bojone Fine, then.
Thanks @patil-suraj. If I only run `nlp_torch = pipeline("feature-extraction", onnx=False)`, there is no problem, but running `nlp_onnx = pipeline("feature-extraction", onnx=True)` produces an error.
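For reference, a self-contained repro sketch; I am assuming `pipeline` here comes from the onnx_transformers package, so adjust the import if yours differs:

```python
# Minimal repro sketch; the import path is an assumption.
from onnx_transformers import pipeline

# PyTorch backend: this call works fine.
nlp_torch = pipeline("feature-extraction", onnx=False)
print(len(nlp_torch("hello world")[0]))

# ONNX backend: this is the call that raises the error on Win10.
nlp_onnx = pipeline("feature-extraction", onnx=True)
print(len(nlp_onnx("hello world")[0]))
```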
My operating system environment is Win10 Professional. My env info:

>pip list
Package           Version
----------------- ---------
certifi           2020.6.20
chardet           3.0.4
click             7.1.2
coloredlogs       14.0
cycler            0.10.0
dataclasses       0.7
filelock          3.0.12
...
@patil-suraj I don't know if I'm right, but I suspect the reason is that the distilbert-base-cased model has not been downloaded. Under macOS, it will automatically download the distilbert-base-cased file, (I...
On Win10, errors occur even when the local model is used.
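A possible sanity check, sketched under the assumption that this fork keeps transformers' `model=` argument: point the pipeline at an explicit local directory so no download is attempted. The path below is hypothetical.

```python
# Hedged workaround sketch: load from an already-downloaded local copy of
# distilbert-base-cased so no network fetch happens. Assumes the fork
# accepts transformers' `model=` argument; the directory is hypothetical.
from onnx_transformers import pipeline

LOCAL_MODEL_DIR = r"C:\models\distilbert-base-cased"  # hypothetical path
nlp = pipeline("feature-extraction", model=LOCAL_MODEL_DIR, onnx=True)
print(len(nlp("hello world")[0]))
```

If this still fails with a local directory, that would confirm the problem is not the download step itself.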