Young

Results: 25 comments of Young

@bsugerman It seems you really can't avoid all the foreign characters when you use a very large corpus to train the models, and you also can't replace them (though you could delete them somehow)...

Thanks, but isn't embed300.trim.npy missing?

Is embed300.trim.npy a word2vec embedding?

Thank you very much. I would like to try switching the dataset to a Chinese dataset; there is a lot of work to do!

Please give the full error and traceback.

A .npy file is actually a NumPy array file. Once you have trained a 300d word embedding, you can save it as a .npy file using np.save(), but I would rather suggest that...
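For reference, here is a minimal sketch of saving and reloading such an embedding matrix with NumPy; the vocabulary size, file name, and random values are purely illustrative assumptions, not the project's actual files:

```python
import numpy as np

# Hypothetical 300d embedding matrix: one row per vocabulary word.
vocab_size = 50000  # assumed vocabulary size, purely illustrative
embedding = np.random.rand(vocab_size, 300).astype(np.float32)

# Save the trained embedding as a .npy file ...
np.save("embed300.trim.npy", embedding)

# ... and load it back later without retraining.
embedding = np.load("embed300.trim.npy")
print(embedding.shape)  # (50000, 300)
```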

@bihui9968 You can copy the two files modeling.py and tokenization.py from the BERT repository: https://github.com/google-research/bert.
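If it helps, here is a rough sketch of how those two copied files are typically used; the checkpoint directory and file paths are assumptions for illustration, not part of this repository:

```python
# Sketch assuming modeling.py and tokenization.py were copied from
# https://github.com/google-research/bert into the working directory,
# and that a pre-trained checkpoint (assumed here to be
# uncased_L-12_H-768_A-12) has been downloaded and unpacked.
import modeling
import tokenization

bert_config = modeling.BertConfig.from_json_file(
    "uncased_L-12_H-768_A-12/bert_config.json")
tokenizer = tokenization.FullTokenizer(
    vocab_file="uncased_L-12_H-768_A-12/vocab.txt",
    do_lower_case=True)

# WordPiece-tokenize a sentence with the copied tokenization module.
tokens = tokenizer.tokenize("text classification with bert")
print(tokens)
```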

You can plug a CNN classification model into the pipeline.

@ryshhxq If you could enumerate every possible case, you might not need a model to handle this problem at all, but you would still need to think about efficiency, because you might have to traverse all the cases to match each query. Only when you cannot collect all the data, and traversal is too inefficient, should you consider using an algorithm (model) to estimate the true distribution from the data you have collected:
1. Using algorithms, such as the SVM wrapped in rasa nlu, deep-learning classifiers, or MITIE's word feature extraction, improves your development and coding efficiency;
2. We want the algorithm we adopt to have some generalization ability (to examples not in the training set but within the scope of your target problem, or drawn from the same distribution, etc.);
3. rasa nlu supports configuration.

Supporting configuration means you can freely swap in any of the models already wrapped in rasa, or an algorithm model you add to rasa yourself. The JSON file is the training set; try to add as much real data as you can collect for solving your target problem.
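As a rough illustration of that configuration idea, here is a minimal training sketch assuming the older rasa_nlu 0.x Python API; the config file, component choices, and training-data file name are assumptions for illustration, not files from this thread:

```python
# Minimal sketch, assuming the older rasa_nlu 0.x Python API.
# config.yml would list the pipeline components you want to swap in or out,
# e.g. the wrapped sklearn (SVM) intent classifier or MITIE featurizers.
from rasa_nlu import config
from rasa_nlu.model import Trainer
from rasa_nlu.training_data import load_data

# examples.json is an assumed training-data file in rasa_nlu's JSON format
# ({"rasa_nlu_data": {"common_examples": [...]}}); fill it with real,
# collected examples for your target problem.
training_data = load_data("examples.json")

trainer = Trainer(config.load("config.yml"))
trainer.train(training_data)
model_directory = trainer.persist("./models")  # directory of the trained model
```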