Young

Results: 25 comments of Young

@bsugerman It seems you really can't avoid all the foreign characters when you use a very large corpus to train the models, and you also can't replace them (though you could delete them somehow)...

Thanks, but isn't embed300.trim.npy missing?

Is embed300.trim.npy a word2vec embedding?

Thank you very much. I would like to try switching the dataset to a Chinese dataset; there is a lot of work to do!

Please give the full error and traceback.

A .npy file is actually a NumPy array file. Once you have trained a 300d word embedding, you can save it as a .npy file using np.save(), but I would rather suggest that...
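For reference, here is a minimal sketch of saving and reloading such an embedding matrix with NumPy; the vocabulary size, file name, and random values are purely illustrative assumptions, not the project's actual files:

```python
import numpy as np

# Hypothetical 300d embedding matrix: one row per vocabulary word.
vocab_size = 50000  # assumed vocabulary size, purely illustrative
embedding = np.random.rand(vocab_size, 300).astype(np.float32)

# Save the trained embedding as a .npy file ...
np.save("embed300.trim.npy", embedding)

# ... and load it back later without retraining.
embedding = np.load("embed300.trim.npy")
print(embedding.shape)  # (50000, 300)
```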

@bihui9968 You can copy the two files modeling.py and tokenization.py from the BERT repository: https://github.com/google-research/bert.
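If it helps, here is a rough sketch of how those two copied files are typically used; the checkpoint directory and file paths are assumptions for illustration, not part of this repository:

```python
# Sketch assuming modeling.py and tokenization.py were copied from
# https://github.com/google-research/bert into the working directory,
# and that a pre-trained checkpoint (assumed here to be
# uncased_L-12_H-768_A-12) has been downloaded and unpacked.
import modeling
import tokenization

bert_config = modeling.BertConfig.from_json_file(
    "uncased_L-12_H-768_A-12/bert_config.json")
tokenizer = tokenization.FullTokenizer(
    vocab_file="uncased_L-12_H-768_A-12/vocab.txt",
    do_lower_case=True)

# WordPiece-tokenize a sentence with the copied tokenization module.
tokens = tokenizer.tokenize("text classification with bert")
print(tokens)
```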

You can plug a CNN classification model into the pipeline.

@ryshhxq If you could enumerate every possible case, you might not need a model to handle this problem at all, but you would still need to think about efficiency, because you might have to traverse all the cases to match each query. Only when you cannot collect all the data, and traversal is too inefficient, should you consider using an algorithm (model) to estimate the true distribution from the data you have collected:
1. Using algorithms, such as the SVM wrapped in rasa nlu, deep-learning classifiers, or MITIE's word feature extraction, improves your development and coding efficiency;
2. We want the algorithm we adopt to have some generalization ability (to examples not in the training set but within the scope of your target problem, or drawn from the same distribution, etc.);
3. rasa nlu supports configuration.

Supporting configuration means you can freely swap in any of the models already wrapped in rasa, or an algorithm model you add to rasa yourself. The JSON file is the training set; try to add as much real data as you can collect for solving your target problem.
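As a rough illustration of that configuration idea, here is a minimal training sketch assuming the older rasa_nlu 0.x Python API; the config file, component choices, and training-data file name are assumptions for illustration, not files from this thread:

```python
# Minimal sketch, assuming the older rasa_nlu 0.x Python API.
# config.yml would list the pipeline components you want to swap in or out,
# e.g. the wrapped sklearn (SVM) intent classifier or MITIE featurizers.
from rasa_nlu import config
from rasa_nlu.model import Trainer
from rasa_nlu.training_data import load_data

# examples.json is an assumed training-data file in rasa_nlu's JSON format
# ({"rasa_nlu_data": {"common_examples": [...]}}); fill it with real,
# collected examples for your target problem.
training_data = load_data("examples.json")

trainer = Trainer(config.load("config.yml"))
trainer.train(training_data)
model_directory = trainer.persist("./models")  # directory of the trained model
```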