WeatherBot

NLU training is very slow

Open jiangdongguo opened this issue 7 years ago • 5 comments

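Hello! Thanks for the project, it's great. I have a question: when I train on my NLU data (51 examples in total, 4 intents, 10 entities), training takes 5 hours to finish and sometimes crashes. What causes this? Or did I label the entities and intents incorrectly when building the training data? Looking forward to your answer, thanks!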

jiangdongguo avatar Mar 29 '19 08:03 jiangdongguo

What message do you get when the error occurs? What are your machine's specs? Did you change the project's default configuration?

howl-anderson avatar Mar 29 '19 10:03 howl-anderson

Training gets stuck at:

Loading model cost 0.874 seconds.
Prefix dict has been built succesfully.
Training to recognize 10 labels: 'item', 'loc', 'number', 'hello', 'name', 'affirm', 'finish', 'thank', 'bye', 'time'
Part I: train segmenter
words in dictionary: 200000
num features: 271
now do training
C: 20
epsilon: 0.01
num threads: 1
cache size: 5
max iterations: 2000
loss per missed segment: 3
C: 20 loss: 3 0.857143
C: 35 loss: 3 0.857143
C: 20 loss: 4.5 0.914286
C: 5 loss: 3 0.857143
C: 20 loss: 1.5 0.714286
C: 20 loss: 4.75 0.914286
C: 21.5 loss: 4.65 0.914286
C: 17.7498 loss: 4.60893 0.914286
C: 20 loss: 4.4 0.914286
C: 20.9071 loss: 4.45791 0.914286
best C: 20
best loss: 4.5
num feats in chunker model: 4095
train: precision, recall, f1-score: 0.972222 1 0.985915
Part I: elapsed time: 1 seconds.

Part II: train segment classifier
now do training
num training samples: 36

It stops making progress here. I am running your project with every setting unchanged; the only difference is that I am training on my own corpus:

{
  "rasa_nlu_data": {
    "common_examples": [
      { "text": "车牌", "intent": "request_search", "entities": [ { "start": 0, "end": 2, "value": "车牌", "entity": "item" } ] },
      { "text": "帮我查个车", "intent": "request_search", "entities": [ { "start": 4, "end": 5, "value": "车", "entity": "item" } ] },
      { "text": "查车牌", "intent": "request_search", "entities": [ { "start": 1, "end": 3, "value": "车牌", "entity": "item" } ] },
      { "text": "搜索车牌号码", "intent": "request_search", "entities": [ { "start": 2, "end": 6, "value": "车牌号码", "entity": "item" } ] },
      { "text": "查看车牌号", "intent": "request_search", "entities": [ { "start": 2, "end": 5, "value": "车牌号", "entity": "item" } ] },
      { "text": "我想搜索车牌号码浙AB8888", "intent": "request_search", "entities": [ { "start": 4, "end": 8, "value": "车牌号码", "entity": "item" }, { "start": 8, "end": 9, "value": "浙", "entity": "loc" }, { "start": 9, "end": 15, "value": "AB8888", "entity": "number" } ] },
      { "text": "鲁JB1686", "intent": "request_search", "entities": [ { "start": 1, "end": 7, "value": "JB1686", "entity": "number" }, { "start": 0, "end": 1, "value": "鲁", "entity": "loc" } ] },
      { "text": "hi", "intent": "greet", "entities": [ { "start": 0, "end": 2, "value": "hi", "entity": "hello" } ] },
      { "text": "嘿", "intent": "greet", "entities": [] },
      { "text": "嗨", "intent": "greet", "entities": [ { "start": 0, "end": 1, "value": "嗨", "entity": "hello" } ] },
      { "text": "hi 小智", "intent": "greet", "entities": [ { "start": 0, "end": 2, "value": "hi", "entity": "hello" }, { "start": 3, "end": 5, "value": "小智", "entity": "name" } ] },
      { "text": "你好", "intent": "greet", "entities": [ { "start": 0, "end": 2, "value": "你好", "entity": "hello" } ] },
      { "text": "你好小智", "intent": "greet", "entities": [ { "start": 0, "end": 2, "value": "你好", "entity": "hello" }, { "start": 2, "end": 4, "value": "小智", "entity": "name" } ] },
      { "text": "早", "intent": "greet", "entities": [ { "start": 0, "end": 1, "value": "早", "entity": "hello" } ] },
      { "text": "早,小丽", "intent": "greet", "entities": [ { "start": 0, "end": 1, "value": "早", "entity": "hello" }, { "start": 2, "end": 4, "value": "小丽", "entity": "name" } ] },
      { "text": "你好啊", "intent": "greet", "entities": [ { "start": 0, "end": 2, "value": "你好", "entity": "hello" } ] },
      { "text": "是的", "intent": "affirm", "entities": [ { "start": 0, "end": 1, "value": "是", "entity": "affirm" } ] },
      { "text": "对的", "intent": "affirm", "entities": [ { "start": 0, "end": 1, "value": "对", "entity": "affirm" } ] },
      { "text": "好的", "intent": "affirm", "entities": [ { "start": 0, "end": 1, "value": "好", "entity": "affirm" } ] },
      { "text": "算了", "intent": "finish", "entities": [ { "start": 0, "end": 2, "value": "算了", "entity": "finish" } ] },
      { "text": "不用了", "intent": "finish", "entities": [ { "start": 0, "end": 2, "value": "不用", "entity": "finish" } ] },
      { "text": "没事了", "intent": "finish", "entities": [ { "start": 0, "end": 2, "value": "没事", "entity": "finish" } ] },
      { "text": "好的,谢谢你", "intent": "thanks", "entities": [ { "start": 3, "end": 5, "value": "谢谢", "entity": "thank" } ] },
      { "text": "谢谢", "intent": "thanks", "entities": [ { "start": 0, "end": 2, "value": "谢谢", "entity": "thank" } ] },
      { "text": "再见", "intent": "say_bye", "entities": [ { "start": 0, "end": 2, "value": "再见", "entity": "bye" } ] },
      { "text": "多谢啦", "intent": "thanks", "entities": [ { "start": 0, "end": 2, "value": "多谢", "entity": "thank" } ] },
      { "text": "拜拜", "intent": "say_bye", "entities": [ { "start": 0, "end": 2, "value": "拜拜", "entity": "bye" } ] },
      { "text": "下次见", "intent": "say_bye", "entities": [] },
      { "text": "晚上好", "intent": "greet", "entities": [ { "start": 0, "end": 2, "value": "晚上", "entity": "time" }, { "start": 2, "end": 3, "value": "好", "entity": "affirm" } ] },
      { "text": "车牌号码", "intent": "request_search", "entities": [ { "start": 0, "end": 4, "value": "车牌号码", "entity": "item" } ] }
    ]
  }
}

What could the problem be? Thanks!

jiangdongguo avatar Apr 01 '19 01:04 jiangdongguo
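One way to approach the question of whether the entities are labeled incorrectly is to check that every `start`/`end` pair actually slices the stated `value` out of `text`. The following is a minimal sketch, assuming the `rasa_nlu_data` / `common_examples` layout shown in the post; the function name `find_span_mismatches` and the two-example `sample` are illustrative and not part of the project.

```python
import json


def find_span_mismatches(nlu_data):
    """Return (text, entity) pairs where text[start:end] differs from value."""
    mismatches = []
    for example in nlu_data["rasa_nlu_data"]["common_examples"]:
        text = example["text"]
        for entity in example.get("entities", []):
            # Offsets are character indices; end is exclusive.
            if text[entity["start"]:entity["end"]] != entity["value"]:
                mismatches.append((text, entity))
    return mismatches


# Two examples copied from the corpus in the post (illustrative subset).
sample = json.loads("""
{"rasa_nlu_data": {"common_examples": [
  {"text": "查车牌", "intent": "request_search",
   "entities": [{"start": 1, "end": 3, "value": "车牌", "entity": "item"}]},
  {"text": "hi 小智", "intent": "greet",
   "entities": [{"start": 0, "end": 2, "value": "hi", "entity": "hello"},
                {"start": 3, "end": 5, "value": "小智", "entity": "name"}]}
]}}
""")

print(find_span_mismatches(sample))  # [] : both examples are consistent
```

A mismatch is not necessarily an error if `value` is deliberately set to a synonym of the surface text, so any hits from a check like this are worth reviewing by hand rather than fixing automatically.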

SVM training really is very slow; exactly how slow depends on the machine. From what I can see so far it is just slow, and there is no bug.

howl-anderson avatar Apr 01 '19 03:04 howl-anderson

Thanks for the reply! Here are my machine's specs:

i5-7300, 8 GB RAM, Windows 10 64-bit

Is this related to the number of entities and intents? Are there precise rules for writing the NLU corpus?

jiangdongguo avatar Apr 01 '19 03:04 jiangdongguo

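> Is this related to the number of entities and intents?

Yes, it is related.

> Are there precise rules for writing the NLU corpus?

The general recommendation is to write it so that it follows the distribution of your real data. Early in a project you can write just a portion first, then adjust based on how it actually performs.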

howl-anderson avatar Apr 01 '19 07:04 howl-anderson
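Since training time depends on the number of intents and entity labels, and the corpus is meant to follow the real data distribution, it can help to count what the training file actually contains. The sketch below assumes the same `rasa_nlu_data` / `common_examples` layout as the corpus posted above; `label_distribution` and the three-example `sample` are hypothetical names and data used only for illustration.

```python
from collections import Counter


def label_distribution(nlu_data):
    """Count examples per intent and annotations per entity label."""
    intents = Counter()
    entities = Counter()
    for example in nlu_data["rasa_nlu_data"]["common_examples"]:
        intents[example["intent"]] += 1
        for entity in example.get("entities", []):
            entities[entity["entity"]] += 1
    return intents, entities


# Three examples copied from the corpus in the thread (illustrative subset).
sample = {"rasa_nlu_data": {"common_examples": [
    {"text": "查车牌", "intent": "request_search",
     "entities": [{"start": 1, "end": 3, "value": "车牌", "entity": "item"}]},
    {"text": "你好小智", "intent": "greet",
     "entities": [{"start": 0, "end": 2, "value": "你好", "entity": "hello"},
                  {"start": 2, "end": 4, "value": "小智", "entity": "name"}]},
    {"text": "嘿", "intent": "greet", "entities": []},
]}}

intents, entities = label_distribution(sample)
print(len(intents), dict(intents))    # 2 {'request_search': 1, 'greet': 2}
print(len(entities), dict(entities))  # 3 {'item': 1, 'hello': 1, 'name': 1}
```

Comparing these counts with the proportions seen in real user messages gives a rough indication of which intents or entity labels are over- or under-represented, and whether some entity labels could be dropped to reduce the number of classes being trained.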