
Loading a user dictionary has no effect, and the entity is not recognized

Open ShuGao0810 opened this issue 5 years ago • 3 comments

Hi, while using foolnltk I found that loading a user dictionary has no effect, and I don't know what causes it. Details below. Environment: Win10 + Python 3.6

fool.analysis('阿里收购饿了么') returns: ([[('阿里', 'nz'), ('收购', 'v'), ('饿', 'v'), ('了', 'y'), ('么', 'y')]], [[(0, 3, 'company', '阿里')]])

User dictionary format: 饿了么 10

fool.load_userdict(path)
fool.analysis('阿里收购饿了么') returns: ([[('阿里', 'nz'), ('收购', 'v'), ('饿', 'v'), ('了', 'y'), ('么', 'y')]], [[(0, 3, 'company', '阿里')]])

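Loading the user dictionary seems to have no effect: "饿了么" is still split apart during segmentation, and it is not picked up by entity recognition either.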
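For anyone trying to reproduce this, the steps above could be scripted roughly as follows. This is only a sketch: `write_userdict` and `reproduce` are hypothetical helper names, and writing the file as UTF-8 is an assumption about what the loader expects, not something confirmed in this thread. The only foolnltk calls used are the ones that already appear in this issue (`fool.load_userdict`, `fool.cut`, `fool.analysis`).

```python
from pathlib import Path


def write_userdict(entries, path):
    """Write (word, weight) pairs as 'word weight' lines, one entry per line."""
    lines = [f"{word} {weight}" for word, weight in entries]
    # Assumption: the dictionary file is read as UTF-8.
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path


def reproduce(path, sentence="阿里收购饿了么"):
    """Run the calls from the report; returns None if foolnltk is not installed."""
    try:
        import fool  # provided by the foolnltk package
    except ImportError:
        return None
    fool.load_userdict(path)
    # Compare plain segmentation with the combined analysis output.
    return fool.cut(sentence), fool.analysis(sentence)


if __name__ == "__main__":
    dict_path = write_userdict([("饿了么", 10)], "foolnltk_userdict.txt")
    print(reproduce(dict_path))
```

Printing both results side by side makes it easier to see whether the dictionary affects `cut` but not `analysis`.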

ShuGao0810 avatar Sep 29 '18 08:09 ShuGao0810

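@ShuGao0810 Thanks for the feedback. The dictionary currently takes effect during word segmentation, but analysis does not support it yet. I will fix this later.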

rockyzhengwu avatar Oct 06 '18 09:10 rockyzhengwu

How do I load a jieba-format dictionary?
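One possible approach, offered as a sketch rather than a supported feature, is to convert the jieba file into the "word weight" format shown earlier in this issue before loading it. jieba user dictionaries put one entry per line as a word, an optional frequency, and an optional part-of-speech tag, separated by spaces. The helper names below and the fallback weight of 10 are hypothetical, and the code assumes words contain no spaces.

```python
import re
from pathlib import Path

DEFAULT_WEIGHT = 10  # hypothetical fallback, mirroring the "饿了么 10" example


def jieba_line_to_fool(line, default_weight=DEFAULT_WEIGHT):
    """Convert one jieba entry ('word [freq] [tag]') to 'word weight'.

    Returns None for blank lines. The part-of-speech tag is dropped.
    """
    tokens = line.split()
    if not tokens:
        return None
    word = tokens[0]
    # The second field is a frequency only if it is purely numeric;
    # otherwise it is a tag and the fallback weight is used.
    has_freq = len(tokens) > 1 and re.fullmatch(r"[0-9]+", tokens[1])
    weight = int(tokens[1]) if has_freq else default_weight
    return f"{word} {weight}"


def convert_jieba_dict(src, dst, default_weight=DEFAULT_WEIGHT):
    """Convert a jieba dictionary file and return the number of entries written."""
    # utf-8-sig tolerates a leading byte order mark, if one is present.
    lines = Path(src).read_text(encoding="utf-8-sig").splitlines()
    converted = [jieba_line_to_fool(line, default_weight) for line in lines]
    converted = [entry for entry in converted if entry is not None]
    Path(dst).write_text("\n".join(converted) + "\n", encoding="utf-8")
    return len(converted)
```

The converted file could then be passed to `fool.load_userdict`. Whether jieba frequencies make sensible weights for FoolNLTK is untested here, so the values may need adjusting.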

xrzlizheng avatar Nov 13 '18 10:11 xrzlizheng

@ShuGao0810 A possible workaround: modify __init__.py so that ner applies the same user-dictionary handling that cut does.

Changing it this way doesn't seem to work, though ><

def ner(text, ignore=False):
    text = _check_input(text, ignore)
    if not text:
        return [[]]
    res = LEXICAL_ANALYSER.ner(text)
-    return res
+    new_words = []
+    if _DICTIONARY.sizes != 0:
+        for sent, words in zip(text, res):
+            words = _mearge_user_words(sent, words)
+            new_words.append(words)
+    else:
+        new_words = res
+    return new_words


def analysis(text, ignore=False):
    text = _check_input(text, ignore)
    if not text:
        return [[]], [[]]
-    res = LEXICAL_ANALYSER.analysis(text)
-    return res
+    word_inf = pos_cut(text)
+    ners = ner(text)
+    return word_inf, ners
a = ['阿里收购饿了么']
fool.load_userdict('foolnltk_userdict.txt')
# fool.delete_userdict()
print(fool.cut(a))
[['阿里', '收购', '饿了么']]

print(fool.analysis(a))
([[('阿里', 'nz'), ('收购', 'v'), ('饿了么', 'nz')]], [['阿里收购', '饿了么']])
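To illustrate the kind of post-processing that would turn the split tokens into the merged result shown above, here is a standalone sketch. It is not FoolNLTK's `_mearge_user_words` implementation: `merge_user_words` is a hypothetical function that greedily joins adjacent tagged tokens when their concatenation is a user word, and the default tag `nz` is chosen only because it matches the output above. It cannot merge a user word that starts or ends in the middle of an existing token.

```python
def merge_user_words(tokens, user_words, tag="nz"):
    """Merge adjacent (word, tag) tokens whose concatenation is a user word.

    Prefers the longest match that starts at each position.
    """
    max_len = max((len(w) for w in user_words), default=0)
    merged = []
    i = 0
    while i < len(tokens):
        best_end = None
        text = ""
        for j in range(i, len(tokens)):
            text += tokens[j][0]
            if len(text) > max_len:
                break  # no user word can be this long
            if j > i and text in user_words:
                best_end = j  # keep scanning for a longer match
        if best_end is None:
            merged.append(tokens[i])
            i += 1
        else:
            word = "".join(t[0] for t in tokens[i:best_end + 1])
            merged.append((word, tag))
            i = best_end + 1
    return merged


tokens = [("阿里", "nz"), ("收购", "v"), ("饿", "v"), ("了", "y"), ("么", "y")]
print(merge_user_words(tokens, {"饿了么"}))
# [('阿里', 'nz'), ('收购', 'v'), ('饿了么', 'nz')]
```

This ignores dictionary weights entirely. A real implementation would presumably use them to decide between competing matches.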

@rockyzhengwu This looks like a typo in __init__.py:

_mearge_user_words --> _merge_user_words

yu45020 avatar Dec 14 '18 03:12 yu45020