
WS coerce_dictionary parameter is not shared with NER

Open leungsolomon opened this issue 5 years ago • 1 comments

NER cannot use the WS coerce_dictionary for word segmentation. Is there a fix or workaround?

Nb | proper noun
Nc | place noun

input (from README example)


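from ckiptagger import WS, POS, NER, construct_dictionary

# Load the models as in the README (assumes the model files are in ./data)
ws = WS("./data")
pos = POS("./data")
ner = NER("./data")

sentence_list = ['瑞士 LAURASTAR S4a 熨燙護理系統', ....]

word_to_weight = {
    "瑞士 LAURASTAR": 1,
}
# Build the dictionary object that is passed to ws() as coerce_dictionary
dictionary1 = construct_dictionary(word_to_weight)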

# ws
word_sentence_list = ws(
    sentence_list,
    coerce_dictionary=dictionary1,
)

# pos
pos_sentence_list = pos(word_sentence_list)

# ner
entity_sentence_list = ner(word_sentence_list, pos_sentence_list)

# Print result
print(word_sentence_list[1], pos_sentence_list[1])
for i, sentence in enumerate(sentence_list):
    print()
    print(f"'{sentence}'")
    print_word_pos_sentence(word_sentence_list[i],  pos_sentence_list[i])
    for entity in sorted(entity_sentence_list[i]):
        print(entity)

output

# without coerce_dictionary parameter
'瑞士 LAURASTAR S4a 熨燙護理系統'
['瑞士(Nc)', ' LAURASTAR S4(FW)', 'a (FW)', '熨燙(VC)', '護理(Na)', '系統(Na)']
(0, 2, 'PERSON', '瑞士')

# with coerce_dictionary parameter
'瑞士 LAURASTAR S4a 熨燙護理系統'
瑞士 LAURASTAR(Nb)  S4(FW) a (FW) 熨燙(VC) 護理(Na) 系統(Na) 
(0, 2, 'PERSON', '瑞士')

leungsolomon Feb 06 '20 04:02

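Adding a dictionary lets you customize the word segmentation result. However, under the current architecture NER uses the WS result only as a reference, so the entity boundaries that NER identifies do not necessarily coincide with the word boundaries that WS identifies.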
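
One possible workaround is to post-process the NER output so that each entity span is widened to the WS tokens it overlaps. The sketch below is not part of ckiptagger: `word_spans` and `snap_entities_to_words` are hypothetical helpers, and it assumes that the WS tokens concatenate back to the original sentence and that the entity tuples are `(start, end, label, text)` with character offsets, as in the output printed in the issue.

```python
def word_spans(word_list):
    """Return the (start, end) character offsets of each WS token."""
    spans = []
    offset = 0
    for word in word_list:
        spans.append((offset, offset + len(word)))
        offset += len(word)
    return spans


def snap_entities_to_words(word_list, entities):
    """Widen each NER span so that it ends on WS token boundaries.

    Hypothetical helper, not a ckiptagger API. `entities` is an iterable of
    (start, end, label, text) tuples with character offsets (an assumption
    based on the output shown in the issue).
    """
    spans = word_spans(word_list)
    sentence = "".join(word_list)
    snapped = set()
    for start, end, label, text in entities:
        # WS tokens that share at least one character with the entity span
        overlapping = [(s, e) for s, e in spans if s < end and e > start]
        if not overlapping:
            snapped.add((start, end, label, text))
            continue
        new_start = min(start, overlapping[0][0])
        new_end = max(end, overlapping[-1][1])
        snapped.add((new_start, new_end, label, sentence[new_start:new_end]))
    return snapped


# Tokens and entity taken from the "with coerce_dictionary" output in the issue
words = ["瑞士 LAURASTAR", " S4", "a ", "熨燙", "護理", "系統"]
entities = {(0, 2, "PERSON", "瑞士")}
aligned = snap_entities_to_words(words, entities)
print(aligned)
```

This only moves the boundaries. The entity label still comes from NER, so a wrong label such as `PERSON` in this example is carried over unchanged.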

jacobvsdanniel Mar 31 '20 09:03