ckiptagger
WS coerce_dictionary parameter is not shared with NER
NER does not use the WS coerce_dictionary when determining entity boundaries. Is there a fix or workaround?
Nb | proper noun
Nc | place noun
input (from README example)
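from ckiptagger import WS, POS, NER, construct_dictionary

# Load the models as in the README (assumes the model files are in ./data)
ws = WS("./data")
pos = POS("./data")
ner = NER("./data")

sentence_list = ['瑞士 LAURASTAR S4a 熨燙護理系統', ....]
word_to_weight = {
    "瑞士 LAURASTAR": 1,
}
# Build the dictionary object that is passed to ws below
dictionary1 = construct_dictionary(word_to_weight)
# ws
word_sentence_list = ws(
    sentence_list,
    coerce_dictionary = dictionary1,
)
# pos
pos_sentence_list = pos(word_sentence_list)
# ner
entity_sentence_list = ner(word_sentence_list, pos_sentence_list)
# Print result (print_word_pos_sentence is the helper defined in the README example)
print(word_sentence_list[1], pos_sentence_list[1])
for i, sentence in enumerate(sentence_list):
    print()
    print(f"'{sentence}'")
    print_word_pos_sentence(word_sentence_list[i], pos_sentence_list[i])
    for entity in sorted(entity_sentence_list[i]):
        print(entity)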
output
# without coerce_dictionary parameter
'瑞士 LAURASTAR S4a 熨燙護理系統'
['瑞士(Nc)', ' LAURASTAR S4(FW)', 'a (FW)', '熨燙(VC)', '護理(Na)', '系統(Na)']
(0, 2, 'PERSON', '瑞士')
# with coerce_dictionary parameter
'瑞士 LAURASTAR S4a 熨燙護理系統'
瑞士 LAURASTAR(Nb) S4(FW) a (FW) 熨燙(VC) 護理(Na) 系統(Na)
(0, 2, 'PERSON', '瑞士')
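Adding a dictionary lets you customize the word segmentation result. However, under the current architecture NER only uses the WS result as a reference, so the entity boundaries that NER recognizes do not necessarily coincide with the word boundaries that WS produces.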
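One possible user-side workaround is to post-process the NER output so that each entity span is widened to the enclosing WS token boundaries. The sketch below is not part of ckiptagger. `snap_entities_to_words` is a hypothetical helper, and the token list is an assumption reconstructed from the printed output above. It also assumes that the WS tokens concatenate back to the original sentence and that entity tuples have the form (start, end, label, text) with character offsets.

```python
def snap_entities_to_words(word_list, entity_set):
    """Widen each NER span to the WS tokens it overlaps (hypothetical helper)."""
    # Character offsets [start, end) of every WS token in the sentence.
    spans = []
    offset = 0
    for word in word_list:
        spans.append((offset, offset + len(word)))
        offset += len(word)
    sentence = "".join(word_list)

    snapped = set()
    for start, end, label, _text in entity_set:
        # Keep the tokens that overlap the entity span [start, end).
        overlapping = [(s, e) for s, e in spans if s < end and e > start]
        if not overlapping:
            continue
        new_start = overlapping[0][0]
        new_end = overlapping[-1][1]
        snapped.add((new_start, new_end, label, sentence[new_start:new_end]))
    return snapped


# Assumed tokens for the run with coerce_dictionary, taken from the output above.
words = ["瑞士 LAURASTAR", " S4", "a ", "熨燙", "護理", "系統"]
entities = {(0, 2, "PERSON", "瑞士")}
print(sorted(snap_entities_to_words(words, entities)))
# [(0, 12, 'PERSON', '瑞士 LAURASTAR')]
```

This only moves the boundaries. The entity label still comes from NER unchanged, so a wrong label such as PERSON here is not corrected by it.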