ckiptagger
POS tagging
I've tried the following example as input:
這些語辭都含有高調音
這些(Neqa) 語辭(Na) 都(D) 含有(VJ) 高(VH) 調音(VA)
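For comparing tagged output like the line above across runs, it can help to split it into (word, tag) pairs. This is only a minimal sketch: the helper name `parse_tagged` is hypothetical and not part of ckiptagger, and it assumes the `word(TAG)` items are separated by whitespace as shown here.

```python
import re

# Matches one "word(TAG)" item, e.g. "語辭(Na)".
TOKEN_RE = re.compile(r"(\S+?)\(([^()\s]+)\)")


def parse_tagged(line):
    """Split 'word(TAG) word(TAG) ...' into a list of (word, tag) pairs."""
    return [(m.group(1), m.group(2)) for m in TOKEN_RE.finditer(line)]


pairs = parse_tagged("這些(Neqa) 語辭(Na) 都(D) 含有(VJ) 高(VH) 調音(VA)")
print(pairs)
```

With pairs in hand, it is easy to check whether 高調音 came out as one token or was split into 高 and 調音.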
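With a customized dictionary, it was able to tag 高調音 as Na:
from ckiptagger import construct_dictionary

word_to_weight = {
    "高調音": 1,
    "土地公": 1,
    "土地婆": 1,
    "公有": 2,
    "": 1,
    "來亂的": "啦",
    "緯來體育台": 1,
}
# Build the dictionary object from the word-to-weight mapping.
dictionary = construct_dictionary(word_to_weight)
# ws and sentence_list are set up elsewhere (the segmenter and the input sentences).
word_sentence_list = ws(sentence_list, recommend_dictionary=dictionary)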
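The mapping above includes an empty key and a non-numeric weight ("啦"). How ckiptagger treats such entries is not stated in this thread, so one cautious option is to filter them out first. The sketch below is independent of ckiptagger, and `clean_word_to_weight` is a hypothetical helper name.

```python
def clean_word_to_weight(word_to_weight):
    """Keep only entries with a non-empty string key and a numeric weight."""
    cleaned = {}
    for word, weight in word_to_weight.items():
        # Skip empty or non-string words.
        if not isinstance(word, str) or not word:
            continue
        # Skip weights that are not plain numbers (bool is excluded on purpose).
        if isinstance(weight, bool) or not isinstance(weight, (int, float)):
            continue
        cleaned[word] = float(weight)
    return cleaned


raw = {
    "高調音": 1,
    "土地公": 1,
    "土地婆": 1,
    "公有": 2,
    "": 1,
    "來亂的": "啦",
    "緯來體育台": 1,
}
cleaned = clean_word_to_weight(raw)
print(cleaned)
```

The cleaned mapping could then be passed on in place of the raw one, so that only the five well-formed entries are considered.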
Is there any code or paper describing how the data files (token_list.npy, vector_list.npy, model_pos, etc.) were trained or created?
Thanks.
Thanks!
On March 21, 2021 at 10:01 PM, Mu Yang wrote:
Both embeddings are trained using the Word2Vec model from gensim. Details of the corpus are at https://github.com/ckiplab/ckiptagger/wiki/Corpora .
On https://github.com/ckiplab/ckiptagger/wiki/Chinese-README , I followed the POS tagging link: ./data/model_ner/pos_list.txt -> "POS tag list, see Wiki / Technical Report no. 93-05".
The report mentions an electronic dictionary that includes each word's part of speech (詞性). How can I get access to it?
Thanks.