knowledge-graph-learning
ACL-2018-Subcharacter Information in Japanese Embeddings: When Is It Worth It?
Summary:
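Subcharacter information is effective for Chinese, but what about Japanese? The study finds that the gains subcharacter information brings for Chinese do not carry over consistently to Japanese (my guess is that this is because of katakana and hiragana). In settings with a higher proportion of kanji, however, character n-grams do improve results. Still, in the experiments even the enhanced skip-gram did not outperform FastText with single-character n-grams.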
Resource:
- [code](
- [paper-with-code](
Paper information:
- Author:
- Dataset:
- keywords:
Notes:

FastText is a subword-level model that learns vectors for character n-grams.
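A minimal sketch of how a word is broken into character n-grams, assuming fastText's convention of wrapping the word in `<` and `>` boundary markers and not emitting the bare markers as 1-grams. The function name `char_ngrams` is hypothetical; in fastText the vector of the full word is added on top of these n-gram vectors, which this sketch does not show.

```python
def char_ngrams(word: str, minn: int = 3, maxn: int = 6) -> list[str]:
    """Return fastText-style character n-grams of `word` (illustrative sketch)."""
    wrapped = f"<{word}>"  # boundary markers distinguish prefixes/suffixes
    grams = []
    for i in range(len(wrapped)):
        for n in range(minn, maxn + 1):
            if i + n > len(wrapped):
                break  # n-gram would run past the end of the wrapped word
            # Skip the lone '<' and '>' when n == 1; they carry no character info.
            if n == 1 and (i == 0 or i == len(wrapped) - 1):
                continue
            grams.append(wrapped[i:i + n])
    return grams


print(char_ngrams("where", 3, 3))  # 3-grams of an English word
print(char_ngrams("東京", 1, 1))    # single-character n-grams of a kanji word
```

With `minn = maxn = 1` the n-grams reduce to the individual characters of the word, which is the setting the SG variant is compared to.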

- SG: the authors modified skip-gram (SG) by summing the target word vector w with the vectors of its constituent characters c1 and c2. This can be regarded as a special case of FastText in which the minimum and maximum n-gram sizes are both set to 1.
- SG+kanji: learns word embeddings from characters and sub-characters, following the method for Chinese in Yu et al. (2017), Joint Embeddings of Chinese Words, Characters, and Fine-grained Subcharacter Components.
- SG+kanji+bushu: additionally incorporates the meaning of radicals (bushu).
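The SG modification described above (target vector = w + c1 + c2) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: vectors are plain Python lists, the lookup tables and function names are hypothetical, characters without a vector are simply skipped, and `sg_score` shows only the positive-pair term sigmoid(target · context) used in negative-sampling skip-gram.

```python
import math

Vector = list[float]


def vec_add(u: Vector, v: Vector) -> Vector:
    return [a + b for a, b in zip(u, v)]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def compose_target(word: str, word_vecs: dict[str, Vector],
                   char_vecs: dict[str, Vector]) -> Vector:
    """Target representation: word vector plus the vectors of its characters."""
    vec = list(word_vecs[word])
    for ch in word:
        ch_vec = char_vecs.get(ch)
        if ch_vec is not None:  # hypothetical choice: skip characters with no vector
            vec = vec_add(vec, ch_vec)
    return vec


def sg_score(word: str, context: str, word_vecs: dict[str, Vector],
             char_vecs: dict[str, Vector], context_vecs: dict[str, Vector]) -> float:
    """Positive-pair probability sigmoid(target . context) in negative-sampling SG."""
    s = dot(compose_target(word, word_vecs, char_vecs), context_vecs[context])
    return 1.0 / (1.0 + math.exp(-s))


# Toy 2-dimensional vectors, chosen by hand for illustration.
word_vecs = {"東京": [1.0, 0.0]}
char_vecs = {"東": [0.5, 0.5], "京": [0.0, 1.0]}
context_vecs = {"大阪": [0.0, 0.0]}
print(compose_target("東京", word_vecs, char_vecs))
print(sg_score("東京", "大阪", word_vecs, char_vecs, context_vecs))
```

Summing the word vector with its single-character vectors is the same composition FastText performs when only 1-grams are enabled, which is why the note treats this SG variant as a special case of FastText.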
Model Graph:
Result:
Thoughts:
Next Reading:
Is there an open-source implementation?