hanzi-tools
Add the source used for this system
- [ ] What is the source? Unihan, CJKlib, Moedict, ...
- [ ] How many characters are covered?
Thanks for this project!
The dictionary used is CC-CEDICT, plus whatever node-pinyin uses behind the scenes. I'm not sure exactly how many characters are covered; I'll have to investigate this later.
According to node-pinyin's Readme.md#Source, the data comes from:
- https://code.google.com/archive/p/chinese-character-2-pinyin/
- possibly the other pinyin sources listed there as well (IME)
Strictly speaking, node-pinyin's data is in /tools/dict2.js. After cleanup, there are 24449 character/phonetic pairs, which is close in size to the UNIHAN data, currently at 25500 entries.


node-pinyin's data format doesn't suit linguistic studies, though: several phonetic entries can pair with the same character, with no prioritization (i.e. by frequency). That fits IME needs but not linguistic needs.
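
To illustrate the point, here is a rough Python sketch of counting character/phonetic pairs and ordering readings by frequency. Everything in it is an assumption for illustration: the `READINGS` mapping is a hypothetical in-memory shape, not node-pinyin's actual file format, and the `FREQ` counts are made-up numbers, not measured data.

```python
# Hypothetical sample: character -> readings in arbitrary (unprioritized) order.
# This is NOT node-pinyin's real data layout, only an illustration.
READINGS = {
    "行": ["háng", "xíng"],
    "好": ["hào", "hǎo"],
    "我": ["wǒ"],
}

# Hypothetical, made-up frequency counts per (character, reading) pair.
FREQ = {
    ("行", "xíng"): 70,
    ("行", "háng"): 30,
    ("好", "hǎo"): 90,
    ("好", "hào"): 10,
}


def count_pairs(readings):
    """Count character/phonetic pairs (one per reading of each character)."""
    return sum(len(r) for r in readings.values())


def polyphonic(readings):
    """Return only the characters that have more than one reading."""
    return {ch: r for ch, r in readings.items() if len(r) > 1}


def rank_readings(readings, freq):
    """Order each character's readings from most to least frequent.

    Pairs missing from `freq` are treated as frequency 0.
    """
    return {
        ch: sorted(r, key=lambda p: freq.get((ch, p), 0), reverse=True)
        for ch, r in readings.items()
    }


if __name__ == "__main__":
    print(count_pairs(READINGS))
    print(sorted(polyphonic(READINGS)))
    print(rank_readings(READINGS, FREQ))
```

With a real frequency source plugged in for `FREQ`, the first reading in each ranked list could serve as the default for linguistic work, while the full list stays available for IME-style lookup.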
