Pinyin to MandarinIPA bugs

Open trevorld opened this issue 7 years ago • 2 comments

Thanks for your wonderful cjklib and cjknife command-line tool. While making system calls to cjknife to produce IPA for some Pinyin (I'm writing a command-line pinyin drilling program in R), I noticed some bugs in the MandarinIPA output produced by the following system call:

cjknife -s Pinyin -t MandarinIPA -m pinyin_to_convert_to_ipa
  1. cjknife throws an error when asked to convert the legitimate pinyin syllables yo, m, n, ng, hng, and hm. I've seen yo (the final io without an initial) rendered in IPA as [jo] or [jɔ]; some sources use an i with a non-syllabic mark underneath ([i̯]) instead of a j. According to Wikipedia's syllabic consonant page, [m̩], [n̩], [ŋ̍], [xŋ̍], and [xm̩] should work for those Mandarin syllabic-consonant interjections (IPA adds a small vertical line above or below the letter to mark a consonant as syllabic).

  2. cjknife gives an 'o' in the IPA for Pinyin (u)o after b, p, m, f, where the sound is actually 'wo', e.g. po = [pʰwo], not [p‘o]. Although written with an 'o', bo, po, mo, fo (and wo) all have "uo" finals. The only examples of pure "o" finals are the interjection "o" and the rather rare particle "lo" (yo being the only example of the "io" final).

  3. cjknife gives incorrect IPA for erhua, e.g. dianr3 = tjɐɚ̯, not tiɛn.ər. Even if we restrict erhua to what a speaker is expected to know in order to pass the 普通话水平测试 (Putonghua Proficiency Test), i.e. to what someone with standard Mandarin pronunciation uses, that still leaves a lot of erhua syllables. For comparison, I've compiled my own Mandarin syllable to IPA mapping:

https://u14129277.dl.dropboxusercontent.com/u/14129277/pinyin_ipa.csv

which I built from the following tables (the initial and final tables I compiled mainly from the Pinyin and Erhua pages on Wikipedia, but also from other sources; the pinyin-to-initial-and-final table I decomposed by hand from all the pinyin examples I could find):

https://u14129277.dl.dropboxusercontent.com/u/14129277/initial.csv

https://u14129277.dl.dropboxusercontent.com/u/14129277/final.csv

https://u14129277.dl.dropboxusercontent.com/u/14129277/pinyin_initial_final.csv
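To make points 1 and 2 concrete, here is a minimal sketch of a caller-side override, assuming toneless Pinyin input. The function `override_ipa` and both tables are hypothetical names for illustration, not part of cjklib or cjknife; the IPA values are just the ones suggested above, with [jo] picked arbitrarily over [jɔ] for yo.

```python
# Hypothetical override tables; not part of cjklib.
# Syllabic-consonant interjections, IPA as suggested in point 1.
SYLLABIC = {"m": "m̩", "n": "n̩", "ng": "ŋ̍", "hng": "xŋ̍", "hm": "xm̩"}

# Labial initials after which Pinyin writes "o" for the "uo" final (point 2).
LABIAL_INITIALS = {"b": "p", "p": "pʰ", "m": "m", "f": "f"}


def override_ipa(syllable):
    """Return IPA for the toneless syllables flagged above, else None.

    None means "no override": the caller would fall back to whatever
    the regular converter produces.
    """
    s = syllable.lower()
    if s in SYLLABIC:
        return SYLLABIC[s]
    if s == "yo":
        # Assumption: [jo] rather than [jɔ]; both appear in the literature.
        return "jo"
    if len(s) == 2 and s[0] in LABIAL_INITIALS and s[1] == "o":
        # bo, po, mo, fo carry the "uo" final despite the spelling.
        return LABIAL_INITIALS[s[0]] + "wo"
    return None
```

A caller could try `override_ipa(syllable)` first and only invoke cjknife when it returns `None`. Erhua (point 3) is deliberately left out, since it needs the full final table rather than a handful of special cases.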

Thanks!

trevorld Jul 07 '17 00:07

Hey, thanks for the detailed drill down.

Sadly, I currently have neither the time nor the focus to take care of that. I've made you a collaborator on this project, and invite you to fix this directly. Happy to try answering anything that comes up wrt the code. :)

cburgmer Jul 07 '17 12:07

Okay, I have a couple of conferences coming up, so it may take me a couple of months before I have the free time to fix it.

trevorld Jul 07 '17 17:07