zhon
Cantonese Jyutping Support
For implementation in dragonmapper...
Rebase against latest. Can you squash where necessary?
Can you take out the version part so it can be changed when the release is cut?
As you can tell, I don't spend a lot of time on my Chinese GitHub projects anymore. Thanks for contributing!
This is a great start. It will need some work before it's ready to merge. I'm happy to work with you on the things that need to be changed.
- You have over 2000 Cantonese syllables in this pull request. I believe that's too many.
- You're missing some important syllables like `jyut` from Jyut-ping ;)
- This doesn't follow the behavior of other modules in Zhon. You'll probably want to mimic the way that the zhuyin module does it.
- Use a regular expression pattern of initials and finals, not a list of syllables (you can use a list of syllables for the tests).
- For constants like `tones` (use `marks` instead), please mimic the constants in the Python standard library's `string` module, which are strings, not lists (for one thing, lists are mutable).
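To illustrate the last point, here is a minimal sketch of why a string constant is the safer choice. The `marks` value and the `tones_as_list` name are assumptions made for illustration only, not code from this pull request.

```python
import string

# Standard-library constants such as string.digits are plain strings.
assert isinstance(string.digits, str)

# Hypothetical Jyutping tone constant in the same style (value assumed).
marks = '123456'

# A string drops straight into a regular expression character class.
tone_class = '[{}]'.format(marks)

# Strings are immutable, so item assignment raises TypeError.
try:
    marks[0] = '0'
    mutated = True
except TypeError:
    mutated = False

# A list, by contrast, can be changed in place by any code that imports it.
tones_as_list = ['1', '2', '3', '4', '5', '6']
tones_as_list.append('7')
```

Because the string cannot be modified in place, every module that imports `marks` sees the same value for the life of the program.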
I'd suggest using a resource like these three charts to create a regular expression pattern like the Zhuyin and Pinyin ones in Zhon.
Here is a short example of `jyutping.py` using the syllables that end in `aa` and `ai`:
characters = 'bpmfdtnlgkhwzcsjaeiouy'
marks = '123456'
syl = syllable = (
'(?:'
'(?:(?:[gk]w|[bpmfdtnlzcsgkhwj])?aa)|'
'(?:(?:[gk]w|ng|[bpmfdtnlzcsgkhwj])?ai)'
')[{marks}]'
).format(marks=marks)
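A rough sketch of how such a pattern might be exercised in tests, assuming Python 3.4+ for `re.fullmatch`. The `is_syllable` helper and the sample text are hypothetical; the pattern is copied from the example above and only covers the `aa` and `ai` finals.

```python
import re

marks = '123456'
syllable = (
    '(?:'
    '(?:(?:[gk]w|[bpmfdtnlzcsgkhwj])?aa)|'
    '(?:(?:[gk]w|ng|[bpmfdtnlzcsgkhwj])?ai)'
    ')[{marks}]'
).format(marks=marks)


def is_syllable(text):
    """Return True if text is exactly one syllable matched by the pattern.

    Hypothetical helper for illustration; not part of Zhon.
    """
    return re.fullmatch(syllable, text) is not None


# Pull every matching syllable out of a short sample string.
# 'jyut6' is skipped because the 'yut' final is not in this partial pattern.
found = re.findall(syllable, 'gwaa1 ngai4 baa2 jyut6')
```

A list of known syllables can then be checked against `is_syllable` in the test suite, which keeps the syllable list out of the module itself.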
Awesome! Thanks for the help!
Will get to it when I can :-)