pythainlp

Feature request: Transliterate back into Thai

Open mashedcode opened this issue 5 years ago • 2 comments

In Thai it is very common to write English words with Thai letters. There's even a ruleset for doing this: หลักเกณฑ์การทับศัพท์ภาษาอังกฤษ (rules for transcribing English words into Thai).

Unfortunately, these rules are often not followed, which results in awkward spellings. A way to transliterate English into Thai would therefore be great. To generalize, one could even go as far as transliterating from IPA to Thai instead, so that any language, not just English, would be supported.

By the way you guys are doing an awesome job!

mashedcode · Jun 18 '20 18:06

The simplest thing I can think of is to reuse the training code for our existing transliteration model, with the source and target data swapped. That should give us a reasonable baseline, and then we can see what to improve.
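A minimal sketch of what swapping source and target could look like, assuming the existing training data is a list of (Thai, romanized) string pairs and the model works on characters. The names `invert_pairs`, `build_char_vocab`, `encode` and the special tokens are hypothetical, not part of PyThaiNLP; the point is only that reversing each pair turns Thai-to-Latin data into Latin-to-Thai data, after which the character vocabularies are rebuilt so the encoder sees Latin characters and the decoder emits Thai characters.

```python
from typing import Dict, List, Tuple

# Special tokens typical of sequence-to-sequence setups (an assumption here).
SPECIALS = ["<pad>", "<s>", "</s>", "<unk>"]


def invert_pairs(pairs: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Swap source and target: (thai, roman) becomes (roman, thai)."""
    return [(tgt, src) for src, tgt in pairs]


def build_char_vocab(texts: List[str]) -> Dict[str, int]:
    """Map each distinct character to an integer id, placed after the special tokens."""
    chars = sorted({ch for text in texts for ch in text})
    return {tok: i for i, tok in enumerate(SPECIALS + chars)}


def encode(text: str, vocab: Dict[str, int]) -> List[int]:
    """Convert a string to ids wrapped in start/end markers; unknown chars map to <unk>."""
    unk = vocab["<unk>"]
    return [vocab["<s>"]] + [vocab.get(ch, unk) for ch in text] + [vocab["</s>"]]


if __name__ == "__main__":
    # Toy Thai -> romanization pairs, used only to illustrate the swap.
    thai_to_roman = [("แมว", "maeo"), ("ไก่", "kai")]
    roman_to_thai = invert_pairs(thai_to_roman)
    src_vocab = build_char_vocab([src for src, _ in roman_to_thai])  # Latin side
    tgt_vocab = build_char_vocab([tgt for _, tgt in roman_to_thai])  # Thai side
    print(roman_to_thai)
    print(encode("kai", src_vocab))
```

One thing to expect from the inverted direction: the Thai side is now the target, so the model has to produce tone marks and other spelling details that a romanization usually does not encode. That makes it likely to be harder than the original direction and a natural place to look for improvements over the baseline.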

p16i · Jun 18 '20 18:06

Dataset for training: https://github.com/wannaphong/thai-romanization
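A sketch of preparing such a dataset for the inverted task, assuming it can be exported as a two-column, tab-separated file with the Thai word first and its romanization second. The actual file layout and column order in that repository are not stated here, so check them before use; the function `load_inverted_split` and its parameters are hypothetical.

```python
import csv
import random
from typing import List, Tuple

Pair = Tuple[str, str]


def load_inverted_split(path: str, dev_ratio: float = 0.1,
                        seed: int = 0) -> Tuple[List[Pair], List[Pair]]:
    """Read (thai, roman) rows and return shuffled (roman, thai) train/dev splits."""
    pairs = []
    with open(path, encoding="utf-8", newline="") as f:
        for row in csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE):
            if len(row) < 2 or not row[0].strip() or not row[1].strip():
                continue  # skip blank or malformed lines
            thai, roman = row[0].strip(), row[1].strip()
            pairs.append((roman, thai))  # swapped: romanization is now the source
    pairs = sorted(set(pairs))  # drop exact duplicates, fix order before shuffling
    random.Random(seed).shuffle(pairs)  # seeded so the split is reproducible
    n_dev = int(len(pairs) * dev_ratio)
    return pairs[n_dev:], pairs[:n_dev]
```

Deduplicating before splitting keeps identical pairs from landing in both the training and validation sets, which would otherwise inflate validation scores.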

wannaphong · Sep 23 '23 09:09