ThaiLMCUT
I think this work takes the correct approach to Thai word segmentation. Sadly, no one has pursued this direction further.
It is a character-based language model approach. They should have predicted the next syllable/character, at least at some points. I am quite confident that the equivalent rules/automata/templates extracted from many models (for linguistic verification) are nonsense.
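The note does not spell out how next-character prediction turns into word boundaries. A minimal sketch of one classic unsupervised heuristic, branching entropy (not necessarily what ThaiLMCUT does): place a boundary where the next character is hard to predict. The order-2 count model stands in for a neural character LM; the function names, threshold, and three-sentence corpus are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Optional


def train_char_ngram(corpus: List[str], order: int = 2) -> Dict[str, Counter]:
    """Count which character follows each `order`-character context."""
    successors: Dict[str, Counter] = defaultdict(Counter)
    for text in corpus:
        for i in range(len(text) - order):
            successors[text[i:i + order]][text[i + order]] += 1
    return dict(successors)


def predict_next(successors: Dict[str, Counter], context: str) -> Optional[str]:
    """Most frequent next character after `context` (None if never seen)."""
    counts = successors.get(context)
    return counts.most_common(1)[0][0] if counts else None


def branching_entropy(successors: Dict[str, Counter], context: str) -> float:
    """Entropy in bits of the next-character distribution after `context`."""
    counts = successors.get(context)
    if not counts:
        return 0.0  # unseen context: treated as no evidence for a boundary
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def segment(text: str, successors: Dict[str, Counter], order: int = 2,
            threshold: float = 0.5) -> List[str]:
    """Cut before position i when the character at i is hard to predict."""
    words, start = [], 0
    for i in range(order, len(text)):
        if branching_entropy(successors, text[i - order:i]) >= threshold:
            words.append(text[start:i])
            start = i
    words.append(text[start:])
    return [w for w in words if w]


# Hypothetical toy corpus of unspaced Thai: "eat rice", "eat/drink water", "drink water".
corpus = ["กินข้าว", "กินน้ำ", "ดื่มน้ำ"]
model = train_char_ngram(corpus, order=2)
print(predict_next(model, "กิ"))   # น
print(segment("กินข้าว", model))  # ['กิน', 'ข้าว']
```

On this toy corpus "ดื่มน้ำ" is not split, because every two-character context in it has only one observed successor; the boundary signal only becomes useful with a model trained on a large corpus.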
Another promising approach might be (unsupervised) sentencepiece/BPE. It makes me wonder what Thai word segmentation methods are actually for.
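Sentencepiece/BPE needs no pre-segmented words: BPE starts from single characters and repeatedly merges the most frequent adjacent pair, so subword units emerge from raw, unspaced Thai text. A minimal from-scratch sketch of BPE merge learning (it does not use or reproduce the sentencepiece library's API); the helper names, tie-breaking rule, merge count, and toy corpus are assumptions for illustration.

```python
from collections import Counter
from typing import List, Tuple

Pair = Tuple[str, str]


def apply_merge(seq: List[str], pair: Pair) -> List[str]:
    """Replace every adjacent occurrence of `pair` with one merged symbol."""
    merged, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            merged.append(seq[i] + seq[i + 1])
            i += 2
        else:
            merged.append(seq[i])
            i += 1
    return merged


def learn_bpe(corpus: List[str], num_merges: int) -> List[Pair]:
    """Learn BPE merges from raw text; no word boundaries are needed."""
    sequences = [list(line) for line in corpus]  # start from single characters
    merges: List[Pair] = []
    for _ in range(num_merges):
        pairs: Counter = Counter()
        for seq in sequences:
            pairs.update(zip(seq, seq[1:]))
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # ties: the pair seen first wins
        merges.append(best)
        sequences = [apply_merge(seq, best) for seq in sequences]
    return merges


def bpe_tokenize(text: str, merges: List[Pair]) -> List[str]:
    """Split new text by replaying the learned merges in order."""
    seq = list(text)
    for pair in merges:
        seq = apply_merge(seq, pair)
    return seq


# Hypothetical toy corpus of unspaced Thai sentences.
corpus = ["กินข้าว", "กินน้ำ", "ดื่มน้ำ"]
merges = learn_bpe(corpus, num_merges=4)
print(bpe_tokenize("กินน้ำ", merges))  # ['กิน', 'น้ำ']
```

With only four merges on three sentences, "ดื่ม" ("drink") is still left as single characters: the learned units depend entirely on corpus frequency, which is both the appeal (no dictionary or labels) and the limitation (units need not match words).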
Maybe word segmentation can be used to find where tone pauses fall in the TTS (text-to-speech) task.