ThaiLMCUT

I think this work is the correct way to perform segmentation in the Thai language. Sadly, no one has pursued this direction further.

Open perathambkk opened this issue 9 months ago • 1 comment

Character-based language model approach. They should have predicted the next syllable/character, at least in some poetry. I am quite confident that the equivalent rules/automata/templates extracted from many models (for linguistic verification) are nonsense.

Another promising approach might be (unsupervised) SentencePiece/BPE. It makes me wonder what Thai word segmentation methods are actually for.
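To make the BPE idea concrete, here is a minimal sketch of unsupervised pair merging on raw, unsegmented strings. It is a toy illustration only, not the SentencePiece library API and not how ThaiLMCUT works; the function names `learn_bpe`, `merge_pair`, and `segment` are hypothetical, and a real setup would need a large corpus and proper handling of Thai combining characters.

```python
from collections import Counter


def merge_pair(seq, pair):
    """Replace every non-overlapping occurrence of `pair` in `seq` with the joined token."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(seq[i] + seq[i + 1])
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def learn_bpe(texts, num_merges):
    """Learn up to `num_merges` merges from unsegmented strings (toy, character-level)."""
    seqs = [list(t) for t in texts]  # every text starts as single characters
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for seq in seqs:
            pairs.update(zip(seq, seq[1:]))  # count adjacent token pairs
        if not pairs:
            break  # nothing left to merge
        # Most frequent pair; ties are broken by the pair itself for determinism.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        seqs = [merge_pair(seq, best) for seq in seqs]
    return merges


def segment(text, merges):
    """Apply the learned merges, in order, to a new unsegmented string."""
    seq = list(text)
    for pair in merges:
        seq = merge_pair(seq, pair)
    return seq


if __name__ == "__main__":
    merges = learn_bpe(["abab", "abc"], num_merges=1)
    print(merges, segment("cabab", merges))
```

The resulting units are frequency-driven subwords, not linguistic words, which is exactly why no hand-labeled segmentation is needed.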

perathambkk avatar May 13 '24 12:05 perathambkk

Maybe it can be used to get the tone pauses in a TTS task.

blackbird-fish avatar Jun 26 '24 03:06 blackbird-fish