Korakot Chaovavanich comments

Results: 23 comments by Korakot Chaovavanich

Thai NLP Project backlog

A few ideas, mostly still in the planning phase:

- tokenization together with spellcheck
- autocorrect from such spellcheck
- misspelling dataset
- sentence (or EDU) segmentation dataset
- thai word...

For the YouTube subtitle dataset, here are the current resources and work in progress:

- A script that runs every hour, searching for new YouTube videos that might have Thai subtitles. See [thai_sub.gs](https://script.google.com/d/1BCMtSZe7DFimStGg_pscaQ5bkqvUayS-eTgG0VJbaIxNm_T9a97sXhDU/edit?usp=sharing)...

Adding support for Thai Language

There is some progress. A new constituency treebank came out, so it needs conversion. The Thai PUD needs an update; no progress yet. The TNC treebank still has only head info, but no...

Adding support for Thai Language

BEST is probably the same as InterBest (they renamed it a few times). Here's my list of direct links to them: https://gist.github.com/korakot/abf6c18c71cefe7b9107689dd904751f For ORCHID, you can get it here: https://www.nectec.or.th/corpuso/phocadownload/dl_text_thai-eng/orchid_corpus.zip...