Lalita Lowphansirikul issues


Results: 4 issues by Lalita Lowphansirikul

Feature: add new NER scheme

Explore percentage of repetitive characters in wisesight corpus

Todo:
- [ ] Sample sentences from the ws-large corpus and find words with repetitive characters
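One possible way to approach this todo is sketched below. It is only an illustration: the issue does not define what counts as "repetitive", so the threshold of three or more identical consecutive characters, the helper names `has_repetitive_chars` and `repetitive_percentage`, and the sample words are all assumptions rather than anything taken from the wisesight corpus or its tooling.

```python
import re
from typing import List

# Assumption: a word is "repetitive" if any single character
# occurs three or more times in a row (e.g. "มากกกก", "555").
REPEAT_PATTERN = re.compile(r"(.)\1{2,}")


def has_repetitive_chars(word: str) -> bool:
    """Return True if the word contains a run of 3+ identical characters."""
    return REPEAT_PATTERN.search(word) is not None


def repetitive_percentage(words: List[str]) -> float:
    """Return the percentage of words that contain a repeated-character run."""
    if not words:
        return 0.0
    hits = sum(has_repetitive_chars(w) for w in words)
    return 100.0 * hits / len(words)


# Hypothetical sample; real input would be tokens from sampled corpus sentences.
sample_words = ["มากกกก", "ดี", "555", "สวย"]
print(repetitive_percentage(sample_words))
```

The run-length threshold lives in the regular expression, so it can be tightened or loosened once the sampled sentences show what kinds of repetition actually occur.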

Fix an issue where input tokens to the WangchanBERTa NER pipeline may be in the incorrect form

According to the model finetuning pipeline, the input tokens are first tokenized with another tokenizer (e.g. PyThaiNLP's newmm for the `thainer` dataset) and then retokenized with the SentencePiece tokenizer. However, the input tokens...

Create a script to evaluate MT models on Wang's dataset

Todos:
- [x] Define the rules (e.g. tokenization) for BLEU score evaluation for both the TH → EN and EN → TH directions.
- [x] Write a new script to evaluate...