Botok
POS tags? Distinguishing some patterns
For a phonetics use case, I need to distinguish the pronunciation of བ (ba or wa), but this currently seems impossible with botok:
- རབ་གསལ་བས is tokenized as རབ་གསལ་ - བས (in that case བས is pronounced wé)
- བྱང་ཆུབ་བར་དུ is tokenized as བྱང་ཆུབ་ - བར་ - དུ (in that case བར is pronounced bar)
Is there any way I can discriminate between the two with botok (or any other tool)?
བར་དུ་ should be added to the vocab. I would argue that it's a frozen expression by now. We'll add instructions on how to do this in the botok docs.
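Until the vocab entry exists, one possible workaround is to merge the tokens yourself after tokenization. The following is only a minimal sketch in plain Python: it does not use botok's API, it assumes you already have the token strings as a list, and `FROZEN_EXPRESSIONS` and `merge_frozen` are hypothetical names made up for this illustration.

```python
TSEK = "\u0f0b"  # Tibetan syllable delimiter

# Hypothetical list of frozen expressions, stored as tuples of syllables
# without the trailing tsek. Only the entry discussed in this thread is listed.
FROZEN_EXPRESSIONS = [("བར", "དུ")]


def merge_frozen(tokens, expressions=FROZEN_EXPRESSIONS):
    """Merge adjacent tokens that together form a known frozen expression.

    `tokens` is a list of token strings (assumed to come from a tokenizer);
    a trailing tsek on a token is ignored when matching.
    """
    merged = []
    i = 0
    while i < len(tokens):
        for expr in expressions:
            window = tokens[i:i + len(expr)]
            if tuple(t.rstrip(TSEK) for t in window) == expr:
                # Keep the original spelling (including tseks) in the output.
                merged.append("".join(window))
                i += len(expr)
                break
        else:
            # No expression starts here: keep the token unchanged.
            merged.append(tokens[i])
            i += 1
    return merged


tokens = ["བྱང་ཆུབ་", "བར་", "དུ"]
print(merge_frozen(tokens))
```

With the merged token in hand, a phonetics rule could treat བར་དུ as a single unit, while a standalone བས (as in the first example) is left untouched for a separate rule.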
Well, what I'll do with another POS tagger is look at the n.rel tag from https://web.archive.org/web/20170824153724/http://larkpie.net/tibetancorpus/tags