Botok
Missing pos for PUNCT
System:
- botok: v0.8.8
Reproduce:
from botok import WordTokenizer
wt = WordTokenizer()
tokens = wt.tokenize("༄༅། །བློ་སྦྱོང་དོན་?")
print(tokens[0])
Actual output (no pos field):
text: "༄༅། །"
char_types: |NORMAL_PUNCT|NORMAL_PUNCT|NORMAL_PUNCT|TRANSPARENT|NORMAL_PUNCT|
chunk_type: PUNCT
start: 0
len: 5
Expected output:
text: "༄༅། །"
char_types: |NORMAL_PUNCT|NORMAL_PUNCT|NORMAL_PUNCT|TRANSPARENT|NORMAL_PUNCT|
chunk_type: PUNCT
pos: PUNCT
start: 0
len: 5
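The expectation above could be written as a small check. This is only a sketch, assuming tokens expose `chunk_type` and `pos` attributes as the printout suggests. `StubToken` and `punct_tokens_missing_pos` are hypothetical names, not part of botok; the stub stands in for real tokens so the snippet runs without the library installed.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class StubToken:
    """Hypothetical stand-in carrying only the fields relevant to this report."""
    text: str
    chunk_type: str
    pos: Optional[str] = None


def punct_tokens_missing_pos(tokens: Iterable) -> List:
    """Return tokens whose chunk_type is PUNCT but whose pos is not PUNCT."""
    return [
        t for t in tokens
        if t.chunk_type == "PUNCT" and getattr(t, "pos", None) != "PUNCT"
    ]


# Behaviour reported here: the PUNCT chunk carries no pos.
current = [StubToken(text="༄༅། །", chunk_type="PUNCT")]
# Behaviour expected: pos mirrors the chunk type.
expected = [StubToken(text="༄༅། །", chunk_type="PUNCT", pos="PUNCT")]

print(len(punct_tokens_missing_pos(current)))
print(len(punct_tokens_missing_pos(expected)))
```

The same function should accept the list returned by `wt.tokenize(...)`, provided the real token objects have a `chunk_type` attribute; an empty result would mean every PUNCT token has its pos set.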