`token.text_unaffixed` failed to add tsek

Open 10zinten opened this issue 3 years ago • 0 comments

Script to reproduce:

from botok import WordTokenizer

wt = WordTokenizer()
tokens = wt.tokenize("རིན་ཆེན་མིའི")
print(tokens)

Output:

[text: "རིན་ཆེན་"
text_cleaned: "རིན་ཆེན་"
text_unaffixed: "རིན་ཆེན་"
syls: ["རིན", "ཆེན"]
pos: OTHER
lemma: རིན་ཆེན་
senses: | pos: OTHER, freq: 22841, affixed: False, lemma: རིན་ཆེན་ |
char_types: |CONS|VOW|CONS|TSEK|CONS|VOW|CONS|TSEK|
chunk_type: TEXT
freq: 22841
syls_idx: [[0, 1, 2], [4, 5, 6]]
syls_start_end: [{'start': 0, 'end': 4}, {'start': 4, 'end': 8}]
start: 0
len: 8

, text: "མི"
text_cleaned: "མི"
text_unaffixed: "མི"
syls: ["མི"]
pos: PART
lemma: མི་
senses: | pos: PART, freq: 883801, affixed: True, lemma: མི་ |
char_types: |CONS|VOW|
chunk_type: TEXT
freq: 883801
affix_host: True
syls_idx: [[0, 1]]
syls_start_end: [{'start': 0, 'end': 2}]
start: 8
len: 2

, text: "འི"
text_cleaned: "འི་"
text_unaffixed: "འི་"
syls: ["འི"]
pos: PART
lemma: གི་
senses: | lemma: གི་ |
char_types: |CONS|VOW|
chunk_type: TEXT
affix: True
syls_idx: [[0, 1]]
syls_start_end: [{'start': 2, 'end': 4}]
start: 10
len: 2

]
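
In the output above, the affix host token has `text_unaffixed: "མི"` while its lemma is `མི་`, so the trailing tsek appears to be missing. As a rough illustration of the expected normalization (a sketch only: `ensure_tsek` is a hypothetical helper, not part of botok, and it assumes the intended value is simply the text followed by a single tsek):

```python
TSEK = "\u0f0b"  # Tibetan intersyllabic mark (tsek)


def ensure_tsek(text: str) -> str:
    """Return text with a trailing tsek, adding one only if it is missing.

    Hypothetical helper for illustration; botok's own logic may differ.
    """
    if text and not text.endswith(TSEK):
        return text + TSEK
    return text


# "མི" written with escapes: the value reported for the affix host token
host_unaffixed = "\u0f58\u0f72"
print(ensure_tsek(host_unaffixed))
```

Text that already ends in a tsek, such as the first token's `text_unaffixed`, would pass through this helper unchanged.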

10zinten · May 09 '22 11:05