Botok
Sentences and Paragraphs as Token attributes
The sentence_tokenizer() and paragraph_tokenizer() should add sentence and paragraph information as attributes on the Token objects directly, instead of creating a new list of Tokens embedded in tuples.
An idea is to use the _ attribute in Token objects to store two key/value pairs: sent/word_num and par/word_num.
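
A minimal sketch of the idea, using a stand-in Token class that only mimics the `_` attribute mentioned above; the helper name annotate_sentences and the sentence-boundary test are illustrative, not botok's actual sentence_tokenizer() logic:

```python
# Sketch only: "Token" is a minimal stand-in exposing the `_` dict;
# the real botok Token and its tokenizers may differ.

class Token:
    def __init__(self, text):
        self.text = text
        self._ = {}  # free-form key/value store for custom attributes


def annotate_sentences(tokens, sentence_ends=("།",)):
    """Attach sent/word_num pairs to each token in place.

    `sentence_ends` is a placeholder boundary test; botok's real
    sentence segmentation rules would be used instead.
    """
    sent_id, word_num = 0, 0
    for token in tokens:
        token._["sent"] = sent_id
        token._["word_num"] = word_num
        word_num += 1
        if token.text in sentence_ends:
            sent_id += 1
            word_num = 0
    return tokens


tokens = [Token(t) for t in ["བཀྲ་ཤིས་", "བདེ་ལེགས", "།", "ཁྱེད་རང་", "།"]]
for tok in annotate_sentences(tokens):
    print(tok.text, tok._)
```

A paragraph_tokenizer() counterpart would do the same with a par/word_num pair, so callers keep working with a flat list of Tokens instead of tuples.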