Botok
Sentences and Paragraphs as Token attributes
sentence_tokenizer() and paragraph_tokenizer() should add sentence and paragraph attributes to the Token objects directly, instead of creating a new list of Tokens embedded in tuples.
One idea is to use the _ attribute of Token objects to store two k/v pairs: sent/word_num and par/word_num. A sketch of what this could look like follows below.
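For illustration only, here is a minimal sketch of the annotation step, assuming sentence_tokenizer() currently yields (sentence_length, [Token, ...]) tuples and that Token can carry a dict-like _ attribute for custom data. The helper name attach_sentence_info and the exact keys/values are hypothetical, not existing botok API; the same pattern would apply to paragraph_tokenizer() with a "par" key.

```python
def attach_sentence_info(sentences):
    """Flatten the (sent_len, [Token, ...]) tuples back into a single token
    list, writing each token's sentence index and its word number within
    that sentence into the token's `_` dict under the "sent" key.

    Note: the tuple shape and the `_` attribute are assumptions for this
    sketch, not confirmed botok internals.
    """
    tokens = []
    for sent_id, (sent_len, sent_tokens) in enumerate(sentences):
        for word_num, token in enumerate(sent_tokens):
            if getattr(token, "_", None) is None:
                token._ = {}  # create the custom-data dict lazily
            # e.g. the third word of the first sentence -> (0, 2)
            token._["sent"] = (sent_id, word_num)
            tokens.append(token)
    return tokens
```

This would let callers keep working with a flat list of Tokens while still being able to recover sentence (and, analogously, paragraph) boundaries from the attributes.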