Botok
Sentences and Paragraphs as Token attributes
The sentence_tokenizer() and paragraph_tokenizer() should add sentence and paragraph information as attributes on the Token objects directly, instead of creating a new list of Tokens embedded in tuples.
An idea is to use the _ attribute in Token objects to store two key/value pairs: sent/word_num and par/word_num.
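
A minimal sketch of the idea, using a stand-in Token class that only mimics the `_` attribute mentioned above; the helper name annotate_sentences and the sentence-boundary test are illustrative, not botok's actual sentence_tokenizer() logic:

```python
# Sketch only: "Token" is a minimal stand-in exposing the `_` dict;
# the real botok Token and its tokenizers may differ.

class Token:
    def __init__(self, text):
        self.text = text
        self._ = {}  # free-form key/value store for custom attributes


def annotate_sentences(tokens, sentence_ends=("།",)):
    """Attach sent/word_num pairs to each token in place.

    `sentence_ends` is a placeholder boundary test; botok's real
    sentence segmentation rules would be used instead.
    """
    sent_id, word_num = 0, 0
    for token in tokens:
        token._["sent"] = sent_id
        token._["word_num"] = word_num
        word_num += 1
        if token.text in sentence_ends:
            sent_id += 1
            word_num = 0
    return tokens


tokens = [Token(t) for t in ["བཀྲ་ཤིས་", "བདེ་ལེགས", "།", "ཁྱེད་རང་", "།"]]
for tok in annotate_sentences(tokens):
    print(tok.text, tok._)
```

A paragraph_tokenizer() counterpart would do the same with a par/word_num pair, so callers keep working with a flat list of Tokens instead of tuples.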