Botok issues

labels

1

I suggest that we start using labels in the following way: - there are three concepts: priority labels, context labels, and other labels - priority labels have three classes: issues,...

mikkokotila

understanding custom pipelines

3

In the toy example below, I expect to get a tokenized version of the input text. With the code below, the result is a list of tokens, but tokens...

mikkokotila

statistics performance with tokenizer.list_word_types

3

As it stands, `Text(doc).list_word_types` bundles tokenization with a statistical operation (basic word frequency). In a typical workflow I might first tokenize, and then compute some statistics on the result. Obviously this would...

mikkokotila
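A minimal sketch of the separation this issue asks for, assuming tokenization has already produced a plain list of token strings. `word_type_frequencies` is a hypothetical helper, not part of botok's API; it uses only `collections.Counter` from the standard library, so the statistics step never re-tokenizes.

```python
from collections import Counter


def word_type_frequencies(tokens):
    """Count how often each word type occurs in an already-tokenized text.

    Hypothetical helper: `tokens` is assumed to be a list of strings
    produced by an earlier tokenization step; no tokenization happens here.
    """
    counts = Counter(tokens)
    # Most frequent types first; ties keep first-seen order (Python 3.7+).
    return counts.most_common()


# Tokenize once (elsewhere), then reuse the tokens for any statistics.
tokens = ["བཀྲ་ཤིས་", "བདེ་ལེགས་", "བཀྲ་ཤིས་", "།"]
for word_type, freq in word_type_frequencies(tokens):
    print(word_type, freq)
```

Because the counting function takes tokens rather than raw text, the same token list can feed several statistics without paying the tokenization cost each time.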

NONE error when trying to match int or bool token attributes

4

Trying to match int and bool attributes with CQL raises a NONE error. This seems to happen somewhere in the fsa file. It's an issue since it stops us from matching...

ngawangtrinley

help wanted

Sentencize a list of tokens that have been manually tokenized by adding spaces

1

Hi, I'm wondering whether it is possible to run sentence tokenization on a list of tokens that have already been tokenized (without breaking the original word tokenization). I tried [the...

BLKSerene

Sentences and Paragraphs as Token attributes

The sentence_tokenizer() and paragraph_tokenizer() should store sentence and paragraph information as attributes of the Token objects directly, instead of creating a new list of Tokens embedded in tuples. An idea is to use...

drupchen

enhancement
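A rough sketch of the proposed design, under stated assumptions: the `Token` dataclass below is a hypothetical stand-in (botok's real Token class has many more fields), and `annotate_spans` is an illustrative helper that assumes the tokenizer already knows the index of the last token of each sentence or paragraph. Instead of regrouping tokens into tuples, it writes a running span index onto each token, so the flat token list is preserved.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Token:
    # Hypothetical minimal token, not botok's actual Token class.
    text: str
    sentence_idx: Optional[int] = None
    paragraph_idx: Optional[int] = None


def annotate_spans(tokens: List[Token], span_ends: List[int], attr: str) -> None:
    """Store a running span index (sentence or paragraph) on each token.

    `span_ends` holds the index of the last token of each span; tokens
    stay in one flat list instead of being regrouped into tuples.
    """
    ends = set(span_ends)
    span = 0
    for i, token in enumerate(tokens):
        setattr(token, attr, span)
        if i in ends:
            span += 1


tokens = [Token(t) for t in ["ང་", "བོད་པ་", "ཡིན", "།", "ཁྱེད་", "ག་ནས་", "ཡིན", "།"]]
annotate_spans(tokens, [3, 7], "sentence_idx")   # two sentences
annotate_spans(tokens, [7], "paragraph_idx")     # one paragraph
print([t.sentence_idx for t in tokens])
```

Downstream code can then recover a sentence by filtering on `sentence_idx` without needing a second, nested data structure.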

finding sentence limits

11

While it seems quite reasonable to cut on naro + shad, there are so many edge cases where the proper cut is difficult to find that it would be helpful...

eroux
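A deliberately naive sketch of the "cut on naro + shad" rule, assuming the input is already a list of word tokens (for example, manually tokenized text split on spaces, as in the sentencize issue above). `split_sentences` is a hypothetical helper, not botok's sentence tokenizer. It only groups existing tokens, so the original word tokenization is never altered. The Unicode code points are standard: naro is U+0F7C, shad is U+0F0D, tsek is U+0F0B.

```python
NARO = "\u0f7c"  # TIBETAN VOWEL SIGN O (naro)
SHAD = "\u0f0d"  # TIBETAN MARK SHAD
TSEK = "\u0f0b"  # TIBETAN MARK INTERSYLLABIC TSHEG


def split_sentences(tokens):
    """Group pre-tokenized words into sentences without re-tokenizing them.

    Naive rule: end a sentence after a shad token whose preceding token
    ends in naro (ignoring a trailing tsek). Edge cases are not handled.
    """
    sentences, current = [], []
    for i, tok in enumerate(tokens):
        current.append(tok)
        prev = tokens[i - 1].rstrip(TSEK) if i > 0 else ""
        if tok.strip() == SHAD and prev.endswith(NARO):
            sentences.append(current)
            current = []
    if current:  # trailing tokens with no naro + shad cut
        sentences.append(current)
    return sentences


# Manual tokenization: one token per space-separated chunk.
text = "ང་ བོད་པ་ ཡིན་ ནོ ། ཁྱེད་ ག་ནས་ ཡིན ། དེ་ ལེགས་ སོ །"
tokens = text.split()
sentences = split_sentences(tokens)
for sentence in sentences:
    print(" ".join(sentence))
```

This input yields two groups: the second clause ends in ཡིན rather than a naro syllable, so it is merged with the following one, which is exactly the kind of edge case a rule this simple cannot resolve.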

Exclude test suite

Remove Python installer

Fix python version

← Metadata

Owner

Metadata

Botok Botok copied to clipboard

Metadata

← Metadata

Owner

Metadata

Botok
Botok copied to clipboard