Evaluating speed / memory per annotator
Hey there,
I need to evaluate the peak memory usage and annotation speed of the following annotators:
- Tokenization into words
- Tokenization into sentences
- Part-of-speech tagging
- Lemmatization
(And, if these exist:
- Named Entity Recognition
- Shallow Parsing)
Judging from the current examples, it seems that the code processes everything at once. Is it possible to make things lazy (i.e. things get annotated / loaded only when they are requested)?
These are lazy-loaded already. The TextBlob constructor configures the tokenizers, taggers, etc., but they don't get called until you invoke a method or access a property that requires them.
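To make the lazy behavior concrete, here is a minimal, standard-library-only sketch of the pattern. `LazyDoc` and its `calls` list are hypothetical names used for illustration; this is not TextBlob's actual implementation, only an assumption about how "configure in the constructor, compute on first access" can look.

```python
from functools import cached_property


class LazyDoc:
    """Toy stand-in for a lazily annotated document (not TextBlob's real code)."""

    def __init__(self, text):
        # The constructor only stores the input; no annotation work happens here.
        self.text = text
        self.calls = []  # records which annotators actually ran

    @cached_property
    def words(self):
        # Runs on first access only; the result is then cached on the instance.
        self.calls.append("words")
        return self.text.split()

    @cached_property
    def sentences(self):
        # Naive split on periods, purely for demonstration.
        self.calls.append("sentences")
        return [s.strip() for s in self.text.split(".") if s.strip()]


doc = LazyDoc("Lazy loading helps. Nothing runs early.")
print(doc.calls)  # [] : nothing has run yet
doc.words
doc.words         # second access is served from the cache
print(doc.calls)  # ['words'] : sentences was never computed
```

With TextBlob itself, the equivalent check would be to construct a `TextBlob` and then access only the property of interest, such as `blob.words`, `blob.sentences` or `blob.tags`.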
Can you further explain your question, perhaps with some code samples?
Ah, I see. You answered my question. I was under the impression that everything is loaded at once at the beginning (spaCy is like that). I will do an initial evaluation and get back to you once I have confirmed it.
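A rough sketch of how such a per-annotator evaluation could be set up, using only the standard library. The helper name `measure` is hypothetical. Because resources are loaded on first use, the first call is traced for memory so that loading counts toward the peak, and later calls are timed without tracing. Note that `tracemalloc` only sees allocations made through Python's memory allocator, so the figure should be read as a lower bound rather than total process memory.

```python
import time
import tracemalloc


def measure(fn, repeats=3):
    """Return (best seconds over `repeats` calls, peak traced bytes of first call)."""
    # First call runs under tracemalloc so lazily loaded resources are included.
    tracemalloc.start()
    fn()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()

    # Subsequent calls are timed without tracing, since tracing slows execution.
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings), peak


# Demo with a trivial whitespace "tokenizer" as the annotator under test.
text = "This is a sentence. " * 1000
seconds, peak = measure(lambda: text.split())
print(f"split: {seconds:.6f} s, peak {peak} bytes")
```

For TextBlob, each annotator could be passed as its own callable, for example `lambda: TextBlob(text).words` or `lambda: TextBlob(text).tags`. Building a fresh `TextBlob` inside the callable matters if results are cached per instance, since repeated access on the same object would otherwise time the cache rather than the annotator.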