
evaluating speed / memory per annotator

Open danyaljj opened this issue 7 years ago • 2 comments

Hey there,

I need to measure the peak memory usage and the annotation speed of the following annotators:

  • Tokenization into words
  • Tokenization into sentences
  • Part-of-speech tagging
  • Lemmatization

(And, if these exist:

  • Named Entity Recognition
  • Shallow Parsing)

Judging from the current examples, it seems that the code processes everything at once. Is it possible to make this lazy, i.e. so that each annotator is loaded and run only when it is requested?

danyaljj Jan 28 '18 05:01

These are lazy-loaded already. The TextBlob constructor configures the tokenizers, taggers, etc., but they don't get called until you invoke a method or access a property that requires them.
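To illustrate the general pattern (this is only a sketch, not TextBlob's actual source), here is a stand-in class whose annotation runs on first access and is then cached. `LazyDoc` and its `calls` counter are hypothetical names made up for the example, and whitespace splitting stands in for a real tokenizer.

```python
from functools import cached_property


class LazyDoc:
    """Hypothetical stand-in for a lazily annotated document."""

    def __init__(self, text):
        self.text = text  # cheap: the constructor only stores the raw string
        self.calls = 0    # counts how often the tokenizer actually runs

    @cached_property
    def words(self):
        # Runs on first access only; the result is cached on the instance.
        self.calls += 1
        return self.text.split()


doc = LazyDoc("Lazy loading defers the work")
print(doc.calls)  # 0: nothing has been tokenized yet
doc.words         # first access triggers the tokenizer
doc.words         # second access reuses the cached list
print(doc.calls)  # 1
```

The practical consequence for benchmarking is that the cost of an annotator shows up at the first property access, not at construction time.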

Can you further explain your question, perhaps with some code samples?

jschnurr Feb 04 '18 02:02

Ah, I see. That answers my question. I was under the impression that everything is loaded at once at the beginning (spaCy is like that). I will do an initial evaluation and get back to you once I have confirmed it.
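A rough sketch of what such an evaluation could look like, using only `time.perf_counter` and `tracemalloc` from the standard library. The helpers `measure` and `profile_textblob` are hypothetical names for this example. It assumes `textblob` and the NLTK corpora it needs are installed, and that `words`, `sentences`, `pos_tags` and `words.lemmatize()` are the relevant entry points for the four annotators listed above.

```python
import time
import tracemalloc


def measure(fn):
    """Run fn() once and return (result, elapsed_seconds, peak_traced_bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = fn()
    finally:
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    return result, elapsed, peak


def profile_textblob(text):
    """Hypothetical driver: time each annotator on a fresh TextBlob."""
    # Imported here so that measure() stays usable without textblob installed.
    from textblob import TextBlob

    annotators = {
        "words": lambda blob: blob.words,
        "sentences": lambda blob: blob.sentences,
        "pos_tags": lambda blob: blob.pos_tags,
        "lemmas": lambda blob: blob.words.lemmatize(),
    }
    report = {}
    for name, annotate in annotators.items():
        blob = TextBlob(text)  # fresh blob, so cached results are not reused
        _, elapsed, peak = measure(lambda: annotate(blob))
        report[name] = (elapsed, peak)
    return report
```

Two caveats: `tracemalloc` only sees allocations made through Python's allocator while tracing is on, so this is not the same as the process's peak resident memory; and any models loaded by an earlier annotator stay in the process, so the order of the measurements probably affects the numbers. Running each annotator in a separate process would give cleaner per-annotator figures.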

danyaljj Feb 04 '18 02:02