Consider using spaCy instead of NLTK for detecting names.

Open brootware opened this issue 3 years ago • 0 comments

Checklist

  • [x] There are no similar reports on existing issues (including closed ones).
  • [x] I was in the master branch of the latest code.

Is your feature request related to a problem? Please describe

Describe the solution you'd like

The current NLTK-based name detection is far too slow when iterating through part-of-speech tagging. Consider using spaCy, whose core loops are written in Cython, to identify names instead. Reference articles:

  • https://medium.com/huggingface/100-times-faster-natural-language-processing-in-python-ee32033bdced
  • https://www.activestate.com/blog/natural-language-processing-nltk-vs-spacy/
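A rough sketch of what the spaCy-based approach could look like, in case it helps the discussion. The function names `load_pipeline` and `redact_names`, the `[NAME]` placeholder, and the choice of the `en_core_web_sm` model are all assumptions for illustration, not PyRedactKit's existing API. It relies only on spaCy's named-entity output (`doc.ents`, `ent.label_`, `ent.start_char`, `ent.end_char`) and replaces spans labelled `PERSON`.

```python
def load_pipeline(model: str = "en_core_web_sm"):
    """Load a spaCy pipeline (hypothetical helper).

    Assumes spaCy is installed and the model has been downloaded, e.g.
    `python -m spacy download en_core_web_sm`.
    """
    import spacy  # imported lazily so redact_names works with any compatible callable

    return spacy.load(model)


def redact_names(text: str, nlp, placeholder: str = "[NAME]") -> str:
    """Replace every PERSON entity found by `nlp` with `placeholder`.

    `nlp` is any callable returning an object with an `ents` sequence whose
    items expose `label_`, `start_char` and `end_char`, as spaCy's Doc does.
    """
    doc = nlp(text)
    pieces = []
    cursor = 0
    # doc.ents is ordered by position and non-overlapping, so one pass suffices.
    for ent in doc.ents:
        if ent.label_ != "PERSON":
            continue
        pieces.append(text[cursor:ent.start_char])  # text before the name
        pieces.append(placeholder)
        cursor = ent.end_char
    pieces.append(text[cursor:])  # remaining text after the last name
    return "".join(pieces)
```

Usage would be along the lines of `nlp = load_pipeline()` followed by `redact_names("Contact John Doe in Paris.", nlp)`. Which spans get tagged as `PERSON` depends on the model, so accuracy would need checking against the current NLTK output. For many lines of input, spaCy's `nlp.pipe(texts)` processes texts in batches and may be faster than calling `nlp` once per line.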

Describe alternatives you've considered

Additional context

brootware, May 26 '22 12:05