
Consider adding named entity recognition

Open • ivbeg opened this issue 3 years ago • 1 comment

Named entity recognition (NER) technology helps identify named objects inside text.

Strengths

  • allows identifying objects inside text blobs
  • could add support for more named entities (identifiers)

Weaknesses

  • could be very slow
  • requires preparing PII and identifier rules for recognition

Possible implementation: Slovnet (https://github.com/natasha/slovnet)

ivbeg, Feb 11 '22 14:02

Presidio looks like a possible NER engine. Possible ways to implement it:

  • support analysis of a given list of fields
  • support analysis of any string field longer than a max_len parameter, and make this parameter configurable
  • link NER entities to the semantic types registry
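The three approaches above could fit together roughly as follows. This is a minimal sketch under stated assumptions, not metacrafter's API: the NER engine is a pluggable analyzer callable, `SEMANTIC_TYPES` is a hypothetical stand-in for the semantic types registry, and `Entity`, `fields_to_analyze`, `analyze_record` and the default `max_len=50` are illustrative names. The regex-based `toy_analyzer` is not real NER; it only lets the example run without dependencies.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional


@dataclass
class Entity:
    """One detected entity (field names mirror what NER engines typically return)."""
    entity_type: str
    start: int
    end: int
    score: float


# Hypothetical mapping from NER entity labels to semantic type IDs in a registry.
SEMANTIC_TYPES: Dict[str, str] = {
    "PERSON": "person_name",
    "EMAIL_ADDRESS": "email",
    "LOCATION": "address",
}

Analyzer = Callable[[str], List[Entity]]


def fields_to_analyze(record: Dict[str, object],
                      fields: Optional[Iterable[str]] = None,
                      max_len: int = 50) -> List[str]:
    """Use an explicit field list if given; otherwise every string field longer than max_len."""
    if fields is not None:
        return [f for f in fields if isinstance(record.get(f), str)]
    return [k for k, v in record.items() if isinstance(v, str) and len(v) > max_len]


def analyze_record(record: Dict[str, object],
                   analyzer: Analyzer,
                   fields: Optional[Iterable[str]] = None,
                   max_len: int = 50,
                   registry: Dict[str, str] = SEMANTIC_TYPES) -> Dict[str, List[str]]:
    """Run NER on the selected fields and return the semantic types found per field."""
    result: Dict[str, List[str]] = {}
    for name in fields_to_analyze(record, fields, max_len):
        found: List[str] = []
        for ent in analyzer(str(record[name])):
            sem = registry.get(ent.entity_type)  # unmapped entity labels are ignored
            if sem is not None and sem not in found:
                found.append(sem)
        result[name] = found
    return result


def toy_analyzer(text: str) -> List[Entity]:
    """Placeholder for a real NER engine: flags anything shaped like an email address."""
    return [Entity("EMAIL_ADDRESS", m.start(), m.end(), 1.0)
            for m in re.finditer(r"\S+@\S+", text)]


row = {"id": 7,
       "comment": "Please contact the maintainer at someone@example.org about this issue.",
       "code": "x1"}
print(analyze_record(row, toy_analyzer, max_len=20))
# → {'comment': ['email']}
```

With Presidio, the analyzer callable could wrap `AnalyzerEngine().analyze(text=..., language="en")`, whose results expose `entity_type`, `start`, `end` and `score`, so they convert directly into `Entity` objects. Keeping the engine behind a callable would also leave room for Slovnet as an alternative backend.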

ivbeg, Aug 05 '22 06:08