metacrafter
Consider adding named entity recognition
Named entity recognition (NER) helps identify named objects inside text.
Strengths
- identifies objects inside text blobs
- could support more named entities (identifiers)
Weaknesses
- can be very slow
- requires preparing PII and identifier rules for recognition
Possible implementation: Slovnet https://github.com/natasha/slovnet
Presidio also looks like a possible NER engine. Ways to implement NER:
- support analysis of a given list of fields
- support analysis of any string field longer than the max_len parameter, and expose max_len as an option
- link NER entities to the semantic types registry
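A minimal Python sketch of the three options above, assuming the NER engine is pluggable (Presidio or Slovnet could sit behind it). The names `analyze_record`, `select_fields`, `ENTITY_TO_SEMANTIC_TYPE`, `FieldMatch` and `toy_engine` are hypothetical and are not part of metacrafter, Presidio or Slovnet. The semantic type ids, the entity labels and the default `max_len` value are assumptions too. The regex `toy_engine` is only a stand-in so the sketch runs without a model; it is not NER.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# Hypothetical engine interface: text -> list of (entity_label, start, end) spans.
NerEngine = Callable[[str], List[Tuple[str, int, int]]]

# Hypothetical mapping from NER entity labels to semantic type ids in the registry.
ENTITY_TO_SEMANTIC_TYPE: Dict[str, str] = {
    "PERSON": "person_name",
    "EMAIL_ADDRESS": "email",
    "LOCATION": "address",
}


@dataclass
class FieldMatch:
    field: str
    entity: str
    semantic_type: Optional[str]  # None if the label is not linked to the registry
    value: str


def select_fields(record: dict, fields: Optional[Iterable[str]], max_len: int) -> List[str]:
    """Pick fields to analyze: an explicit list, or any string field longer than max_len."""
    if fields is not None:
        return [f for f in fields if isinstance(record.get(f), str)]
    return [k for k, v in record.items() if isinstance(v, str) and len(v) > max_len]


def analyze_record(
    record: dict,
    engine: NerEngine,
    fields: Optional[Iterable[str]] = None,
    max_len: int = 100,  # assumed default, not taken from metacrafter
) -> List[FieldMatch]:
    """Run the NER engine on selected fields and link entities to semantic types."""
    matches = []
    for name in select_fields(record, fields, max_len):
        text = record[name]
        for label, start, end in engine(text):
            matches.append(
                FieldMatch(name, label, ENTITY_TO_SEMANTIC_TYPE.get(label), text[start:end])
            )
    return matches


# Stand-in engine for the demo only: a regex for e-mail addresses, not a real NER model.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")


def toy_engine(text: str) -> List[Tuple[str, int, int]]:
    return [("EMAIL_ADDRESS", m.start(), m.end()) for m in EMAIL_RE.finditer(text)]
```

Usage: `analyze_record({"comment": "Please contact jane.doe@example.com about the order."}, toy_engine, max_len=20)` selects `comment` because it is longer than 20 characters and links the detected `EMAIL_ADDRESS` entity to the hypothetical `email` semantic type. Filtering by `max_len` is one way to address the speed concern listed under Weaknesses, since short values never reach the engine.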