Adding custom regex for custom labels

Open ian-contiamo opened this issue 4 years ago • 4 comments

Describe the outcome you'd like:

I would like to add custom regex patterns corresponding to custom labels. For instance, I would add a regex to recognize German passport numbers, and add the corresponding GERMAN_PASSPORT label.

Is this possible at the moment? Or is it a feature you have on your roadmap?

Additional context:

Regex models are implemented in the code, but I see no obvious way of adding new regex patterns. In your documentation, the emphasis is on training and adding new neural networks, but adding custom regex detection would be a much simpler way to customize and extend labeling.

ian-contiamo, Sep 13 '21 13:09

We allow for any model to be added and used for label detection. Our main focus is utilizing deep learning to enhance the detection beyond regex capabilities for more complex tasks.

We do include a regex model in the repo, which can be imported and updated if desired. You would have to manually add your own regex model parameters for any new labels you want detected. Adding an example of the regex model is on the roadmap. I'll talk with @lettergram about pushing that out sooner rather than later.
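To make the idea of "regex parameters per label" concrete, here is a minimal sketch in plain Python using only the standard `re` module. It does not use or reproduce the DataProfiler regex model API: the `CUSTOM_PATTERNS` dictionary, the `label_text` helper, and the passport pattern are all hypothetical and only illustrate mapping a custom label such as GERMAN_PASSPORT to one or more patterns. The pattern is a simplified guess at the format, not an authoritative validator.

```python
import re

# Hypothetical label-to-pattern mapping (an assumption for illustration,
# not DataProfiler's parameter format). The pattern is deliberately simple:
# one leading letter followed by eight letters or digits.
CUSTOM_PATTERNS = {
    "GERMAN_PASSPORT": [r"\b[CFGHJK][CFGHJKLMNPRTVWXYZ0-9]{8}\b"],
}


def label_text(text, patterns=CUSTOM_PATTERNS):
    """Return sorted (start, end, label) spans for every regex match in text."""
    spans = []
    for label, regexes in patterns.items():
        for regex in regexes:
            for match in re.finditer(regex, text):
                spans.append((match.start(), match.end(), label))
    return sorted(spans)


sample = "Passport no. C01X00T47 was issued in Berlin."
print(label_text(sample))
```

Each match is reported as a character span with its label, which is roughly the shape of output a character-level labeler would need; how the real regex model expects its parameters should be checked against the repo's example.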

JGSweets, Sep 13 '21 14:09

Thanks, an example would be perfect.

ian-contiamo, Sep 13 '21 15:09

@ian-contiamo Can you check out the new example to see if this meets your needs?

JGSweets, Sep 15 '21 17:09

Thanks @JGSweets, it's obviously of great help!

I'm still wrapping my head around how you do things, and I will try to figure out how to add new labels rather than replace the existing ones altogether.
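One way to think about adding labels instead of replacing them is to merge the new patterns into a copy of the existing ones. The sketch below assumes the patterns are held in a plain dictionary of label to list of regex strings; that structure, the `extend_patterns` helper, and the sample patterns are hypothetical and are not taken from DataProfiler.

```python
import copy


def extend_patterns(existing, additions):
    """Return a new mapping with additions merged in; existing entries are kept."""
    merged = copy.deepcopy(existing)
    for label, regexes in additions.items():
        bucket = merged.setdefault(label, [])
        for regex in regexes:
            if regex not in bucket:  # avoid duplicate patterns per label
                bucket.append(regex)
    return merged


# Hypothetical starting point and addition, for illustration only.
existing = {"SSN": [r"\d{3}-\d{2}-\d{4}"]}
additions = {"GERMAN_PASSPORT": [r"\b[CFGHJK][CFGHJKLMNPRTVWXYZ0-9]{8}\b"]}

merged = extend_patterns(existing, additions)
print(sorted(merged))
```

Because the merge works on a deep copy, the original mapping is left untouched, so the default labels stay available if the custom ones need to be rolled back.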

ian-contiamo, Sep 17 '21 15:09