
Leveraging Tokenization/Lexers alongside Regex for certain complex scenarios

Open Ara4Sh opened this issue 1 year ago • 3 comments

Hello,

Thank you for your hard work on this project. The tool is incredibly useful, and I appreciate your dedication.

I'd like to propose supporting tokenization/lexers for pattern matching alongside regex. This could improve reliability and consistency when obfuscating sensitive information (especially PII), and enhance error handling for complex structures. It's not as fast as regular expressions, but it could be very useful when performance is not a KPI.

Is it feasible to integrate tokenization/lexers into the current codebase? In your opinion, would this improve consistency and reliability of obfuscation when processing large files (over 1 TB)?
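To make the idea concrete, here is a minimal, hypothetical sketch (not part of triki) of the difference: instead of running one large regex over raw text, the input is split into tokens, each token is classified, and only sensitive tokens are replaced before the line is reassembled:

```ruby
# Hypothetical sketch of tokenizer-based PII masking (illustrative only,
# not triki's API). A line is split into tokens, each token is tested,
# and matching tokens are masked before the line is rebuilt.

EMAIL = /\A[\w.+-]+@[\w-]+\.[\w.]+\z/

def tokenize(line)
  # Keep the whitespace tokens so the original spacing survives.
  line.split(/(\s+)/)
end

def mask_pii(line)
  tokenize(line).map { |tok| tok.match?(EMAIL) ? "[MASKED]" : tok }.join
end

puts mask_pii("user alice@example.com logged in")
# => "user [MASKED] logged in"
```

Because the email test is anchored to a whole token rather than scanned across the raw stream, classification errors are easier to reason about, at the cost of an extra tokenization pass.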

Ara4Sh avatar May 31 '24 14:05 Ara4Sh

Hi, can you provide an example? Are you talking about providing a custom matcher, or a custom masking for a given input?

Thanks in advance.

josacar avatar Jun 03 '24 20:06 josacar

Something like go-sqllexer as a custom matcher for certain scenarios.

Ara4Sh avatar Jun 10 '24 06:06 Ara4Sh

You can use lambdas in each column to specify a condition and a replacement. Are you looking for something more generic?
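As a rough illustration of the lambda-per-column idea (the config shape below is a simplified sketch, not necessarily triki's exact API), each column name maps to a callable that receives the value and returns its replacement:

```ruby
# Conceptual sketch of per-column lambda rules (hypothetical config shape).
rules = {
  "email" => ->(v) { "user#{v.length}@example.com" },  # deterministic fake
  "ssn"   => ->(_) { "XXX-XX-XXXX" },                  # constant mask
}

row = { "email" => "alice@example.com", "ssn" => "123-45-6789", "id" => 7 }

# Apply a rule when one exists for the column; pass other values through.
masked = row.map { |col, val| [col, rules[col] ? rules[col].call(val) : val] }.to_h
```

A lexer-based matcher would slot in at a different layer: rather than keying on the column name, it would classify the value itself before deciding which replacement to apply.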

josacar avatar Jul 12 '24 16:07 josacar