Jeff Zemerick
Use a bloom filter in the `IgnoredTermsFilter` when processing text with a large number of terms to ignore. What constitutes "large" needs to be determined.
When creating an identifier filter, the policy should allow the user to specify a list of context terms and also the initial confidence. This is in `IdentifierFilter`.
Don't use a bloom filter for a custom dictionary when the number of terms is small. "Small" needs to be defined or configurable. This is in `PhileasFilterService` when the custom...
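The bloom filter items above both come down to a size cut-off, which could be sketched roughly as follows. This is a minimal illustration and not the Phileas implementation: the `TermLookup` class, the `DEFAULT_BLOOM_THRESHOLD` value, and the sizing constants are all hypothetical. The bloom filter is only built when the term count reaches the threshold, and a hit is always confirmed against the exact set because bloom filters can report false positives.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper (not a Phileas class): picks a lookup strategy by term count. */
class TermLookup {

    // Assumed cut-off; the issues leave "large" and "small" undefined.
    static final int DEFAULT_BLOOM_THRESHOLD = 10_000;

    private final Set<String> exact;
    private final BitSet bits;   // null when the bloom filter is not used
    private final int numBits;
    private final int numHashes;

    TermLookup(Collection<String> terms, int bloomThreshold) {
        this.exact = new HashSet<>(terms);
        if (exact.size() >= bloomThreshold) {
            // About 10 bits per term with 7 hashes gives roughly a 1% false-positive rate.
            this.numBits = Math.max(64, exact.size() * 10);
            this.numHashes = 7;
            this.bits = new BitSet(numBits);
            for (String term : exact) {
                for (int i = 0; i < numHashes; i++) {
                    bits.set(index(term, i));
                }
            }
        } else {
            this.numBits = 0;
            this.numHashes = 0;
            this.bits = null;
        }
    }

    boolean usesBloomFilter() {
        return bits != null;
    }

    boolean contains(String term) {
        if (bits != null) {
            for (int i = 0; i < numHashes; i++) {
                if (!bits.get(index(term, i))) {
                    return false; // a cleared bit means the term is definitely absent
                }
            }
        }
        // Confirm against the exact set, since a bloom hit may be a false positive.
        return exact.contains(term);
    }

    // Double hashing: combine two hashes to derive the i-th bit position.
    private int index(String term, int i) {
        return Math.floorMod(term.hashCode() + i * fnv1a(term), numBits);
    }

    // 32-bit FNV-1a over the UTF-8 bytes, used as the second hash.
    private static int fnv1a(String s) {
        int hash = 0x811C9DC5;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            hash ^= (b & 0xFF);
            hash *= 0x01000193;
        }
        return hash;
    }
}
```

Passing the threshold to the constructor keeps "small" and "large" configurable, for example `new TermLookup(terms, TermLookup.DEFAULT_BLOOM_THRESHOLD)`. Whether the pre-check pays off at a given size is the measurement these issues still call for.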
Add pre-defined filters for the health domain. This is in `PhileasFilterService` when the policy's domain is `health`.
Add pre-defined filters for the legal domain. This is in `PhileasFilterService` when the policy's domain is `legal`.
Add AWS access/secret key detection support. Amazon Macie includes it; see "Using managed data identifiers in Amazon Macie" in the Amazon Macie documentation.
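As a starting point for the AWS key item above, detection could be sketched with two regular expressions. This is an assumption-laden illustration, not how Macie or Phileas does it: the `AwsKeyDetector` class is hypothetical, and the patterns are based on the commonly documented key shapes (a 20-character access key ID beginning with `AKIA` or `ASIA`, and a 40-character secret key). The secret key pattern matches any 40-character run of base64-style characters, so it would likely need context terms nearby to keep false positives down.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not a Phileas class) for spotting AWS credentials in text. */
class AwsKeyDetector {

    // Assumed shape: AKIA (long-term) or ASIA (temporary) plus 16 uppercase letters or digits.
    static final Pattern ACCESS_KEY_ID =
            Pattern.compile("\\b(?:AKIA|ASIA)[A-Z0-9]{16}\\b");

    // Assumed shape: exactly 40 characters from [A-Za-z0-9/+], not part of a longer run.
    // This is only a candidate match; many non-secret strings also fit.
    static final Pattern SECRET_KEY_CANDIDATE =
            Pattern.compile("(?<![A-Za-z0-9/+])[A-Za-z0-9/+]{40}(?![A-Za-z0-9/+])");

    /** Returns every substring of the text that matches the given pattern, in order. */
    static List<String> find(Pattern pattern, String text) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }
}
```

For example, `AwsKeyDetector.find(AwsKeyDetector.ACCESS_KEY_ID, text)` returns the access key IDs found in `text`. The example values `AKIAIOSFODNN7EXAMPLE` and `wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY` from the AWS documentation fit these patterns.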
Allow a color to be set in the filter profile when redacting in PDF.
Implement some way to allow manual review of the redacted information. Ideas: use brackets but leave the names in place for manual review.
Allow for setting `PHILTER_NER_ENDPOINT` in the filter profile instead of as an environment variable. That way you could have separate services per filter profile. Would this require setting up the filter...