
FEATURE: Hide emails and email addresses from very common, non-personal senders

Open · opened by jlstro · 0 comments

Is your feature request related to a problem? Please describe.
Email-heavy datasets that combine multiple inboxes tend to contain a lot of spam and auto-generated notification emails. This sometimes makes it hard to find relevant information, and it dilutes the xref results. At the same time, it is important to preserve the integrity of these leaks, so removing the emails in question before ingest is not a good option. Also, some of these emails may carry vital information (travel confirmations, etc.) in certain contexts.

Describe the solution you'd like
The search results UI and the xref tab could include a toggle that hides emails sent by senders on an internal 'blacklist'. Initially, this list could be limited to obviously non-personal senders (noreply@... addresses, Google Alerts, large airline and hotel domains, known spammers, etc.). Over time, we could look for a more sophisticated way to build these filters.
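A minimal sketch of how such a toggle might decide which emails to hide, assuming each email is available as a dict with a `from` header. The names `is_non_personal`, `visible_emails`, `NON_PERSONAL_LOCAL_MARKERS` and `NON_PERSONAL_DOMAINS` are hypothetical, as are the example domains; this is not existing Aleph code. The filter only affects what is shown, so the ingested data stays untouched.

```python
from email.utils import parseaddr

# Hypothetical starter blocklist: markers matched inside the local part
# (after stripping "-", "_" and "."), e.g. "no-reply" -> "noreply".
NON_PERSONAL_LOCAL_MARKERS = ("noreply", "donotreply", "mailerdaemon")

# Hypothetical placeholder domains standing in for airlines, hotels, etc.
# A domain also matches any of its subdomains.
NON_PERSONAL_DOMAINS = {"example-airline.com", "example-hotel.com"}


def is_non_personal(sender: str) -> bool:
    """Return True if a From header looks like an automated, non-personal sender."""
    _, address = parseaddr(sender)  # "Name <a@b.com>" -> ("Name", "a@b.com")
    if "@" not in address:
        return False
    local, _, domain = address.lower().rpartition("@")
    normalized = local.replace("-", "").replace("_", "").replace(".", "")
    if any(marker in normalized for marker in NON_PERSONAL_LOCAL_MARKERS):
        return True
    return any(domain == d or domain.endswith("." + d) for d in NON_PERSONAL_DOMAINS)


def visible_emails(emails, hide_non_personal=True):
    """Apply the UI toggle: hide blocklisted senders without deleting anything."""
    if not hide_non_personal:
        return list(emails)
    return [e for e in emails if not is_non_personal(e.get("from", ""))]
```

For example, `is_non_personal("Google Alerts <googlealerts-noreply@google.com>")` is true because the normalized local part contains `noreply`, while a personal address such as `jane.doe@example.org` passes through. Substring matching on the local part keeps the starter list short, at the cost of possible false positives, which is one reason the list would need refining over time.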

Describe alternatives you've considered
The obvious alternative is removing these emails before ingest. Not great.

Additional context
The problem is most apparent when xref'ing two large email-heavy datasets, where most of the top matches come from travel agencies, hotels, notifications, etc.

jlstro, Aug 08 '22 15:08