Email and phone scrub lists

Open • jace opened this issue 2 years ago • 1 comment

To audit for spam and dead contacts, we will need external data sources. Two examples:

  • StopForumSpam provides a list of known abusive email addresses. When a user attempts to sign up with an email address on this list, they should be gated (e.g., required to email support to request an unblock). For existing accounts, we'll need manual validation before classifying the account as spam.

  • TRAI publishes a Mobile Number Revocation List (MNRL) for expired Indian mobile phone numbers. These should be forgotten from our database, unlinking them from user accounts.
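As a rough illustration of the two policies above, here is a minimal Python sketch. It assumes both lists have already been loaded into in-memory sets; `Account`, `signup_allowed`, and `scrub_revoked_phones` are hypothetical names for this example, not the project's actual models or functions, and lowercasing the whole address is a simplification.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    # Hypothetical stand-in for a user account, not the project's real model
    username: str
    phones: set[str] = field(default_factory=set)


def signup_allowed(email: str, spam_emails: set[str]) -> bool:
    """Return False when the address is on the spam list, so sign-up is gated.

    Assumes spam_emails holds lowercase addresses.
    """
    return email.strip().lower() not in spam_emails


def scrub_revoked_phones(accounts: list[Account], revoked: set[str]) -> int:
    """Unlink revoked numbers from every account and return how many were removed."""
    removed = 0
    for account in accounts:
        stale = account.phones & revoked  # Numbers this account holds that are revoked
        account.phones -= stale
        removed += len(stale)
    return removed
```

Note the asymmetry with the spam list: this sketch only blocks new sign-ups and does nothing to existing accounts, which the issue says need manual validation first.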

Both databases are significantly larger than our own, so it doesn't make sense to overload the existing EmailAddress and PhoneNumber models to hold this data. Instead, we should follow the model adopted with Geoname data, hosting this in a separate database with periodic updates.

This will entail:

  1. New bind_key alongside geoname for hosting contact data, or maybe rename geoname itself to extdata (for external data) and host it there.
  2. New CLI commands for downloading these databases, loading them, and scrubbing existing data.
  3. For the spam lists, a Bloom filter for rapid lookup before doing a full index scan.
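The Bloom filter in step 3 could look something like the sketch below. It is a self-contained illustration using only the standard library, not a proposed implementation: the `BloomFilter` class, the `is_listed` helper, and the `full_lookup` callback (standing in for the real database query) are all assumptions made for this example. The point it shows is that a negative answer from the filter is definitive, so only a "maybe" needs the expensive lookup.

```python
import hashlib
import math


class BloomFilter:
    def __init__(self, expected_items: int, false_positive_rate: float = 0.01) -> None:
        # Standard sizing formulas: m = -n ln(p) / (ln 2)^2 and k = (m / n) ln 2
        n = max(1, expected_items)
        self.size = max(8, math.ceil(-n * math.log(false_positive_rate) / math.log(2) ** 2))
        self.hashes = max(1, round(self.size / n * math.log(2)))
        self.bits = bytearray((self.size + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: derive k bit positions from two halves of one digest
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.hashes):
            yield (h1 + i * h2) % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def is_listed(email: str, bloom: BloomFilter, full_lookup) -> bool:
    """Consult the filter first; only a 'maybe' triggers the expensive lookup."""
    if email not in bloom:
        return False  # Definitely not listed, so no database query is needed
    return full_lookup(email)  # Possibly listed, so confirm against the real table
```

The filter would be rebuilt whenever the external list is reloaded, since a plain Bloom filter like this one does not support removing entries.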

jace, Jul 05 '23 07:07

MNRL scrub support was added in #1810 but is pending a notification to users before it goes into production use.

jace, Jul 31 '23 05:07