Normalize and store email addresses

Open miketheman opened this issue 7 months ago • 0 comments

Email addresses are currently stored in a varchar(254) column in the database, with non-null and unique constraints.

However, because some email providers allow extra characters in email addresses, and the column is not citext (case insensitive), the unique constraint only compares exact strings. We can therefore store duplicate values that all refer to the same mailbox.

Proposal:

  • add a new empty normalized_email citext column to the Emails model
  • populate the column with normalized values of each email address during email addition
  • backfill existing records
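
To make the proposal concrete, here is a minimal sketch of what "normalized" could mean. The issue does not define the rules, so everything here is an assumption: the function name `normalize_email`, the choice to drop `+tag` suffixes, and the set of domains treated as dot-insensitive are all hypothetical.

```python
# Hypothetical normalization rules; the issue does not specify them.
# Assumption: these domains ignore dots in the local part.
DOT_INSENSITIVE_DOMAINS = {"gmail.com", "googlemail.com"}


def normalize_email(address: str) -> str:
    """Return a canonical form of an email address for duplicate detection."""
    # Split on the last "@" so the domain is always the final component.
    local, _, domain = address.strip().rpartition("@")
    if not local or not domain:
        raise ValueError(f"not a valid email address: {address!r}")

    local = local.lower()
    domain = domain.lower()

    # Assumption: drop a "+tag" suffix from the local part.
    local = local.split("+", 1)[0]

    # Assumption: strip dots only for providers known to ignore them.
    if domain in DOT_INSENSITIVE_DOMAINS:
        local = local.replace(".", "")

    return f"{local}@{domain}"
```

The same function could be called when an address is added and again in the backfill, so new and existing rows end up with values computed the same way.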

This effort could be complemented by adding a domain column to the table, populated and backfilled in the same way as the normalized address, so that the data representation is clear and unambiguous and does not rely on string splitting.
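
As a rough illustration of the domain idea, the split could happen once at write time so that later queries never need to parse the address. The helper name `split_email` is hypothetical, and lowercasing the domain is an assumption.

```python
def split_email(address: str) -> tuple[str, str]:
    """Split an address into (local part, lowercased domain)."""
    # Use the last "@" as the separator between local part and domain.
    local, sep, domain = address.strip().rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not a valid email address: {address!r}")
    # Assumption: the domain is stored lowercased; the local part is untouched.
    return local, domain.lower()
```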

This effort should be preceded by some queries on the table to determine how many such duplicates we might expect to see.
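
One possible shape for such a query is sketched below. It uses Python's built-in `sqlite3` module purely so the example is runnable; the production database and schema differ, and the table name `emails` and column name `email` are assumptions. It only counts case-only collisions, not provider-specific ones.

```python
import sqlite3


def count_case_collisions(conn: sqlite3.Connection) -> int:
    """Count groups of addresses that differ only by letter case."""
    # Assumption: a table named "emails" with a text column "email".
    row = conn.execute(
        """
        SELECT COUNT(*) FROM (
            SELECT lower(email) AS folded
            FROM emails
            GROUP BY lower(email)
            HAVING COUNT(*) > 1
        ) AS collisions
        """
    ).fetchone()
    return row[0]
```

Each counted group contains at least two rows that would collapse into a single value in a case-insensitive column.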

miketheman • Apr 08 '25 18:04