Don’t use first email `From` header to assign authorship

Open • quinncomendant opened this issue 1 year ago • 0 comments

When Omnivore receives an email newsletter, it uses the email’s From header to determine the item's author. This has a couple of undesirable effects, described below. Omnivore should use a more stable source for the unique identifier of email subscriptions (I'll propose an idea at the bottom).

1. Authorship of forwarded emails is attributed to the sender rather than to the article's author

If I forward an email to my Omnivore Inbox email address, I am assigned as the author of the article when it appears on my Following page:

[Screenshot: IMG_8878]

The expected behavior is to parse out the real author of the text. If I instead open the newsletter in a browser and save it to Omnivore from there, the correct author is identified:

[Screenshot: IMG_8881]

It's also pretty surreal seeing my name in the AI Digest summaries, instead of the actual author's name:

[Screenshot: IMG_8877]
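The forwarding problem can be sketched with Python's standard `email` package. This is a minimal illustration, not Omnivore's code: `author_from_header` mimics the behavior described above, and `author_from_forwarded_body` is a hypothetical alternative that assumes the forwarding client quotes the original headers in a plain-text body as a `From: Name <address>` line, which is common but not guaranteed. The names and addresses are made up.

```python
import email
import re
from email.utils import parseaddr

# Hypothetical forwarded message: the top-level From is the person who forwarded it.
RAW = (
    "From: Quinn <quinn@example.com>\n"
    "To: inbox@example.org\n"
    "Subject: Fwd: This week's update\n"
    "\n"
    "---------- Forwarded message ---------\n"
    "From: Jane Author <jane@newsletter.example>\n"
    "Subject: This week's update\n"
    "\n"
    "Hello readers...\n"
)

# Assumption: the forwarded sender appears as a "From:" line inside the body.
FORWARDED_FROM = re.compile(r"^From:\s*(.+)$", re.MULTILINE)


def author_from_header(msg):
    """Behavior described in the issue: use the top-level From header's display name."""
    name, addr = parseaddr(msg.get("From", ""))
    return name or addr


def author_from_forwarded_body(msg):
    """Hypothetical fix: prefer the last From: line quoted in the body, else the header."""
    body = (msg.get_payload(decode=True) or b"").decode("utf-8", errors="replace")
    matches = FORWARDED_FROM.findall(body)
    if not matches:
        return author_from_header(msg)
    name, addr = parseaddr(matches[-1])
    return name or addr


msg = email.message_from_string(RAW)
print(author_from_header(msg))          # Quinn
print(author_from_forwarded_body(msg))  # Jane Author
```

Real newsletters are usually multipart HTML, so a fuller version would need to walk the MIME parts and cope with localized or HTML-formatted forward headers; the regex here would also match any ordinary body line that happens to start with `From:`.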

2. Changes in email From headers result in duplicate entries in the list of subscriptions

When an email is received, the name in its From header is compared with the currently indexed list of subscriptions, and if the name is not already in that list, the email is treated as a “new” subscription. This results in duplicate subscription entries for the same newsletter. Substack emails are a particularly egregious case because their sender name changes with each participating coauthor.

This appears in Settings → Subscriptions on the website (notice how the “Your Local Epidemiologist” newsletter is duplicated with many variations):

[Screenshot: SCR-20240826-qhmw]

These duplicates also appear in the sidebar:

[Screenshot: SCR-20240826-qhrc]

This may also cause confusion when using the “Unsubscribe” function: because all of these duplicates share the same unsubscribe URL, unsubscribing from any one of them unsubscribes the user from the newsletter entirely (this is tracked in #1841).
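The duplication can be reproduced in a few lines. This sketch assumes, as described above, that a new subscription is created for every display name not seen before; `subscriptions_keyed_by_name`, the newsletter name, the addresses, and the URL are all hypothetical.

```python
from email.utils import parseaddr

# Hypothetical (From header, unsubscribe URL) pairs for three issues of ONE newsletter.
# Only the display name changes between issues; address and unsubscribe URL stay the same.
INCOMING = [
    ("Example Weekly <weekly@example.com>", "https://example.com/unsubscribe/abc"),
    ("Alice from Example Weekly <weekly@example.com>", "https://example.com/unsubscribe/abc"),
    ("Bob from Example Weekly <weekly@example.com>", "https://example.com/unsubscribe/abc"),
]


def subscriptions_keyed_by_name(messages):
    """Illustrates the reported behavior: one subscription per distinct display name."""
    subs = {}
    for from_header, unsubscribe_url in messages:
        name, _addr = parseaddr(from_header)
        if name not in subs:  # an unseen name is treated as a "new" subscription
            subs[name] = unsubscribe_url
    return subs


subs = subscriptions_keyed_by_name(INCOMING)
print(len(subs))                # 3 entries for a single newsletter
print(len(set(subs.values())))  # 1 unsubscribe URL shared by all of them
```

Because every entry carries the same unsubscribe URL, acting on any one of them affects the whole newsletter, which is the confusion described in #1841.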


I'm not sure of the best way to solve this, but here's one idea. Fixing this may require sourcing two kinds of data:

  • A human-readable title.
  • An unchanging, unique, machine-readable identifier.

The unchanging identifier could be the email's List-Id header, its List-Unsubscribe URL, or, as a last resort, the email address in the original From header (the last one found in the email, which should correspond to the original sender of a forwarded email). Use this identifier as the newsletter's unique key to prevent duplicates even when the human-readable name changes.
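A sketch of this fallback order, again using Python's `email` package. `subscription_key` and its `list-id:`/`unsubscribe:`/`from:` prefixes are hypothetical; List-Id is defined in RFC 2919 and List-Unsubscribe in RFC 2369. It assumes, as before, that a forwarded email quotes the original sender as a `From:` line in a plain-text body. One thing worth checking against real newsletters: if a sender embeds per-message tokens in its unsubscribe URLs, the second option would not be stable.

```python
import email
import re
from email.message import Message
from email.utils import parseaddr

# Assumption: a forwarded message quotes the original sender as a "From:" line in its body.
QUOTED_FROM = re.compile(r"^From:\s*(.+)$", re.MULTILINE)
# Extracts the first <...> value from a List-Id or List-Unsubscribe header.
ANGLE_VALUE = re.compile(r"<([^>]+)>")


def subscription_key(msg: Message) -> str:
    """Hypothetical stable key: List-Id, then List-Unsubscribe, then the last From address."""
    # 1. List-Id (RFC 2919), e.g. "Example Weekly <weekly.example.com>".
    list_id = msg.get("List-Id")
    if list_id:
        match = ANGLE_VALUE.search(list_id)
        return "list-id:" + (match.group(1) if match else list_id.strip()).lower()
    # 2. List-Unsubscribe (RFC 2369): one or more <URI> values; take the first.
    unsubscribe = msg.get("List-Unsubscribe")
    if unsubscribe:
        match = ANGLE_VALUE.search(unsubscribe)
        return "unsubscribe:" + (match.group(1) if match else unsubscribe.strip())
    # 3. Last From address found: the header, superseded by any From: line quoted in the body.
    candidates = [msg.get("From", "")]
    if not msg.is_multipart():
        body = msg.get_payload(decode=True) or b""
        candidates += QUOTED_FROM.findall(body.decode("utf-8", errors="replace"))
    _name, addr = parseaddr(candidates[-1])
    return "from:" + addr.lower()


newsletter = email.message_from_string(
    "From: Alice from Example Weekly <weekly@example.com>\n"
    "List-Id: Example Weekly <weekly.example.com>\n"
    "List-Unsubscribe: <https://example.com/unsubscribe/abc>\n"
    "Subject: Issue 42\n"
    "\n"
    "Body text.\n"
)
forwarded = email.message_from_string(
    "From: Quinn <quinn@example.org>\n"
    "Subject: Fwd: Issue 42\n"
    "\n"
    "---------- Forwarded message ---------\n"
    "From: Alice from Example Weekly <weekly@example.com>\n"
    "\n"
    "Body text.\n"
)
print(subscription_key(newsletter))  # list-id:weekly.example.com
print(subscription_key(forwarded))   # from:weekly@example.com
```

A forward is a new message, so it normally carries no List-Id of its own; that is why the quoted From address matters as the last fallback.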

The human-readable title can be the name part of the last From header. When a new newsletter is received, save this name as a “canonical title” for the newsletter (and make it editable by the Omnivore user). If future emails arrive with different names, as frequently happens with Substack emails, don't update the title: it would be confusing for a newsletter's title to keep changing at random.
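The “canonical title” rule amounts to keying the subscription store by the stable identifier and setting the title only the first time a key is seen, with a user rename as the only other way to change it. `SubscriptionRegistry` is a hypothetical in-memory stand-in for Omnivore's storage, and the key and names are made up.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Subscription:
    key: str    # stable identifier, e.g. derived from List-Id
    title: str  # canonical, human-readable title


class SubscriptionRegistry:
    """Hypothetical in-memory store keyed by the stable identifier, not the display name."""

    def __init__(self) -> None:
        self._subs: Dict[str, Subscription] = {}

    def record_email(self, key: str, sender_name: str) -> Subscription:
        """Register an incoming email; only the first email for a key sets the title."""
        sub = self._subs.get(key)
        if sub is None:
            sub = Subscription(key=key, title=sender_name)
            self._subs[key] = sub
        # Later sender names (e.g. a new Substack coauthor) leave the title untouched.
        return sub

    def rename(self, key: str, new_title: str) -> None:
        """User edit: the only path that changes a canonical title."""
        self._subs[key].title = new_title

    def get(self, key: str) -> Optional[Subscription]:
        return self._subs.get(key)

    def __len__(self) -> int:
        return len(self._subs)


registry = SubscriptionRegistry()
key = "list-id:weekly.example.com"
for name in ["Example Weekly", "Alice from Example Weekly", "Bob from Example Weekly"]:
    registry.record_email(key, name)
print(len(registry), registry.get(key).title)  # 1 Example Weekly

registry.rename(key, "Weekly digest")
registry.record_email(key, "Carol from Example Weekly")
print(registry.get(key).title)  # Weekly digest
```

Keeping the title and the key separate means a rename never affects deduplication, and a changing sender name never affects what the user sees.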

quinncomendant • Aug 28 '24 03:08