Migrate away from ImprovMX for mailing lists

Open jfly opened this issue 1 year ago • 9 comments

We currently use ImprovMX to handle mail sent to @nixos.org (see relevant dns entries).

  • We only use ImprovMX for mail forwarding (teams like infra@, marketing@, etc). Today, nobody sends mail from @nixos.org, and nobody has any inboxes.
  • You need a web account with ImprovMX to see and to update these mail forwards. The Nix community can't see/audit any of this.
  • There are various limits (number of forwards, perhaps the number of emails an address can forward to?). See https://improvmx.com/pricing/. I don't know if we're currently paying for ImprovMX. I think I heard that we've run into some of these limits.

The plan

A few weeks ago, @Mic92 asked me to look into self hosting this instead. He recommended Simple NixOS Mailserver (SNM). I've played with it a bit, and it does seem like a good fit here.

  1. [x] Install SNM on umbriel.
    • The configuration docs here are great: https://nixos-mailserver.readthedocs.io/en/latest/setup-guide.html.
    • Leave mailserver.loginAccounts empty, and disable pop/imap.
    • Port the existing mailing lists from ImprovMX to mailserver.forwards
      • @Mic92 has posted a dump [REMOVED] (accurate as of 2024-09-30).
  2. [x] Verify this server can successfully send mail (target: 10/10 on https://www.mail-tester.com/). Either by temporarily adding a login account, or speaking directly to postfix via the cli.
  3. [x] Monitor smtp tls (see below).
  4. [x] Alert on smtp tls monitor failures.
  5. [x] Make it possible to send emails as nixos.org (start replacing mail-test.nixos.org with nixos.org).
  6. [x] Wait until the Nix Steering Committee Election is done: https://nixos.org/blog/announcements/2024/sc-election-2024/.
  7. [x] #585
  8. [ ] #586
  9. [ ] #587
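
For step 2, "speaking directly to postfix" can also be done from a few lines of Python rather than a mail client. This is only a minimal sketch, assuming Postfix accepts unauthenticated submissions from localhost on port 25. The function names and both addresses are placeholders made up for illustration (mail-tester.com hands out a unique recipient address per test).

```python
import smtplib
from email.message import EmailMessage
from email.utils import formatdate, make_msgid


def build_test_message(sender: str, recipient: str) -> EmailMessage:
    """Build a small, well-formed message (Date and Message-ID included)."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "Deliverability test"
    msg["Date"] = formatdate(localtime=True)
    # Use the sender's domain in the Message-ID, as a real MUA would.
    msg["Message-ID"] = make_msgid(domain=sender.split("@", 1)[1])
    msg.set_content("Test message handed straight to the local Postfix instance.")
    return msg


def send_via_local_postfix(msg: EmailMessage, host: str = "localhost", port: int = 25) -> None:
    """Hand the message to the MTA over plain SMTP (assumes local relay is allowed)."""
    with smtplib.SMTP(host, port, timeout=30) as smtp:
        smtp.send_message(msg)


# Hypothetical usage (placeholder addresses, run on the mail server itself):
#   send_via_local_postfix(
#       build_test_message("test@mail-test.example.org", "test-abc123@example.com"))
```

Temporarily adding a login account exercises the authenticated submission path as well, which this sketch deliberately skips.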

Notes

  1. Monitoring
    • Ideally, the infra team would get alerted if emails have been sitting in a postfix queue for a long time. Are there any best practices for this? We use Prometheus, perhaps https://github.com/kumina/postfix_exporter is a good pick? It's packaged in nixpkgs =)
    • @jfly chatted with @Mic92, and we're going to start with "blackbox" monitoring, which runs on pluto. Dumping some links from our discussion:
      • https://github.com/prometheus/blackbox_exporter/issues/913
      • https://github.com/prometheus/blackbox_exporter/blob/53e78c2b3535ecedfd072327885eeba2e9e51ea2/example.yml#L124
      • probe_ssl_earliest_cert_expiry
      • https://search.nixos.org/options?channel=24.05&show=services.prometheus.exporters.blackbox.enable&from=0&size=50&sort=relevance&type=packages&query=blackbox
      • http://build01.nix-community.org:9273/metrics
    • Anything else?
  2. Backups
    • Not necessary. This service is pretty much stateless (except for the mail stuck in queues, which we can live with?)
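
To make the "mail sitting in a queue for a long time" signal concrete, here is a rough sketch of a check built on `postqueue -j`, which prints one JSON object per queued message (Postfix 3.1 or newer). The function names and the threshold are assumptions for illustration; an exporter such as postfix_exporter would be the more realistic way to feed Prometheus.

```python
import json
import subprocess
import time


def stale_messages(postqueue_json_lines, max_age_seconds, now=None):
    """Return (queue_id, age_seconds) for queued messages older than the limit."""
    now = time.time() if now is None else now
    stale = []
    for line in postqueue_json_lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        # arrival_time is the Unix timestamp at which the message was queued.
        age = now - entry["arrival_time"]
        if age > max_age_seconds:
            stale.append((entry["queue_id"], int(age)))
    return stale


def read_queue():
    """Run `postqueue -j` and return its output lines (needs access to the queue)."""
    result = subprocess.run(
        ["postqueue", "-j"], capture_output=True, text=True, check=True
    )
    return result.stdout.splitlines()


# Hypothetical usage: flag anything queued for more than an hour.
#   for queue_id, age in stale_messages(read_queue(), max_age_seconds=3600):
#       print(f"{queue_id} has been queued for {age}s")
```

Blackbox probing from pluto covers "is the server reachable and is its certificate valid", while a queue-age check like this covers "is mail actually leaving", so the two are complementary.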

Alternatives considered

  • I don't know if there's been any serious discussion about paying someone (ImprovMX or something else) to handle this for us. Since declarative management and audit-ability are important to us, it would either have to be a provider that has a Terraform provider, or we could build one ourselves.
  • @Mic92, can you shed any light on this?

jfly avatar Sep 30 '24 20:09 jfly

I just want to raise awareness that you will probably need to email T-Online and Outlook (non-365) to get your IP whitelisted; otherwise mail cannot be delivered.

SuperSandro2000 avatar Oct 01 '24 14:10 SuperSandro2000

After the leak of the existing email mappings I would be interested in discussing the privacy aspect of the email mappings. What other organization publishes those? The current set of addresses was not given to us by its recipients with the intent to make them public.

mweinelt avatar Oct 01 '24 14:10 mweinelt

> I just want to raise awareness that you will probably need to email T-Online and Outlook (non-365) to get your IP whitelisted; otherwise mail cannot be delivered.

I hear you on this. I've never run a mailserver before, and honestly have no idea what our deliverability is going to be like. I believe the current set of emails is quite tiny, and may not even include any t-online or outlook. My personal opinion on this is that we should make sure we've solved the monitoring story: if we get notified for email stuck in queues, then we can tackle these allowlists as necessary, or we can give up and pay someone to handle this for us.

> After the leak of the email mappings I would be interested in discussing the privacy aspect of the email mappings.

Sorry about that. I asked one person about this, but should have talked to more people before posting.

Ideas:

  1. We could encrypt the email addresses. This would be hard to code review.
  2. We could seek consent from all the relevant people. I don't know how hard this would be. I don't have the list anymore, but it didn't seem like an insurmountable number.
  3. Do this behind some self-hosted (or paid) webapp with a login. That's basically what we do today with ImprovMX.

jfly avatar Oct 01 '24 15:10 jfly

> I just want to raise awareness that you will probably need to email T-Online and Outlook (non-365) to get your IP whitelisted; otherwise mail cannot be delivered.

For T-Online at least, this is just one email after setting up reverse DNS and everything else correctly.

Overall I also don't expect the NixOS Foundation to have to handle a large volume of email. The vote was actually the first time we had to do this.

Mic92 avatar Oct 02 '24 06:10 Mic92

>   1. We could encrypt the email addresses. This would be hard to code review.
>   2. We could seek consent from all the relevant people. I don't know how hard this would be. I don't have the list anymore, but it didn't seem like an insurmountable number.
>   3. Do this behind some self-hosted (or paid) webapp with a login. That's basically what we do today with ImprovMX.

@zimbatm started to ask existing users of email addresses about that.

Mic92 avatar Oct 02 '24 06:10 Mic92

> I hear you on this. I've never run a mailserver before, and honestly have no idea what our deliverability is going to be like. I believe the current set of emails is quite tiny, and may not even include any t-online or outlook. My personal opinion on this is that we should make sure we've solved the monitoring story: if we get notified for email stuck in queues, then we can tackle these allowlists as necessary, or we can give up and pay someone to handle this for us.

Some DMARC, plus reading the mail logs in case there are delivery problems. I didn't have any big issues with email for the NixOS wiki, and that looks more like bulk mail than what I expect to be sent from nixos.org.

Mic92 avatar Oct 02 '24 06:10 Mic92

@jfly Is it possible to move the email addresses into sops-encoded secrets, or is this part only configurable with plain Nix code?

zimbatm avatar Oct 02 '24 07:10 zimbatm

> For T-Online at least, this is just one email after setting up reverse DNS and everything else correctly.

You also need a proper imprint on the domain of the rDNS entry, plus contact details (telephone and, I think, an email address) that do not go through the mail server itself.

I have recently done it and it took me a few back and forths but it is doable.

SuperSandro2000 avatar Oct 02 '24 13:10 SuperSandro2000

EDIT: After some discussion, we decided to give people the option of encrypting their email addresses when adding themselves to a mailing list. See https://github.com/NixOS/infra/pull/495#issuecomment-2445053476 and the refinement to it here.

> @jfly Is it possible to move the email addresses into sops-encoded secrets, or is this part only configurable with plain Nix code?

It currently requires plain Nix code:

Adding support for encrypted emails seems like it might actually not be too hard:

  • We could adjust the nixpkgs service to allow for multiple virtual_alias_maps (currently it supports exactly 0 or 1), and then we could add a new entry to that array to point at a virtual alias map generated with a sops-nix template.
    • I think the nixpkgs change I have in mind will look weird. We might need a more generic solution that has a satisfying answer to this question: "why does virtual_alias_maps get this special escape hatch but not other maps like alias_maps?"
  • Adding a new entry is a little tricky because you actually need to run postmap to "compile" these mappings, but I think the existing services.postfix.mapFiles option is flexible enough to do this for us without changes.
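
For reference, the map file that such a template would need to produce is plain text in Postfix's virtual(5) table format: one alias per line, followed by its comma-separated destinations. Below is a small sketch of a renderer; the function name and the addresses are made up for illustration, and in the real setup the text would come from a sops-nix template rather than Python.

```python
def render_virtual_alias_map(forwards):
    """Render {alias: [destinations]} as Postfix virtual(5) table text."""
    lines = []
    for alias in sorted(forwards):
        destinations = forwards[alias]
        if not destinations:
            # An alias with no destinations would be a malformed table entry.
            raise ValueError(f"{alias} has no destinations")
        lines.append(f"{alias} {', '.join(destinations)}")
    return "\n".join(lines) + "\n"


# Hypothetical usage with placeholder addresses:
#   print(render_virtual_alias_map({
#       "infra@example.org": ["alice@example.com", "bob@example.net"],
#   }))
```

The rendered file still has to be compiled with postmap before Postfix can use it as an indexed lookup table, which is the step the bullet above refers to.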

tl;dr:

  • It's possible, but requires changes to nixpkgs, and perhaps SNM, depending on how we want to expose this.
  • I'm willing to do this work, but would prefer to wait until we know if it's necessary first.
    • If it makes sense to do this work, I could use a brainstorm partner on the nixpkgs change.

jfly avatar Oct 02 '24 15:10 jfly

I'm now hitting https://gitlab.com/simple-nixos-mailserver/nixos-mailserver/-/issues/302, because my mail server at infinisil.com had a strict SPF policy (-all), which was not a problem with ImprovMX. I didn't receive any mails since the switch, so I only noticed this once somebody pointed it out to me (thanks @ryantrinkle). For now I updated my SPF records to be less strict (~all), and explicitly added a:umbriel.nixos.org to the allow list, which should hopefully fix the issue, but I really don't think that's a great solution, because others might also be affected but not know about it.

infinisil avatar Apr 08 '25 01:04 infinisil

Sorry, I'm not quite following this. There are 2 ways that https://gitlab.com/simple-nixos-mailserver/nixos-mailserver/-/issues/302 could affect us:

  1. Someone uses a @nixos.org address to send mail to a mailing list. We addressed that by loosening our SPF record (as you said you've already done with your mailserver at infinisil.com).
  2. Someone signs up a @nixos.org address for some other mailing list. That mailing list forwards emails that fail SPF but pass DKIM (and therefore pass DMARC). Our mailserver would (IMO incorrectly) drop those. https://gitlab.com/simple-nixos-mailserver/nixos-mailserver/-/issues/301 is a feature request to SNM to accept these instead.

> I didn't receive any mails since the switch

Which emails haven't you received, and why? I don't see how changing your personal mailserver's SPF policy would have any effect on this.

jfly avatar Apr 08 '25 01:04 jfly

That all said, out of an abundance of caution, I'd like to roll back until we understand what's going on: https://github.com/NixOS/infra/pull/621

jfly avatar Apr 08 '25 01:04 jfly

Thanks for the quick offer!

Here's the message Ryan Trinkle received when CCing [email protected]:

This is the mail system at host umbriel.nixos.org.

I'm sorry to have to inform you that your message could not
be delivered to one or more recipients. It's attached below.

For further assistance, please file an issue at
https://github.com/NixOS/infra/issues/new. Please anonymize any personal
email addresses in your report.

If you do so, please include this problem report. You can
delete your own text from the attached returned message.

The mail system

<[email protected]> (expanded from <[email protected]>): host
mail.infinisil.com[206.81.23.189] said: 550 5.7.23 <TOADDRESS>:
Recipient address rejected: Message rejected due to: SPF fail - not
authorized. Please see
http://www.openspf.net/Why?s=mfrom;id=FROMADDRESS;ip=37.27.20.162;r=infinisil.com
(in reply to RCPT TO command)

Where FROMADDRESS is Ryan Trinkle's personal email address. I can also see this having happened at least once more for a [email protected]-forwarded email.

infinisil avatar Apr 08 '25 03:04 infinisil

> which should hopefully fix the issue, but I really don't think that's a great solution, because others might also be affected but not know about it.

We had the same problem over at c3d2.de and I am afraid that is the only solution that I have personally found.


@infinisil are you trying to send mail via umbriel.nixos.org that has @infinisil.com in the From? Without the right configuration, SPF prevents that (which is good and correct), and IMO sending mail from other mail servers is a bit sketchy anyway.

I think SPF rewriting could be configured to fix this and the mailing list might be lacking that.

SuperSandro2000 avatar Apr 08 '25 09:04 SuperSandro2000

Me and @jfly sat together and looked into this a bit more closely. Conclusions:

  • ImprovMX didn't have this issue because it used SRS. @jfly is looking into doing the same with our own mail server before giving it another try. We found this to be the best description on how to do that.
  • Both me and @ryantrinkle's mail servers have the same restrictive -all SPF policy, but it's up to the sender's SPF record (that's what the S stands for!) to determine pass/fail. So if we wanted to work around this issue by updating SPF records to be more lax, the sender (which in this case was @ryantrinkle) would have to do that, not the receiver (which in this case was me). Since the infra team has now temporarily rolled back to ImprovMX until SRS is configured, we won't need that, but it's good to know.
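
To illustrate the second point: the receiver looks up the SPF record of the envelope sender's domain and tests the connecting IP against it. A toy evaluator makes the failure visible. It is a sketch that only understands the `ip4` and `all` mechanisms (real SPF, per RFC 7208, also has `a`, `mx`, `include` and more, which are silently skipped here), and both SPF records in the usage note are invented for illustration. Only the IP 37.27.20.162 comes from the bounce message above.

```python
import ipaddress

# SPF qualifiers and the result each one produces when its mechanism matches.
QUALIFIERS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def evaluate_spf(record, client_ip):
    """Evaluate a tiny SPF subset (ip4 and all only) for a connecting IP."""
    ip = ipaddress.ip_address(client_ip)
    for term in record.split()[1:]:  # skip the leading "v=spf1"
        qualifier = "+"  # a mechanism without a qualifier means "pass"
        if term[0] in QUALIFIERS:
            qualifier, term = term[0], term[1:]
        if term == "all":
            return QUALIFIERS[qualifier]
        if term.startswith("ip4:"):
            if ip in ipaddress.ip_network(term[4:], strict=False):
                return QUALIFIERS[qualifier]
        # Anything else (a, mx, include, ...) is ignored in this sketch.
    return "neutral"


# Hypothetical records:
#   sender_record    = "v=spf1 ip4:192.0.2.10 -all"    # original sender's domain
#   forwarder_record = "v=spf1 ip4:37.27.20.162 -all"  # forwarder's own domain
#   evaluate_spf(sender_record, "37.27.20.162")     # plain forwarding
#   evaluate_spf(forwarder_record, "37.27.20.162")  # after sender rewriting
```

With plain forwarding, the envelope sender still belongs to the original sender's domain, so the forwarder's IP is checked against a record that does not list it. Once the envelope sender is rewritten to the forwarder's domain, the check runs against the forwarder's own record instead.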

infinisil avatar Apr 08 '25 19:04 infinisil

I played around with SRS, and this does look pretty straightforward to do. My progress so far:

  1. I noticed that nixpkgs has a pretty old version of postsrsd. postsrsd 2 has some breaking changes, so I'd rather develop against that version than have to deal with those breaking changes in the future. https://github.com/NixOS/nixpkgs/pull/397316

  2. I've implemented this with my personal mailserver, it was quite straightforward: https://github.com/jfly/snow/commit/ec179dccb83291022ba0aba906b931d2de691792. This has the desired effect: I see that emails forwarded onto another domain now pass SPF.

    [Images: two screenshots before the change and two after]

  3. I've sent in a PR to implement SRS on nixos.org here: https://github.com/NixOS/infra/pull/622

  4. ~I've read through https://support.google.com/mail/answer/175365, which mentions ARC as another thing we should implement. I see in our logs that ImprovMX does implement it, but we don't have it configured on our server. Some light research led me to this reddit comment. tl;dr: it might be a pain to implement/maintain (OpenARC doesn't seem to be maintained), and it's not clear that it would really do anything for our deliverability, as ARC seems to rely upon manually configured trust.~ Please disregard, we do get ARC with simple-nixos-mailserver.

jfly avatar Apr 09 '25 10:04 jfly

Rspamd does implement ARC. Was this not also used by simple-mail server?

Mic92 avatar Apr 09 '25 13:04 Mic92

> Rspamd does implement ARC. Was this not also used by simple-mail server?

Oops. You're totally right. I see ARC headers in emails forwarded by umbriel. Please disregard.

jfly avatar Apr 10 '25 02:04 jfly

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/simple-nixos-mailserver-message-rejected-due-to-spf-fail-not-authorized/38067/16

nixos-discourse avatar Apr 10 '25 02:04 nixos-discourse

> > Rspamd does implement ARC. Was this not also used by simple-mail server?
>
> Oops. You're totally right. I see ARC headers in emails forwarded by umbriel. Please disregard.

Can't say I do. My rspamd even classifies your recent mails with ARC_NA.

mweinelt avatar Apr 10 '25 02:04 mweinelt

This issue is getting too large. I've filed https://github.com/NixOS/infra/issues/631 to investigate ARC.

jfly avatar Apr 10 '25 22:04 jfly

I'm closing this. The new mailserver has launched (and hopefully will stay launched).

We still have to clean up ImprovMX, which is tracked by https://github.com/NixOS/infra/issues/587.

jfly avatar Apr 10 '25 22:04 jfly