wildduck
[Feature Guidance] Use Wild Duck as Frontend to AWS SES Incoming
First off, I want to say that I am really impressed by this product, and it will most likely become my go-to mail deployment very soon.
For the last couple of months, I have been investigating options on how to create a cheap and simple option to provide email access on top of the combination of AWS SES and AWS SES Incoming Mail. Using AWS SES for outgoing mail through something like Roundcube is very simple, since it provides SMTP endpoints and authentication. However, the incoming portion is not that simple.
SES Incoming lets you create rules for incoming mail that trigger when a new message is received and store it in AWS S3 as a MIME-compliant message. It is, in fact, very similar to how you store messages in MongoDB right now.
I have taken a cursory glance at your code and available tools, and it is possible to create a bridge function that picks up a new message from SES and moves it into WildDuck, where it will reside in MongoDB. This works, but moving every message over adds traffic, storage and compute costs.
I'd like your advice on the best approach for making WildDuck "aware" of the messages in S3. I have seen a couple of issues mentioning approaches like pluggable storage ( #33 ) and some pull requests trying to implement it ( #337 ). That's certainly a start.
Would it be possible to support a use case where WildDuck accepts an event like { "mailbox": "xyz@sdsd", "messageId": "ses-msg-id-in-s3", "meta":{} } that updates the IMAP index but leaves the message in S3 and accesses it from there?
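For reference, such an event could be derived from the SES receipt notification. The sketch below assumes an S3-action notification shaped as described in the AWS docs listed under "Additional information" (`mail.messageId`, `mail.commonHeaders`, `receipt.recipients`, `receipt.action.bucketName` / `objectKey`). The output is only the event shape proposed here, not an existing WildDuck API, and `ses_notification_to_events` is a hypothetical name.

```python
from typing import Any


def ses_notification_to_events(notification: dict[str, Any]) -> list[dict[str, Any]]:
    """Turn one SES "Received" notification into one proposed event per matched recipient.

    Hypothetical helper: the event shape follows the proposal in this issue,
    it is not a WildDuck API.
    """
    if notification.get("notificationType") != "Received":
        return []
    mail = notification["mail"]
    receipt = notification["receipt"]
    action = receipt.get("action", {})
    if action.get("type") != "S3":
        return []  # this receipt rule did not store the message in S3
    meta = {
        "bucket": action["bucketName"],
        "key": action["objectKey"],
        "subject": mail.get("commonHeaders", {}).get("subject"),
    }
    # receipt.recipients lists the addresses matched by the receipt rule
    return [
        {"mailbox": rcpt, "messageId": mail["messageId"], "meta": dict(meta)}
        for rcpt in receipt.get("recipients", [])
    ]
```

Emitting one event per matched recipient keeps the mapping simple when a single incoming message is addressed to several local mailboxes.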
I am willing to code a lot of this myself, I'm really just looking for guidance on where the starting points will be.
Additional information on SES Incoming
https://docs.aws.amazon.com/ses/latest/dg/receiving-email-notifications-contents.html https://docs.aws.amazon.com/ses/latest/dg/receiving-email-action-s3.html
Short answer: no, WildDuck would not work well for a use case where you keep the message source in S3 and do not download it for indexing.
WildDuck is mostly optimised for storage. This means that WildDuck does not even keep the original message around. Instead, it parses the message into a MIME tree document and removes all attachments from it to store them in a separate deduplicated storage (each attachment is stored only once and is linked to all the messages that include it). When a message source is requested, WildDuck does the reverse: it retrieves the attachments and recompiles the message based on the stored tree document. Combined with MongoDB data file compression, this reduces disk usage by more than 50% and also makes it possible to use different kinds of disks (e.g. keeping message info and indexes on an expensive, small but fast SSD and all the attachments on a large and cheap SATA disk).
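To illustrate only the deduplication idea (this is not WildDuck's actual code, which is written in Node.js and stores its data in MongoDB), here is a minimal Python sketch: attachments are split off a parsed message and kept in a content-addressed store keyed by SHA-256, so an identical attachment in two messages is stored once and linked to both. `AttachmentStore` and `split_attachments` are hypothetical names, and an in-memory dict stands in for the real storage.

```python
import hashlib
from email import message_from_bytes, policy
from typing import Optional


class AttachmentStore:
    """Hypothetical content-addressed store: each unique attachment body is kept once."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}     # sha256 hex digest -> attachment bytes
        self.links: dict[str, set[str]] = {}  # sha256 hex digest -> ids of messages using it

    def put(self, message_id: str, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(digest, data)   # bytes are stored only the first time
        self.links.setdefault(digest, set()).add(message_id)
        return digest


def split_attachments(
    message_id: str, raw: bytes, store: AttachmentStore
) -> list[tuple[Optional[str], str]]:
    """Move decoded attachment payloads into `store`; return (filename, digest) references."""
    msg = message_from_bytes(raw, policy=policy.default)
    refs = []
    for part in msg.iter_attachments():  # skips the main text body part
        data = part.get_payload(decode=True) or b""
        refs.append((part.get_filename(), store.put(message_id, data)))
    return refs
```

A fuller version would also persist the remaining MIME tree with these references in place of the payloads, so the message can later be recompiled.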
The proposed pluggable storage options have been for attachment storage, not for message storage. Being able to use S3 as the attachment storage has been one of the goals from the start, but it has not been a priority, so I haven't actually worked on it. And even if it were already implemented, it still would not fit your use case, as it would not use the stored RFC 822 message files but would upload attachments as regular binary files (the upside is that you could then link to these files from other applications, as they would be regular images/documents/etc., not MIME-encoded files).
It would probably be possible to fork WildDuck and rebuild everything storage-related to use a different system. The exact system would mostly depend on the end goal. Do you just want to use it as a transfer system where IMAP clients connect to the server and download messages, or do you want to use it as a regular email server with, for example, desktop Outlook?
The first option would be fairly easy: you could use the metadata received from S3 for most listing operations and fetch and parse actual messages on the spot only when something more complicated is requested. Basically a glorified POP3.
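A minimal sketch of that "glorified POP3" idea, assuming a hypothetical `fetch_source(key)` callable that downloads one raw message from S3 and metadata (size, subject) collected when the message arrived. Listings are answered from metadata alone; the full source is only downloaded when a client retrieves a message. All class and field names are illustrative.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class MessageMeta:
    key: str      # S3 object key of the raw message (hypothetical layout)
    size: int     # size in bytes, e.g. taken from the S3 object listing
    subject: str  # e.g. taken from the SES notification's common headers


class LazyMailbox:
    """POP3-style view: STAT/LIST come from metadata, RETR downloads on demand."""

    def __init__(self, metas: list[MessageMeta], fetch_source: Callable[[str], bytes]) -> None:
        self.metas = metas
        self.fetch_source = fetch_source
        self.downloads = 0  # counts how often the backing store is actually hit

    def stat(self) -> tuple[int, int]:
        # message count and total size, like POP3 STAT
        return len(self.metas), sum(m.size for m in self.metas)

    def list_messages(self) -> list[tuple[int, int]]:
        # 1-based message numbers with sizes, like POP3 LIST
        return [(i + 1, m.size) for i, m in enumerate(self.metas)]

    def retrieve(self, number: int) -> bytes:
        # only here is the full message source downloaded
        self.downloads += 1
        return self.fetch_source(self.metas[number - 1].key)
```

This only stays fast while clients mostly list and download, which is exactly why it breaks down for clients that search or fetch parts, as described next.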
That simple approach would not work with "normal" email clients, though, as it would end up being very slow, at least if you have more than a few emails in a folder. To use this system as a regular email server, you would still have to fetch the entire message source from S3 whenever a message is added, build an index based on the message file and store the resulting metadata entry somewhere. For most queries you'd use the stored metadata, and when the client asks for message sources or attachments, you could perform ranged HTTP requests against S3 to download the requested parts (using the Range: bytes=0-NN header; you would know the ranges because the indexing process would store them, e.g. that attachment 1 starts at byte X and ends at byte Y). In any case, this is too different from the current WildDuck storage system to add it as an additional storage option.
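A rough sketch of that indexing step and the ranged fetch, assuming a simple single-level multipart message: at index time, record where each top-level MIME part starts and ends (inclusive byte offsets); at read time, turn those offsets into an HTTP `Range` value. `index_part_ranges` and `range_header` are hypothetical helpers; nested multiparts and malformed messages are not handled. With boto3, the resulting value could be passed as the `Range` argument of `get_object`.

```python
from email import message_from_bytes, policy


def index_part_ranges(raw: bytes) -> list[tuple[int, int]]:
    """Return inclusive (start, end) byte offsets of each top-level MIME part."""
    boundary = message_from_bytes(raw, policy=policy.default).get_boundary()
    if boundary is None:
        return []  # not a multipart message
    delim = b"--" + boundary.encode("ascii")
    ranges = []
    pos = raw.find(delim)
    while pos != -1:
        after = pos + len(delim)
        if raw[after:after + 2] == b"--":
            break  # closing delimiter "--boundary--"
        line_end = raw.find(b"\n", after)
        if line_end == -1:
            break
        start = line_end + 1  # part begins right after the delimiter line
        nxt = raw.find(delim, start)
        if nxt == -1:
            break  # malformed: no following delimiter
        # the line break before a delimiter belongs to the delimiter (RFC 2046)
        end_excl = nxt
        if raw[end_excl - 2:end_excl] == b"\r\n":
            end_excl -= 2
        elif raw[end_excl - 1:end_excl] == b"\n":
            end_excl -= 1
        ranges.append((start, end_excl - 1))
        pos = nxt
    return ranges


def range_header(start: int, end: int) -> str:
    """HTTP Range value for an inclusive byte range, e.g. bytes=0-99."""
    return f"bytes={start}-{end}"
```

Storing inclusive end offsets matches the HTTP `Range` syntax directly, so no off-by-one adjustment is needed at read time.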
If you do want to use WildDuck as it is, then you can add and index messages via the WildDuck API. For example, if the user ID for a user is 5e2a9b67ab7ea4a226529417 and the ID for that user's INBOX folder is 5e2a9b67ab7ea4a226529418, then you can upload new messages to the account like this (the following example uses curl to fetch a message source from a URL and then posts it to the WildDuck message add endpoint):
curl -s "https://raw.githubusercontent.com/nodemailer/mailparser/master/test/fixtures/nodemailer.eml" | curl -XPOST "http://localhost:8080/users/5e2a9b67ab7ea4a226529417/mailboxes/5e2a9b67ab7ea4a226529418/messages" -H "Content-Type:message/rfc822" --data-binary @-
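The same upload could be done from a bridge process. A minimal sketch using only the Python standard library, assuming the WildDuck API is reachable at http://localhost:8080 without an access token, as in the curl example; `build_add_message_request` and `add_message` are hypothetical helper names.

```python
import urllib.request


def build_add_message_request(
    api_base: str, user_id: str, mailbox_id: str, raw: bytes
) -> urllib.request.Request:
    """Build the same POST that the curl command sends to the message add endpoint."""
    url = f"{api_base}/users/{user_id}/mailboxes/{mailbox_id}/messages"
    return urllib.request.Request(
        url,
        data=raw,  # raw RFC 822 bytes, like curl's --data-binary
        headers={"Content-Type": "message/rfc822"},
        method="POST",
    )


def add_message(api_base: str, user_id: str, mailbox_id: str, raw: bytes) -> bytes:
    """Send the request and return the raw JSON response body."""
    req = build_add_message_request(api_base, user_id, mailbox_id, raw)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read()
```

Keeping request construction separate from sending makes the request easy to inspect or reuse with a different HTTP client.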
If everything succeeds, you should get the stored message ID in the response:
{
  "success": true,
  "message": {
    "id": 545,
    "mailbox": "5e2a9b67ab7ea4a226529418"
  }
}
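A small sketch for reading that response in a bridge, assuming the JSON shape shown in the response (`success` plus `message.id` and `message.mailbox`). `parse_add_response` is a hypothetical helper, and treating any body without `"success": true` as an error is a choice made for the sketch.

```python
import json


def parse_add_response(body: bytes) -> tuple[int, str]:
    """Return (message id, mailbox id) from a message add response, or raise."""
    data = json.loads(body)
    if not data.get("success"):
        raise RuntimeError(f"WildDuck rejected the message: {data!r}")
    message = data["message"]
    return message["id"], message["mailbox"]
```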
To verify that the message was in fact stored as intended:
$ curl "http://localhost:8080/users/5e2a9b67ab7ea4a226529417/mailboxes/5e2a9b67ab7ea4a226529418/messages/545/message.eml"
Delivered-To: [email protected]
Received: by 10.28.50.2 with SMTP id y2csp233403wmy;
Thu, 13 Oct 2016 04:39:49 -0700 (PDT)
X-Received: by 10.25.37.18 with SMTP id l18mr9511740lfl.88.1476358789184;
Thu, 13 Oct 2016 04:39:49 -0700 (PDT)
....
For sending, you can use SES as an SMTP gateway. Assuming you installed WildDuck with ZoneMTA (as the bundled installer does), edit /etc/zone-mta/zones/default.toml so that it looks like this:
[default]
preferIPv6=false
ignoreIPv6=true
processes=1
connections=5
pool="default"
host = "email-smtp.us-east-1.amazonaws.com"
port = 465
secure = true
[default.auth]
user = "SMTP-username"
pass = "SMTP-password"
I have been investigating options on how to create a cheap and simple option
I do also wonder whether this is all actually simpler than running Haraka and MongoDB, haha. Or even cheaper at scale. Doing things like this also locks you into a specific vendor. But I suppose this is a separate discussion. It's interesting to think and talk about nonetheless.
Thanks for the very detailed response. Really appreciate it! I'll have to take some time later on to read through it in detail.
Regarding @louis-lau's question: yeah, the lock-in is certainly there, but it's not a concern for me at all. The cost effectiveness, on the other hand, is very much in my proposed architecture's favor. SES has a lifetime free tier, and S3 storage at scale is extremely cheap. Depending on the resource requirements for running the "bridge", the ECS options in AWS are very, very competitively priced.
This issue is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 15 days.
This issue was closed because it has been stalled for 15 days with no activity.