dma icon indicating copy to clipboard operation
dma copied to clipboard

Implement header "From" rewriting

Open jjakob opened this issue 4 years ago • 15 comments

This adds header From rewriting when enabled in dma.conf with REWRITEFROM. The original From header is added as "X-Original-From". The maximum length of the From header is limited to 10 lines as defined by MSG_HDR_FROM_MAX due to static variable allocation.

The rewrite function fails if the address part of the From header is not in a single line as it uses a simple strncmp. This could be improved by refactoring parse_addrs() into a more generic address parsing function that returns pointers to the start and end of a valid address in the passed string, so the rewrite function could just replace the string between those 2 pointers (inserting line continuations before and after the address if the address happened to be split into multiple lines). Nevertheless, the current simplified rewriting function will fail gracefully in this case.

If the rewrite fails to find the From address in the header line, the header is not rewritten, but X-Original-From is still added. This can be changed via a later commit to on failure replace it with the envelope-from address and discard the original From line. This will ensure rewriting always takes place, but will not preserve the From address comment sent by the client on failure to rewrite.

jjakob avatar Apr 12 '21 16:04 jjakob

Added a commit that addresses this issue:

If the rewrite fails to find the From address in the header line, the header is not rewritten, but X-Original-From is still added. This can be changed via a later commit to on failure replace it with the envelope-from address and discard the original From line. This will ensure rewriting always takes place, but will not preserve the From address comment sent by the client on failure to rewrite.

This commit will make it replace the From with envelope-from even when the rewrite fails.

jjakob avatar Apr 14 '21 15:04 jjakob

Why do we have to store the headers in the queue structure?

corecode avatar Apr 16 '21 14:04 corecode

You're right, it's a global variable so it can be accessed from any function.

I was looking at issues #16 and #21, it seems like what's needed is a more generalized rewrite function. rewrite_header_from could easily be rewritten to work on any header. The bigger issue I see is that by using static variable allocation like I used here, the from_lines variable needs to be initialized to the maximum possible from lines size, which uses quite some memory (10k bytes from_lines + 10k bytes rewritten_from_lines = 20k bytes). That would increase linearly as more variables are needed for To, Cc, Bcc. The better way would be to use dynamic allocation, but would require rewriting most of the rewriting code as it is in this PR.

jjakob avatar Apr 17 '21 10:04 jjakob

I mean, can we just rewrite the header on the fly, without storing it?

On 17/04/2021 05:37, Jernej Jakob wrote:

You're right, it's a global variable so it can be accessed from any function.

I was looking at issues #16 https://github.com/corecode/dma/issues/16 and #21 https://github.com/corecode/dma/issues/21, it seems like what's needed is a more generalized rewrite function. rewrite_header_from could easily be rewritten to work on any header. The bigger issue I see is that by using static variable allocation like I used here, the from_lines variable needs to be initialized to the maximum possible from lines size, which uses quite some memory (10k bytes from_lines + 10k bytes rewritten_from_lines = 20k bytes). That would increase linearly as more variables are needed for To, Cc, Bcc. The better way would be to use dynamic allocation, but would require rewriting most of the rewriting code as it is in this PR.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/corecode/dma/pull/96#issuecomment-821802967, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABLOOYU55AI6GTTXA7BCEDTJFQEZANCNFSM42ZTYL2Q.

corecode avatar Apr 17 '21 14:04 corecode

It might be possible to rewrite it on the fly, I'd need to try implementing it. It needs to be stored at least to add the X-Original-From header, which is good to have if there's a bug in the rewrite code, we can see what the original header was.

jjakob avatar Apr 17 '21 22:04 jjakob

I guess I don't understand.  In my mind, we see the From: line, then insert a X-Original-From: (maybe with newline continuation), then the contents of the From line, and all following continuation lines.  Is there something we need to read first completely, before we can issue the new From?

On 17/04/2021 17:17, Jernej Jakob wrote:

It might be possible to rewrite it on the fly, I'd need to try implementing it. It needs to be stored at least to add the X-Original-From header, which is good to have if there's a bug in the rewrite code, we can see what the original header was.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/corecode/dma/pull/96#issuecomment-821895245, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABLOO5XRJEVWRRKZPRAQE3TJICGNANCNFSM42ZTYL2Q.

corecode avatar Apr 17 '21 22:04 corecode

A header can be split into multiple lines by inserting a blank character at the beginning of the next line, for example:

From: "foobar"
 <foo@bar>

The main reason being for cases where a header would exceed the maximum line length, which is 998 characters according to https://tools.ietf.org/html/rfc2822#section-2.2.3 (this is called "folding"). So we can't simply take one line in, rewrite it, and dump 2 lines out. The current parse_addrs function correctly parses the address out of a multi-line header so I just wrote the rewriting around it, still using it to extract the address out of the header. Parsing the raw message by myself is not something I'd be comfortable doing as it needs to account for many different things that the standard allows so it can get very complex and error-prone. The parse_addrs function is an example. So in order to possibly do a on-the-fly rewrite, one could watch parse_state to see when the header line is the one containing the address, then either pass it through (a line without an address) or rewrite it (a line where an address has been found). I'd still save the original header as-is to then later add X-Original-From (which can also be multi-line, as it can be in my implementation - up to 10k chars) The other possibility being a header where the address itself is split into 2 lines, we need to account for that too. The current rewrite function fails in this case. A possible workaround would be, in the on-the-fly rewrite possibility, to watch parse_state, see when it detects the start of an address, strip it out from that line, insert "\n<rewritten_addr>\n", then on the next line, strip the ending part of the address out of it, and pass the rest unchanged. Of course, a proper rewrite function would take the incoming header, strip all newlines and folding from it, rewrite it, then re-fold it before the 998-char limit. Without doing that, any rewriting has the possibility of exceeding the 998-char limit if for example, the original address was much shorter than the rewritten one. We basically rely on the incoming header being that much shorter than the maximum, or we'll produce non-compliant messages. Whether modern mail software can handle them is another question, they possibly may.

jjakob avatar Apr 17 '21 22:04 jjakob

To clarify the last paragraph - it would be possible to not exceed the 998-char limit even with rewriting as we do it here, just by always folding before the rewritten address. For example:

From: "someone"
 <foo@bar>
X-Original-From:
 "someone" <foo@bar>

(imagine that the original header is right at 998 chars, we need to fold both the rewritten address and X-Original-From into their own lines, as "X-Original-" is 11 additional characters that would make the line exceed the limit)

jjakob avatar Apr 17 '21 22:04 jjakob

And another example:

From: "this
 is
 a
 totally
 legit
 address" <wow
 @what
 .a
 .mess>

Could be rewritten to:

From: "this
 is
 a
 totally
 legit
 address"
 <newaddress>

by passing the non-address lines through unchanged, stripping <wow from the line with the partial-address (where the < may or may not be there), inserting our address as \n<newaddress>\n, then stripping everything the parse_addrs function still considers to be an address, that means everything else, including a possible > right after the address.

jjakob avatar Apr 17 '21 23:04 jjakob

I still don't get the complication. When looking at the From: header, I would insert a new line, From: $envelope_from, and then insert a X-Original-From:\n\t, and then insert the original line, plus continuation lines. Why do we need to retain all kinds of stuff of the original from line if we rewrite it?

corecode avatar Apr 17 '21 23:04 corecode

Because the point of rewriting is to rewrite just the address part of the headers, not the comments. The comment can be anything the sender wants, and it would be bad to strip it out, as it can have the sender's name, the application's name in it, etc.

jjakob avatar Apr 17 '21 23:04 jjakob

If we replaced From: "My Name" <myname> with From: <server@domain> we would remove that critical piece of information, which is the sender's name. MUA's don't display anything from X-Original-From. It also wouldn't technically be considered "rewriting", but "replacing", so it could only be considered "header replacing".

jjakob avatar Apr 17 '21 23:04 jjakob

#21 would also be impossible to implement without actually storing the whole header and rewriting just the address part. Okay, possible just by blindly discarding everything that isn't the address itself, but that would be very rude, and hardly satisfactory IMO.

jjakob avatar Apr 17 '21 23:04 jjakob

Doing the whole rewriting is too complicated and will likely introduce bugs. I think we either replace the From: address, or not do it at all.

corecode avatar Apr 17 '21 23:04 corecode

I'll come up with a working on-the-fly rewrite branch in any case when I get to it. Parsing the address from the header (with all the weirdness aroud folding etc.) is already in the code in parse_addrs, so there's not much possibility of going wrong there, and I won't be touching it. So this branch can stay as a WIP/PoC (or someone can already use it on his systems as I am).

jjakob avatar Apr 17 '21 23:04 jjakob

If you ever get around to implementing this, consider also applying the same treatment to the "To" header (I'm currently seeing rejected mails because mails are generated with "To: root" which dma passes without modification and which my exim rejects.

matthijskooijman avatar Mar 21 '24 08:03 matthijskooijman