dma
dma copied to clipboard
Handle lines longer than 1000 bytes: error "bad mail input format"
Currently dma chokes on mail input which contains lines longer than 1000 bytes, because of the buffer being used. When it does not find a newline at the end, it adds one, but then assumes/requires this to be the last line.
The error is "bad mail input format".
This might happen when using dma to handle web application errors during development / local setup, and the included debug information contains huge environment etc settings (e.g. "LS_COLORS").
I think that it would be saner to not require RFC2822 compliance, but accept the mail nonetheless.
The code in question: https://github.com/corecode/dma/blob/master/mail.c#L376
Do you have a suggestion how to deal with longer lines? Something that does not complicate the code?
On 01/11/2014 07:00 AM, Daniel Hahler wrote:
Currently dma chokes on mail input which contains lines longer than 1000 bytes, because of the buffer being used. When it does not find a newline at the end, it adds one, but then assumes/requires this to be the last line.
The error is "bad mail input format".
This might happen when using dma to handle web application errors during development / local setup, and the included debug information contains huge environment etc settings (e.g. "LS_COLORS").
I think that it would be saner to not require RFC2822 compliance, but accept the mail nonetheless.
The code in question: https://github.com/corecode/dma/blob/master/mail.c#L376
— Reply to this email directly or view it on GitHub https://github.com/corecode/dma/issues/18.
The easiest approach would probably be to increase the buffer length, from 1000 bytes to maybe 64kB (65536). It would/could then still fail in the same way, but it's much more unlikely.
Or, just do not add a newline (and effectively aborting / cutting the input) and keep reading in 1000 byte chunks.
The RFC is absolutely clear on this issue:
Each line of characters MUST be no more than 998 characters
and However, there are so many implementations that (in compliance with the transport requirements of [RFC5321 http://tools.ietf.org/search/rfc5321]) do not accept messages containing more than 1000 characters including the CR and LF per line, it is important for implementations not to create such messages.
It is clear that we can not forward any message with more than 1000 characters per line. What do you suggest as alternative? Insert a linebreak after 998 characters?
On 01/11/2014 11:15 PM, Daniel Hahler wrote:
The easiest approach would probably be to increase the buffer length, from 1000 bytes to maybe 64kB (65536). It would/could then still fail in the same way, but it's much more unlikely.
Or, just do not add a newline (and effectively aborting / cutting the input) and keep reading in 1000 byte chunks.
— Reply to this email directly or view it on GitHub https://github.com/corecode/dma/issues/18#issuecomment-32108845.
Well, I think that dma is not creating the message, but only transports it.
The bug is really with the app that hands it over to dma.
Inserting newlines can be very problematic, if the message is meant to contain (encoded) data, and the result may fail to parse afterwards.
Imagine an automated message, with "key=value" pairs on separate lines, where it may result in "key=verylongvalue\r\nstillvalue".
However, breaking lines after 998 bytes is still far better than rejecting the mail altogether.
I can imagine having a configuration option (in dma.conf) for this: RFC2822_SPLITLINES. If set, CRLF gets inserted after 998 bytes, otherwise the lines would get bypassed unchanged.
It is not optional to forward messages with more than 998 characters per line: it is forbidden. Do you know how other MTAs handle this? I only see two options: break line, or reject (bounce).
On 01/12/2014 06:01 PM, Daniel Hahler wrote:
Well, I think that dma is not creating the message, but only transports it.
The bug is really with the app that hands it over to dma.
Inserting newlines can be very problematic, if the message is meant to contain (encoded) data, and the result may fail to parse afterwards.
Imagine an automated message, with "key=value" pairs on separate lines, where it may result in "key=verylongvalue\r\nstillvalue".
However, breaking lines after 998 bytes is still far better than rejecting the mail altogether.
I can imagine having a configuration option (in dma.conf) for this: RFC2822_SPLITLINES. If set, CRLF gets inserted after 998 bytes, otherwise the lines would get bypassed unchanged.
— Reply to this email directly or view it on GitHub https://github.com/corecode/dma/issues/18#issuecomment-32127110.
Then breaking the line is better than rejecting the mail.
Postfix has a line_length_limit option, with a default of 2048 (http://www.postfix.org/postconf.5.html):
Upon input, long lines are chopped up into pieces of at most this length; upon delivery, long lines are reconstructed.
It sounds like this will effectively pass through long lines?!
@blueyed, IIRC that limit is for internal processing across Postfix apps and filters, for actual smtp this is the setting: http://www.postfix.org/postconf.5.html#smtp_line_length_limit
Good catch. Any suggestions how we should handle this?
On 03/26/2014 08:24 PM, Brian Coca wrote:
@blueyed https://github.com/blueyed, IIRC that limit is for internal processing across Postfix apps and filters, for actual smtp this is the setting: http://www.postfix.org/postconf.5.html#smtp_line_length_limit
— Reply to this email directly or view it on GitHub https://github.com/corecode/dma/issues/18#issuecomment-38728119.
I think rejection is OK, but maybe add more info to the message, 'bad mail input format' (I just ran into this) is a bit too succinct, knowing that its a line length error helps pinpoint the issue much faster. Also, quoting RFC helps convince programmers change their code vs demand that the MTA just accept their emails.
alternatively we could force a linebreak when accepting the mail, like postfix does when sending?
On 03/26/2014 08:47 PM, Brian Coca wrote:
I think rejection is OK, but maybe add more info the the message, 'bad mail input format' (I just ran into this) is a bit too succinct, knowing that its a line length error helps pinpoint the issue much faster. Also, quoting RFC helps convince programmers change their code vs demand that the MTA just accept their emails.
— Reply to this email directly or view it on GitHub https://github.com/corecode/dma/issues/18#issuecomment-38730685.
The Postfix approach seems sensible. To me everything is better than rejecting the mail.
quoting RFC helps convince programmers change their code
I agree. There could be a warning instead of the current error.
Given the use case of dma (for personal or lightweight use in containers) I do not think it should be too strict (i.e. rejecting the mail).
I disagree, I use dma because it it simple and lightweight and doesn't try to do complex stuff like line splitting with character or armor encodings. If it accepts mail and it then gets rejected by another mailer, it creates more issues than it solves.
I am a fan of fail early and fail loudly, if you start accepting emails and push the problem downstream you never have an incentive to fix the actual issue, the program composing non compliant smtp messages.
This is one issue that prevented the FreeBSD clusteradm team switching to DMA internally: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=208261
in postfix you CAN disable the validation, does not mean it doesn't do it, postfix has many options to 'tolerate' other broken implementations
@emaste what do you suggest dma should do? Just break lines?
@corecode I want to see the FreeBSD commit mail generator (I think it was the problem) fixed to not generate lines > 998 chars, and I think I agree that by default dma shouldn't accept such mail.
What do you think about having a config file option to set a (non-compliant) maximum line length, and/or one to enable line splitting?
I'm fine with not rejecting mails by default - I just don't know what we should do. Split the line? Cut it? Pass it and be non-compliant?
I don't like just cutting it (silently losing information in an otherwise passed email). Given the choice of passing non-compliant lines and splitting I'd choose the latter.
That leaves us with the question of doing it by default or only under a config file option. If I'm doing the work I wouldn't bother with the config option, but wouldn't object to one to appeal to the "fail early and fail loudly" by default mentioned by @bcoca.
I don't think we can easily add to the headers during reading. I'm fine with making the default to split lines. We'll need a strategy for splitting possibly overlong header lines tho, so that they stay being headers. Can we just split words in the middle, or do we have to split on word boundaries?
For me I consider the line splitting to be a "best effort" to deliver malformated mail, and I'd be happy enough splitting at whatever point the we reach the line length limit. I'd also be fine rejecting overlong headers and splitting only the body, if that's easier.
I believe the fundamental problem is context; currently there is none. Since 'dma' is, by design, a local MTA (i.e. non-listening) we should readily be able to distinguish from mail we are generating (i.e. out-going mail) and mail we are consuming (i.e. incoming for local delivery); since we are not relaying. I believe the existing behaviour is correct for out-going mail. For incoming mail we should be tolerant and pass the problem to the mail client. I do not believe that mangling email is a valid option as upstream mail signatures will then be incorrect.
To clarify, my own use case is using 'dma' as an out-going MTA and an incoming MDA (through fetchmail).
... had quick look at the code in relation to the above ... I think the initial proposal is correct in that we need to increase the 'readmail' buffer to a maximum (say 64k) and then add a check which will flag the mail as malformed if any line is over 1000 bytes (possibly through a generic 'int qitem.valid' which can map to various validity checks). We can then defer the actual decision until we actually 'deliver' the mail.
PS: you'll probably need a '#define RFC822_MAX_LINE 1000' and a '#define DMA_MAX_LINE' as is see those 'char[1000]' limits in a couple more places. PPS: I can create a patch if the proposal is acceptable.
It's the outgoing mail side of this issue that affected the FreeBSD commit mail generator. It is true that the script should not generate long lines, but it's handled by the current MTA.
@emaste : I can only comment personally to that; firstly, I agree, "the script should not generate long lines". Secondly, we should not emit invalid output (i.e. lines over the limit). Thirdly, splitting content lines presumes a great deal about the content and, whilst it may not impact the FreeBSD output it may malform other content; reducing that problem is a sisyphean task.
In short (and IMHO), FreeBSD is correct it's assessment that 'dma' is an unsuitable mailer for its toolchain (because it produces illegal output).
Pull request is my best understanding of sane behaviour. It was actually a little simpler than I imagined since we queue the mail anyway; by defining different buffer lengths and only using the RFC length for remote delivery we can bump the error.
That said, upon reviewing the RFC I'm going to have to disagree with @corecode; the RFC does not prohibit lines longer than 1000 characters. IMHO it prohibits the production of messages which have lines longer than 1000 characters (e.g. the FreeBSD toolchain example we're discussing), however, it actually recommends that an MTA "handle an arbitrarily large number of characters". As such, there is probably a bit more work to do on the problem.
PS: I also forgot to patch the line termination so it actually has a valid CRLF, rather than an invalid LF terminator (semantic issue I know but I'll try and do that anyway).
NB: Don't think this is likely to be looked at before I get to this tomorrow but DMA_LINE_MAX is obviously wrong; I need to correct that to a literal 65536 in the github branch (just realized I hadn't push that back from development server when I rebuilt on another machine).
I don't like the idea of message content being altered - at least, not without warning.
Of course, allowing lines over 1000 characters might cause a message to be corrupted somewhere else down the line.
Surely the clean solution would be to convert to quoted-printable? though this may be a pedantic overkill solution, especially for locally delivered mail.
That would change the whole whole content, I don't think we should take that liberty.
Fair enough.
I know I've not been involved with this discussion or development, so I'm not arrogant enough to expect my opinion to matter, but for what it's worth, if I was doing this, I'd be uncomfortable in breaking the spec (as you also are), but I'd also be uncomfortable altering the body in (what to those outside DMA) is an arbitrary manner.
Other applications that do that have caused unexpected (and sometimes unnoticed) errors, as exampled here:
https://stackoverflow.com/questions/9999908/php-mail-function-randomly-adds-a-space-to-message-text
https://stackoverflow.com/questions/10401863/how-to-send-a-csv-attachment-with-lines-longer-than-990-characters
Just my 1 cents worth
Cheers, Jamie
How is DMA breaking the spec?
@bapt added a patch to the FreeBSD base system for this, in https://github.com/freebsd/freebsd-src/commit/b0b2d05fd0609d504236e5429cef421a35237bd6 (and a bugfix from me in https://github.com/freebsd/freebsd-src/commit/1a0dde338df8b493d74dcb2f7bbaaa6c02cab371). This is slightly different from the proposal in https://github.com/corecode/dma/pull/49, in that it does not attempt to split on whitespace. It does not do any splitting on headers.
IMO there are a few improvements that could be made:
- split headers as well
- split at whitespace if possible
- make long line splitting a configurable option (reject mail with long lines, if not enabled)
@emaste I think an error in getline will get undetected and return success. Otherwise looks like an acceptable way to address such mails.
How is DMA breaking the spec?
It's not. There were suggestions earlier in this thread saying you should just ignore the 1000 character limit. You disagreed, and I agreed with you.
@corecode getline returns -1 on error or EOF, I think the feof/getline loop is non-canonical but I think checking getline's return for >0 is the appropriate condition.
I think this should go on top of @bapt's change:
diff --git a/contrib/dma/mail.c b/contrib/dma/mail.c
index 9e00c22d71db..0462ec1665ee 100644
--- a/contrib/dma/mail.c
+++ b/contrib/dma/mail.c
@@ -405,10 +405,7 @@ readmail(struct queue *queue, int nodot, int recp_from_header)
if ((ssize_t)error < 0)
return (-1);
- while (!feof(stdin)) {
- newline[0] = '\0';
- if ((linelen = getline(&line, &linecap, stdin)) <= 0)
- break;
+ while ((linelen = getline(&line, &linecap, stdin)) > 0)
if (had_last_line)
errlogx(EX_DATAERR, "bad mail input format:"
" from %s (uid %d) (envelope-from %s)",
@corecode ah, prior to bapt's change we had:
if (fgets(line, sizeof(line) - 1, stdin) == NULL)
break;
so same amount of error checking.
I think we can just check errno after the loop,
diff --git a/contrib/dma/mail.c b/contrib/dma/mail.c
index 0462ec1665ee..e052d745398c 100644
--- a/contrib/dma/mail.c
+++ b/contrib/dma/mail.c
@@ -507,8 +507,8 @@ readmail(struct queue *queue, int nodot, int recp_from_header)
}
}
}
-
- ret = 0;
+ if (errno == 0)
+ ret = 0;
fail:
free(line);
return (ret);
I think we can keep the feof() loop condition, but just do something about error return from getline.
I think it's fine to use getline()'s -1 return
I have a FreeBSD review in https://reviews.freebsd.org/D34159