aiosmtpd
aiosmtpd copied to clipboard
Unable to accept long lines
In a production environment, large institutional customers expect to be able to inject DATA with very long lines, even megabytes. They don't accept the requirement to comply with RFC5321 4.5.3.1.6, arguing that "other providers accept long lines without issue".
This creates a problem with use of aiosmtpd, because one needs to either instantiate SMTP with a massive maximum line length (leading to a huge preallocated buffer size), or face the issue of asyncio.LimitOverrunError, leading to failure to accept customer mails.
The obvious problem with allowing several megabytes for maximum line length is when an SMTP server has hundreds or even thousands of simultaneous inbound connections, the memory overhead of these buffers will add up to gigabytes.
I'm using a forked aiosmtpd which allows arbitrarily large input lines, by catching asyncio.LimitOverrunError and re-reading chunk by chunk, then when the next .readuntil() call completes without exception, assembles the chunks together.
My question is - would you be willing to consider a pull request if I commit these enhancements into my fork? I can assure full backward compatibility. I would propose adding an extra constructor keyword such as 'long_lines=False', and if set to True instead, the long line chunk collection would take effect.
I'm mixed. On the one hand, I like the idea of making aiosmtpd more powerful and that would be a good thing, OTOH violating RFCs annoys me tremendously. In fact I would be angry about having to support this in any environment. Wrapping lines is trivial on the senders side.
It's actually more difficult to send large chunks of data.
If course, I personally had some needs to violate RFFs on my own, which is why I just built my own program on top of aiosmtpd with my own handlers, etc, to handle the violations as well as features that I needed.
Writing all this out, I think what I would say is that I'm disinclined to accept the entire chunk of functionality that you're proposing, but if there's something that would not break RFCs, but would make supporting this kind of functionality easier, I'd accept that. Though the approach of reading/catching/re-reading sounds distasteful to me 🤔
Hi Wayne, thanks for your reply. I agree it's less than ideal to read/catch/re-read, but I needed to implement a quick remedy on a live server carrying a complaining customer.
As a compromise, how would you feel about an SMTP constructor keyword 'stream_reader_class', allowing the user to provide a subclass of the default asyncio.StreamReader? The SMTP._handle_client() method already goes against recommended usage by directly instantiating StreamReader, so I don't think providing support for a user-supplied subclass would go against the grain.
The fundamental problem is that the standard library method StreamReader.readuntil(sep='\n') doesn't allow an extra argument for maximum number of characters.
In its current implementation, the only way to read arbitrarily long chunks up to a separator is to either do .readuntil()/catch/.read() cycles, or add a buffering layer on top, and just use .read(n), and hunt for the separator oneself, and deal with the edge cases where a received chunk has a partial separator at end, or separator right at the beginning.
Life would be simpler if there was a StreamReader method combining the utility of .read() and .readuntil().
This has wider ramifications than just SMTP session handling. I consider this a severe issue, and will likely file a PEP to the core Python team for consideration.
Keen for your thoughts.
Cheers David
I'll have to refresh my mind with the code in question, but that sounds like a suitable compromise.
Your thoughts about the utility of having a maxchars or maxlen argument for readuntil is also interesting, though orthogonal to this particular issue.
Can I upvote to this? I receve ARFs FBL from Microsoft Junk Mail Reporting Program for processing and for obvious reasons they will not change the formatting of their emails because of this issue.
The proposed PR could be accepted?
Thanks very much.
Is there a good python replacement for this library due to its inability to accept long data?