mailparse: base64 decoding should not be "strict"

Open · andir opened this issue · 4 comments

I've encountered yet another email where base64 decoding in mailparse fails, both with and without my recent change in #95.

The reduced sample looks like this:

PC9odG1sPn==

If you decode that string with a non-strict decoder (one that does not require "normalized" base64), the result is:

</html>

If you re-encode that string with Python, Ruby, or the base64 command-line tool, you get:

PC9odG1sPg==

This re-encoded string then decodes correctly with the mailparse crate.
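Why both strings produce the same text: in a final quantum ending in "==", the first symbol carries 6 data bits and the second symbol carries the remaining 2 data bits plus 4 trailing bits, which a canonical encoder sets to zero. In "Pg==" those trailing bits are 0000; in "Pn==" they are 0111, so a decoder that checks trailing bits rejects it. The std-only Rust sketch below makes this visible. The function names are hypothetical and not part of mailparse or data_encoding, and it only handles a single padded "xx==" quantum.

```rust
/// Value of a symbol in the standard base64 alphabet, or None if it is not one.
fn sym_value(c: u8) -> Option<u8> {
    match c {
        b'A'..=b'Z' => Some(c - b'A'),
        b'a'..=b'z' => Some(c - b'a' + 26),
        b'0'..=b'9' => Some(c - b'0' + 52),
        b'+' => Some(62),
        b'/' => Some(63),
        _ => None,
    }
}

/// Hypothetical helper: for a final quantum of the form "xx==" (one data byte),
/// return (decoded byte, the 4 leftover trailing bits).
fn last_quantum(q: &str) -> Option<(u8, u8)> {
    let b = q.as_bytes();
    if b.len() != 4 || b[2] != b'=' || b[3] != b'=' {
        return None;
    }
    let hi = sym_value(b[0])?; // 6 data bits
    let lo = sym_value(b[1])?; // 2 data bits followed by 4 trailing bits
    let byte = (hi << 2) | (lo >> 4);
    let trailing = lo & 0x0F; // a canonical encoder always leaves these at zero
    Some((byte, trailing))
}

fn main() {
    for q in ["Pn==", "Pg=="] {
        let (byte, trailing) = last_quantum(q).unwrap();
        println!("{}: byte = {:?}, trailing bits = {:04b}", q, byte as char, trailing);
    }
    // Pn==: byte = '>', trailing bits = 0111
    // Pg==: byte = '>', trailing bits = 0000
}
```

Both quanta decode to '>', the last character of "</html>"; they differ only in bits that carry no data.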

I'm currently working around this with the data_encoding crate by defining my own BASE64 decoder:

use lazy_static::lazy_static;

lazy_static! {
    // Same as BASE64_MIME, but accepts non-zero trailing bits in the last quantum.
    static ref BASE64_DECODER: data_encoding::Encoding = {
        let mut spec = data_encoding::BASE64_MIME.specification();
        spec.check_trailing_bits = false; // <- the important bit
        spec.encoding().expect("The encoding must be valid")
    };
}

I've come to believe that parsing mail with "strict" base64 parsers is just not a good idea. It might work in an ideal world, but sadly I've received plenty of emails with edge cases over the years :(

My request for this issue is that mailparse should probably switch to a non-strict decoder for email. Or is this better suited as part of the data_encoding library instead?
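To make "non-strict" concrete, here is one possible lenient policy as a std-only Rust sketch. It is an assumption for illustration, not the mailparse or data_encoding API: it skips any character outside the base64 alphabet (RFC 2045 says such characters are to be ignored in base64-encoded data, which covers line breaks in MIME bodies), stops at the first "=", and silently drops trailing bits instead of rejecting them. The name decode_lenient is hypothetical.

```rust
/// Value of a symbol in the standard base64 alphabet, or None if it is not one.
fn sym_value(c: u8) -> Option<u8> {
    match c {
        b'A'..=b'Z' => Some(c - b'A'),
        b'a'..=b'z' => Some(c - b'a' + 26),
        b'0'..=b'9' => Some(c - b'0' + 52),
        b'+' => Some(62),
        b'/' => Some(63),
        _ => None,
    }
}

/// Hypothetical lenient decoder: never fails, ignores non-alphabet characters,
/// stops at padding, and discards leftover (trailing) bits even if non-zero.
fn decode_lenient(input: &str) -> Vec<u8> {
    let mut out = Vec::with_capacity(input.len() * 3 / 4);
    let mut acc: u32 = 0; // bit accumulator
    let mut nbits: u32 = 0; // number of not-yet-emitted bits held in acc
    for &c in input.as_bytes() {
        if c == b'=' {
            break; // padding marks the end of the encoded data
        }
        let v = match sym_value(c) {
            Some(v) => v as u32,
            None => continue, // CR, LF, spaces and other stray characters are skipped
        };
        acc = (acc << 6) | v;
        nbits += 6;
        if nbits >= 8 {
            nbits -= 8;
            out.push((acc >> nbits) as u8);
            acc &= (1 << nbits) - 1; // keep only the bits not yet emitted
        }
    }
    // Whatever remains in acc (fewer than 8 bits) is trailing bits: dropped, not checked.
    out
}

fn main() {
    // The non-canonical sample from this issue, with a line break inserted as in a MIME body.
    let decoded = decode_lenient("PC9o dG1s\r\nPn==");
    println!("{}", String::from_utf8_lossy(&decoded)); // prints </html>
}
```

Stopping at the first "=" is a design choice of this sketch; a decoder aimed at real-world mail might instead continue past padding to cope with concatenated base64 chunks.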

Dec 07 '21 15:12 andir
