mairix
Lots of headers that can't be parsed
I just downloaded the .zip file and compiled mairix. When I run it (and this is the same as V0.24), I get many complaints about headers that can't be parsed. For example:
Header 'content-type: image/*; name="20221017_130844_resized.jpg"' in
I'm not a mail wizard, but that looks OK to me.
A more lengthy example:
Header 'content-disposition: inline; filename="image004.png"; size=79197; creation-date=Fri, 06 May 2022 16:51:48 GMT; modification-date=Fri, 06 May 2022 20:09:01 GMT' in
Q1: Is it just me, or is this happening to other people?
Q2: Are these complaints valid, or are they spurious?
Thanks.
They happen to me. My "get mail" script, which calls mairix to index after getting new email, pipes mairix output through:
egrep -v '(could not.*parse|Can.t (find|process).*boundary|mtime failed)'
According to git, I added that line to my script in June 2013.
It always seems to be on MIME headers, which I never want to search, so I just ignore the errors.
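For anyone who would rather keep the filter inside a script than in a shell pipeline, here is a minimal Python sketch of the same idea. It reuses the pattern from the egrep line above unchanged. The function name drop_noise and the sample lines are made up for illustration; they are not real mairix output.

```python
import re

# Same pattern as the egrep -v filter above, in Python's re syntax.
NOISE = re.compile(r"(could not.*parse|Can.t (find|process).*boundary|mtime failed)")

def drop_noise(lines):
    """Return only the lines that do not match the noise pattern."""
    return [line for line in lines if not NOISE.search(line)]

# Hypothetical sample lines, only to show the filter in action.
sample = [
    "example: header could not be parsed\n",  # dropped by the filter
    "example: indexing finished\n",           # kept
]
print("".join(drop_noise(sample)), end="")
```

In a real script you would pass the stream that carries the mairix messages (for example sys.stdin) instead of the sample list.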
Thanks for the response.
I guess I always get concerned about programs not handling (what I assume is) perfectly valid input.
But your way of dealing with it has pragmatic appeal.
I looked into trying to figure out what was going wrong, but the C code made my eyes cross, so it was adapt or switch, and I couldn't find alternatives that met my requirements.
I decided to take a look at the code as well.
During my very quick look, I see that one of the headers it complains about contains Content-type: image/*; name="..." and (for what it is worth) the answer in https://stackoverflow.com/questions/27790669/is-the-contenttype-image-valid claims that image/* is not valid. I suppose mairix is right to complain about this. (I tried changing the line to "... image/jpeg; ..." and mairix is happy with it.)
I'll try another example and see if anything else illuminating pops up.
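The manual edit described above (swapping the wildcard subtype for a concrete one) could be automated along these lines. This is only a sketch under assumptions: it guesses the type from the file extension in the name parameter using Python's standard mimetypes module, and the function name concrete_content_type is hypothetical. It has nothing to do with how mairix itself works, and rewriting stored mail is a decision you would want to make deliberately.

```python
import mimetypes
from email.message import Message

def concrete_content_type(value):
    """Replace a wildcard subtype such as image/* with a type guessed from name=."""
    media_type, sep, params = value.partition(";")
    media_type = media_type.strip()
    if not media_type.endswith("/*"):
        return value  # already concrete; leave untouched

    # Let the standard library extract the (unquoted) name parameter.
    msg = Message()
    msg["Content-Type"] = value
    name = msg.get_param("name")
    if not isinstance(name, str):
        return value  # no plain name parameter to guess from

    guessed, _encoding = mimetypes.guess_type(name)
    main_type = media_type[:-2].lower()
    if guessed is None or not guessed.startswith(main_type + "/"):
        return value  # no guess, or the guess contradicts the main type
    return guessed + sep + params

print(concrete_content_type('image/*; name="20221017_130844_resized.jpg"'))
```

Values that already have a concrete subtype, or that have no usable name parameter, are returned unchanged.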
Another complaint from mairix comes up because some mailers send out lines like
creation-date=Thu, 09 Feb 2023 18:33:40 GMT
and mairix wants double quotes around the date.
I took a very quick look but didn't find out whether the quotes are required or not. In this case the entire header group is:
Content-Type: image/jpeg;
    name="image002.jpg"
Content-Description: image002.jpg
Content-Disposition: inline;
    filename="image002.jpg";
    creation-date=Thu, 09 Feb 2023 18:33:40 GMT
Content-ID: <[email protected]>
Content-Transfer-Encoding: base64
and it occurs to me that the rules could be different for multi-line headers as opposed to single-line headers.
Anyone reading this know?
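On the multi-line question: as far as I understand RFC 5322, a folded header is meant to be read as the single line you get by removing each line break that is followed by a space or tab, so the parameter rules should not differ between the two forms. A minimal sketch of that unfolding step follows. The function name unfold is hypothetical, and the tab indentation in the sample is an assumption, since the original whitespace is not visible in this thread.

```python
import re

def unfold(header_block):
    """Join continuation lines: drop each line break followed by a space or tab."""
    return re.sub(r"\r?\n(?=[ \t])", "", header_block)

# Sample based on the header group quoted above; tab indentation is assumed.
folded = (
    'Content-Disposition: inline;\n'
    '\tfilename="image002.jpg";\n'
    '\tcreation-date=Thu, 09 Feb 2023 18:33:40 GMT\n'
    'Content-Transfer-Encoding: base64\n'
)
print(unfold(folded), end="")
```

After unfolding, the Content-Disposition header is one logical line and the next header still starts on its own line.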
In any case, to allow things like creation dates with unquoted strings I'd guess the NFA definition in nvp.nfa would have to be modified, and that might be a job best suited to either (a) the original NFA author, or (b) someone who loves playing with NFAs.
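For what it is worth, RFC 2183 defines creation-date, modification-date and read-date as quoted date-times, so mairix seems to be strict here rather than wrong. If touching the NFA is not appealing, one alternative might be to quote the dates in the header text before it is parsed. The sketch below only shows that string rewrite. The function name quote_dates and the date pattern are my own assumptions, and where (or whether) such a step could be hooked in ahead of mairix is left open.

```python
import re

# Parameters that RFC 2183 defines as quoted date-times.
DATE_PARAMS = ("creation-date", "modification-date", "read-date")

# Matches an unquoted date such as: Thu, 09 Feb 2023 18:33:40 GMT
# (an assumed, simplified pattern; not a full RFC 5322 date parser).
UNQUOTED_DATE = re.compile(
    r'\b(' + "|".join(DATE_PARAMS) + r')=(?!")'
    r'((?:[A-Za-z]{3},\s*)?\d{1,2} [A-Za-z]{3} \d{4} '
    r'\d{2}:\d{2}(?::\d{2})? (?:[A-Za-z]+|[+-]\d{4}))',
    re.IGNORECASE,
)

def quote_dates(value):
    """Wrap unquoted date parameter values in double quotes."""
    return UNQUOTED_DATE.sub(r'\1="\2"', value)

header = ('inline; filename="image004.png"; size=79197; '
          'creation-date=Fri, 06 May 2022 16:51:48 GMT; '
          'modification-date=Fri, 06 May 2022 20:09:01 GMT')
print(quote_dates(header))
```

Values that are already quoted are left alone, so running the rewrite twice gives the same result as running it once.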