
Lots of headers that can't be parsed

Open Ndolam opened this issue 2 years ago • 6 comments

I just downloaded the .zip file and compiled mairix. When I run it (and this is the same as V0.24), I get many complaints about headers that can't be parsed. For example:

Header 'content-type: image/*; name="20221017_130844_resized.jpg"' in [89420989,90670144) could not be parsed

I'm not a mail wizard, but that looks OK to me.

A more lengthy example:

Header 'content-disposition: inline; filename="image004.png"; size=79197; creation-date=Fri, 06 May 2022 16:51:48 GMT; modification-date=Fri, 06 May 2022 20:09:01 GMT' in [28093802,28267769) could not be parsed

Q1: Is it just me, or is this happening to other people?

Q2: Are these complaints valid, or are they spurious?

Thanks.

Ndolam avatar Feb 28 '23 20:02 Ndolam

They happen to me. My "get mail" script, which calls mairix to index after getting new email, pipes mairix output through: egrep -v '(could not.*parse|Can.t (find|process).*boundary|mtime failed)'. According to git, I added that line to my script in June 2013.

It always seems to be on MIME headers, which I never want to search, so I just ignore the errors.
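For anyone who would rather not depend on egrep, here is a minimal Python sketch of the same filter. The regular expression is copied from the egrep command above; the function name `drop_noise`, the script name in the usage note, and the assumption that the messages can be captured with `2>&1` are mine, not something mairix documents.

```python
import re
import sys

# Same pattern as the egrep filter quoted above, compiled once.
NOISE = re.compile(r"(could not.*parse|Can.t (find|process).*boundary|mtime failed)")


def drop_noise(lines):
    """Yield only the lines that do NOT match the noise pattern (like egrep -v)."""
    for line in lines:
        if not NOISE.search(line):
            yield line


def main():
    # Read mairix output on stdin and echo everything except the noise lines.
    sys.stdout.writelines(drop_noise(sys.stdin))
```

Saved as a script (the name `filter_mairix.py` is hypothetical) with a call to `main()` at the bottom, it would be used as `mairix 2>&1 | python3 filter_mairix.py`. Like the egrep version, this only hides the messages; the affected headers still go unparsed.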

edgewood avatar Feb 28 '23 22:02 edgewood

Thanks for the response.
I guess I always get concerned about programs not handling (what I assume is) perfectly valid input. But your way of dealing with it has pragmatic appeal.

Ndolam avatar Feb 28 '23 22:02 Ndolam

I looked into trying to figure out what was going wrong, but the C code made my eyes cross, so it was adapt or switch, and I couldn't find alternatives that met my requirements.

edgewood avatar Mar 08 '23 20:03 edgewood

I decided to take a look at the code as well.

During my very quick look, I see that one of the headers it complains about contains Content-type: image/*; name="..." and (for what it is worth) the answer in https://stackoverflow.com/questions/27790669/is-the-contenttype-image-valid claims that image/* is not valid. I suppose mairix is right to complain about this. (I tried changing the line to "... image/jpeg; ..." and mairix is happy with it.)

I'll try another example and see if anything else illuminating pops up.

Ndolam avatar Mar 08 '23 22:03 Ndolam

Another complaint from mairix arises because some mailers send out lines like creation-date=Thu, 09 Feb 2023 18:33:40 GMT, and mairix wants double quotes around the date.

I took a very quick look but didn't find out whether the quotes are required or not. In this case the entire header group is:

Content-Type: image/jpeg;
	name="image002.jpg"
Content-Description: image002.jpg
Content-Disposition: inline;
	filename="image002.jpg";
	creation-date=Thu, 09 Feb 2023 18:33:40 GMT
Content-ID: <[email protected]>
Content-Transfer-Encoding: base64

and it occurs to me that the rules could be different for multi-line headers as opposed to single-line headers.

Anyone reading this know?
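For what it is worth, here is a rough Python sketch of how the relevant grammar appears to read. It has not been checked against what mairix's parser actually does. RFC 2045 says an unquoted parameter value must be a "token" (no spaces, no "tspecials" such as the comma and colon), and RFC 2183 defines creation-date as a quoted date-time, so the unquoted date above would not be valid. Folded (multi-line) headers are unfolded before parsing, so the rules should be the same for single-line and multi-line headers. The function names `is_token`, `unfold`, and `quote_if_needed` are made up for illustration.

```python
import re

# RFC 2045 "tspecials": characters that may not appear in an unquoted token.
TSPECIALS = set('()<>@,;:\\"/[]?=')


def is_token(value):
    """True if value may appear unquoted as a MIME parameter value."""
    # Printable ASCII only (33..126 excludes space and control characters).
    return bool(value) and all(
        33 <= ord(ch) <= 126 and ch not in TSPECIALS for ch in value
    )


def unfold(header):
    """Join continuation lines: drop a line break that is followed by space or tab."""
    return re.sub(r"\r?\n(?=[ \t])", "", header)


def quote_if_needed(value):
    """Return value unchanged if it is a token, else as a quoted-string."""
    if is_token(value):
        return value
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


print(is_token("image002.jpg"))                    # True
print(is_token("Thu, 09 Feb 2023 18:33:40 GMT"))   # False
print(quote_if_needed("Thu, 09 Feb 2023 18:33:40 GMT"))
```

Under this reading, filename="image002.jpg" would be fine with or without quotes, while the date needs them because of the spaces, comma, and colons.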

Ndolam avatar Mar 08 '23 22:03 Ndolam

In any case, to allow things like creation dates with unquoted strings I'd guess the NFA definition in nvp.nfa would have to be modified, and that might be a job best suited to either (a) the original NFA author, or (b) someone who loves playing with NFAs.

Ndolam avatar Mar 09 '23 01:03 Ndolam