eml_parser
Create new Python Bug to Header Parsing Issue
In test_headeremail2list_2, it mentions Python bug 27257. However, bug 27257 appears to be related to empty groups in the header, not to issues with an obsolete period. With Python 3.7, I do not have any issues with the decoded value, unless eml_parser should include address groups.
https://github.com/GOVCERT-LU/eml_parser/blob/f98980a77d9c7d914d97525a62294075c0ce42d9/tests/test_emlparser.py#L131
From the bug:
To: unlisted-recipients: ;, ""@pop.kundenserver.de (no To-header on input)
The current output below appears to be the expected output.
'to': ['@pop.kundenserver.de']
From the RFC:
To: A Group:Ed Jones [email protected],[email protected],John [email protected];
Again, the current output below appears to be the expected output.
'to': ['[email protected]', '[email protected]', '[email protected]']
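For reference, here is a minimal sketch of how the standard library parser handles an address group with `email.policy.default`. The addresses are obfuscated in the text above, so the ones below are taken from the group example in RFC 5322, Appendix A.1.3, and should be treated as an assumption about what was originally quoted.

```python
import email
import email.policy

# Group example from RFC 5322, Appendix A.1.3 (addresses are an assumption,
# since the ones quoted in the thread are obfuscated).
raw = (
    "To: A Group:Ed Jones <c@a.test>,joe@where.test,John <jdoe@one.test>;\n"
    "\n"
    "body\n"
)
msg = email.message_from_string(raw, policy=email.policy.default)
to_header = msg["to"]

# .addresses flattens every mailbox, including those inside groups.
addr_specs = [addr.addr_spec for addr in to_header.addresses]
# .groups keeps the group structure together with its display name.
group_names = [group.display_name for group in to_header.groups]

print(addr_specs)
print(group_names)
```

The flattened `addresses` view matches the kind of `'to'` list shown above, while `groups` is where the group name would come from if eml_parser were to expose address groups.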
I have not found a related issue in the Python bug tracker, but perhaps something like the following in _header_value_parser.py would be appropriate to prevent the exception:
Thanks for your analysis. I agree that 27257 does not seem to be related. I unfortunately don't recall this exactly, but I probably meant another one instead.
Regarding the workaround, it is still necessary, on 3.7 as well as on 3.8. I just retested it with the problematic sample included in the samples folder of this repo.
Regarding your suggestion, _header_value_parser is private, so I can't include it. I haven't tested it, but from looking at that function I don't think it would solve the issue I am trying to work around: "Test.[email protected]". Did you test this? Would you be interested in making a pull request?
With the modification to the Python 3.7 email._header_value_parser.py, the following is my output. This causes test_headeremail2list_2 to fail, as intended, because the default Python header parser succeeds.
I created pull request 18687 to address this issue. https://github.com/python/cpython/pull/18687
>>> msg_test = email.message_from_string("""From: John Doe.<[email protected]>
Test e-mail. with a https://www.google.com:5000?test
""", policy=email.policy.default)
>>> msg_from = msg_test.get_all('from')
>>> print(msg_from[0].addresses[0].display_name, msg_from[0].addresses[0].addr_spec)
John Doe. [email protected]
>>> print(json.dumps(eml_parser.eml_parser.parse_email(msg_test), indent=2, default=json_serial))
{
  "body": [
    {
      "uri_hash": [
        "ac6bb669e40e44a8d9f8f0c94dfc63734049dcf6219aac77f02edf94b9162c09"
      ],
      "content_header": {},
      "hash": "a46645c9d7598af7036fc173380b1bce4fe6a4e16313523e29e31cbee6eec6e2"
    }
  ],
  "header": {
    "subject": "",
    "from": "[email protected]",
    "to": [],
    "date": "1970-01-01T00:00:00+00:00",
    "received": [],
    "header": {
      "from": [
        "\"John Doe.\" <[email protected]>"
      ]
    }
  }
}
Great! Thank you!
This appears to be related to this issue. The pull request I made only addresses one case; I'll look at addressing the other later this week. https://bugs.python.org/issue30988
This pull request addresses the issue more completely, so I closed my pull request. https://github.com/python/cpython/pull/15600
The following can be used to apply the changes from pull request 15600 to email._header_value_parser at runtime:
import inspect
import email
import email.policy

# Source of the private parser module. The replacements below are plain string
# substitutions, so their whitespace must match the installed source exactly
# (indentation here follows CPython 3.7; check it against your interpreter).
display_name_source = inspect.getsource(email._header_value_parser)

# (original snippet, patched snippet) pairs taken from pull request 15600.
header_parser_15600 = [
    ("if res[0][0].token_type == 'cfws':",
     "if isinstance(res[0], TokenList) and res[0][0].token_type == 'cfws':"),
    ("if res[-1][-1].token_type == 'cfws':",
     "if isinstance(res[-1], TokenList) and res[-1][-1].token_type == 'cfws':"),
    ('''
        if leader is not None:
            token[0][:0] = [leader]
            leader = None
        name_addr.append(token)
''', '''
        if leader is not None:
            if isinstance(token[0], TokenList):
                token[0][:0] = [leader]
            else:
                token[:0] = [leader]
            leader = None
        name_addr.append(token)
'''),
]

display_name_source_new = display_name_source
for prev, fix in header_parser_15600:
    display_name_source_new = display_name_source_new.replace(prev, fix)

# Re-execute the patched source inside the module's own namespace.
exec(display_name_source_new, email._header_value_parser.__dict__)

email.message_from_string("""From: John Doe.<[email protected]>
To: . Doe <[email protected]>

Test e-mail body.
""", policy=email.policy.default).items()
An upstream fix should be deployed. I'll try to find time to check this week.
https://github.com/python/cpython/pull/15600
It is fixed in Python 3.13.0b1, so it should make it into Python 3.13 this fall.
tests\test_emlparser.py:250 (TestEMLParser.test_headeremail2list_2)
self = <tests.test_emlparser.TestEMLParser object at 0x00000000050F50F0>
    def test_headeremail2list_2(self) -> None:
        """Here we test the headeremail2list function using an input which should trigger
        a email library bug 27257
        """
        with pathlib.Path(samples_dir, 'sample_bug27257.eml').open('rb') as fhdl:
            raw_email = fhdl.read()
        msg = email.message_from_bytes(raw_email, policy=email.policy.default)
        # just to be sure we still hit bug 27257 (else there is no more need for the workaround)
>       with pytest.raises(AttributeError):
E       Failed: DID NOT RAISE <class 'AttributeError'>
test_emlparser.py:261: Failed
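Whether a given interpreter still hits the parser bug can be probed directly, which is roughly what the failing assertion above checks. This is only a sketch: `display_name_dot_bug_present` is a hypothetical helper, not part of eml_parser or the standard library, and the sample address is made up rather than taken from sample_bug27257.eml.

```python
import email
import email.policy


def display_name_dot_bug_present() -> bool:
    """Return True if a display name ending in '.' raises AttributeError.

    Hypothetical helper for illustration only.
    """
    # Made-up sample header; the real test uses sample_bug27257.eml.
    raw = "From: John Doe.<jdoe@example.test>\n\nTest e-mail body.\n"
    msg = email.message_from_string(raw, policy=email.policy.default)
    try:
        # With policy.default the header value is parsed when it is accessed.
        msg["from"].addresses
    except AttributeError:
        return True
    return False


bug_present = display_name_dot_bug_present()
print("workaround still needed:", bug_present)
```

A test could branch on such a probe instead of unconditionally expecting `AttributeError`, so that it keeps passing on interpreters that already include the upstream fix.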