Potential Quadratic Complexity Vulnerabilities in the `email` Module
Bug Description:
A series of simple quadratic complexity vulnerabilities has been identified in the email package. After confirmation by CPython's security team, these low-threat DOS vulnerabilities can be fixed with community assistance.
Vulnerability Locations (All Fixed):
- https://github.com/python/cpython/blob/5ab66a882d1b5e44ec50b25df116ab209d65863f/Lib/email/message.py#L73 2.https://github.com/python/cpython/blob/5ab66a882d1b5e44ec50b25df116ab209d65863f/Lib/email/_header_value_parser.py#L1424 3.https://github.com/python/cpython/blob/5ab66a882d1b5e44ec50b25df116ab209d65863f/Lib/email/_header_value_parser.py#L1506 4.https://github.com/python/cpython/blob/5ab66a882d1b5e44ec50b25df116ab209d65863f/Lib/email/_header_value_parser.py#L1688 5.https://github.com/python/cpython/blob/5ab66a882d1b5e44ec50b25df116ab209d65863f/Lib/email/_header_value_parser.py#L1697 6.https://github.com/python/cpython/blob/5ab66a882d1b5e44ec50b25df116ab209d65863f/Lib/email/_header_value_parser.py#L1847 7.https://github.com/python/cpython/blob/5ab66a882d1b5e44ec50b25df116ab209d65863f/Lib/email/_header_value_parser.py#L2200 8.https://github.com/python/cpython/blob/5ab66a882d1b5e44ec50b25df116ab209d65863f/Lib/email/_header_value_parser.py#L2231 9.https://github.com/python/cpython/blob/5ab66a882d1b5e44ec50b25df116ab209d65863f/Lib/email/_header_value_parser.py#L2260 10.https://github.com/python/cpython/blob/5ab66a882d1b5e44ec50b25df116ab209d65863f/Lib/email/_header_value_parser.py#L2411 11.https://github.com/python/cpython/blob/5ab66a882d1b5e44ec50b25df116ab209d65863f/Lib/email/_header_value_parser.py#L2570 12.https://github.com/python/cpython/blob/5ab66a882d1b5e44ec50b25df116ab209d65863f/Lib/email/_header_value_parser.py#L2642 13.https://github.com/python/cpython/blob/5ab66a882d1b5e44ec50b25df116ab209d65863f/Lib/email/_header_value_parser.py#L2762 14.https://github.com/python/cpython/blob/5ab66a882d1b5e44ec50b25df116ab209d65863f/Lib/email/_header_value_parser.py#L2965
Repair Status:
- @picnixz is currently fixing all listed vulnerabilities in the email package (#134947).
Common Information:
- CPython Version: main branch
- Operating System: Linux
- Credits: Finder is kexinoh (Xiangfan Wu) from QI-ANXIN Technology Research Institute.
Linked PRs
- gh-134947
- gh-136072
To be precise, I still didn't fix everything. There are subtle cases that may require an entire module rewrite.
Ok. I revised the expression
So we merged the simple fix. But to be able to fix the rest, we need to work with a different design where we don't split the input into two.
@serhiy-storchaka do you still have plans to work on this? or an API design in mind here?
FYI it was always my intent to revisit the header value parser and rewrite it, probably to use string indexing instead of string splitting, but I never even got as far as starting. Its current form I considered to be a "first draft" "proof of concept", which is one reason it is marked as a private module. I got pulled away from core development before I could embark on a second draft :( I'd be interested in hearing Serhiy's thoughts on the best rewrite, because I hadn't even recognized the quadratic complexity issue, other than intuitively knowing that creating lots of strings was inefficient :) I'm willing to work on this, and should now be able to make time to do so.
There are two approaches:
- Each parsing function should take a string and the starting index and return an index (in addition to the result of parsing if it is needed). Similar approach is used in HTMLParser (but the string is an attribute of the parser).
- Make the string and the current index mutable attributes of the tokenizer/parser object. The parsing function should take such object and update the index in-place. This is used in the
reparser.
Thanks. I'll have to noodle around with the code a bit to figure out what approach I want to take.