cpython icon indicating copy to clipboard operation
cpython copied to clipboard

Potential Quadratic Complexity Vulnerabilities in the `email` Module

Open kexinoh opened this issue 5 months ago • 2 comments

Bug Description: A series of simple quadratic complexity vulnerabilities has been identified in the email package. After confirmation by CPython's security team, these low-threat DOS vulnerabilities can be fixed with community assistance.

Vulnerability Locations (All Fixed):

  1. https://github.com/python/cpython/blob/5ab66a882d1b5e44ec50b25df116ab209d65863f/Lib/email/message.py#L73 2.https://github.com/python/cpython/blob/5ab66a882d1b5e44ec50b25df116ab209d65863f/Lib/email/_header_value_parser.py#L1424 3.https://github.com/python/cpython/blob/5ab66a882d1b5e44ec50b25df116ab209d65863f/Lib/email/_header_value_parser.py#L1506 4.https://github.com/python/cpython/blob/5ab66a882d1b5e44ec50b25df116ab209d65863f/Lib/email/_header_value_parser.py#L1688 5.https://github.com/python/cpython/blob/5ab66a882d1b5e44ec50b25df116ab209d65863f/Lib/email/_header_value_parser.py#L1697 6.https://github.com/python/cpython/blob/5ab66a882d1b5e44ec50b25df116ab209d65863f/Lib/email/_header_value_parser.py#L1847 7.https://github.com/python/cpython/blob/5ab66a882d1b5e44ec50b25df116ab209d65863f/Lib/email/_header_value_parser.py#L2200 8.https://github.com/python/cpython/blob/5ab66a882d1b5e44ec50b25df116ab209d65863f/Lib/email/_header_value_parser.py#L2231 9.https://github.com/python/cpython/blob/5ab66a882d1b5e44ec50b25df116ab209d65863f/Lib/email/_header_value_parser.py#L2260 10.https://github.com/python/cpython/blob/5ab66a882d1b5e44ec50b25df116ab209d65863f/Lib/email/_header_value_parser.py#L2411 11.https://github.com/python/cpython/blob/5ab66a882d1b5e44ec50b25df116ab209d65863f/Lib/email/_header_value_parser.py#L2570 12.https://github.com/python/cpython/blob/5ab66a882d1b5e44ec50b25df116ab209d65863f/Lib/email/_header_value_parser.py#L2642 13.https://github.com/python/cpython/blob/5ab66a882d1b5e44ec50b25df116ab209d65863f/Lib/email/_header_value_parser.py#L2762 14.https://github.com/python/cpython/blob/5ab66a882d1b5e44ec50b25df116ab209d65863f/Lib/email/_header_value_parser.py#L2965

Repair Status:

  • @picnixz is currently fixing all listed vulnerabilities in the email package (#134947).

Common Information:

  • CPython Version: main branch
  • Operating System: Linux
  • Credits: Finder is kexinoh (Xiangfan Wu) from QI-ANXIN Technology Research Institute.

Linked PRs

  • gh-134947
  • gh-136072

kexinoh avatar Jun 28 '25 08:06 kexinoh

To be precise, I still didn't fix everything. There are subtle cases that may require an entire module rewrite.

picnixz avatar Jun 28 '25 08:06 picnixz

Ok. I revised the expression

kexinoh avatar Jun 28 '25 08:06 kexinoh

So we merged the simple fix. But to be able to fix the rest, we need to work with a different design where we don't split the input into two.

@serhiy-storchaka do you still have plans to work on this? or an API design in mind here?

picnixz avatar Oct 31 '25 14:10 picnixz

FYI it was always my intent to revisit the header value parser and rewrite it, probably to use string indexing instead of string splitting, but I never even got as far as starting. Its current form I considered to be a "first draft" "proof of concept", which is one reason it is marked as a private module. I got pulled away from core development before I could embark on a second draft :( I'd be interested in hearing Serhiy's thoughts on the best rewrite, because I hadn't even recognized the quadratic complexity issue, other than intuitively knowing that creating lots of strings was inefficient :) I'm willing to work on this, and should now be able to make time to do so.

bitdancer avatar Nov 06 '25 15:11 bitdancer

There are two approaches:

  1. Each parsing function should take a string and the starting index and return an index (in addition to the result of parsing if it is needed). Similar approach is used in HTMLParser (but the string is an attribute of the parser).
  2. Make the string and the current index mutable attributes of the tokenizer/parser object. The parsing function should take such object and update the index in-place. This is used in the re parser.

serhiy-storchaka avatar Nov 06 '25 20:11 serhiy-storchaka

Thanks. I'll have to noodle around with the code a bit to figure out what approach I want to take.

bitdancer avatar Nov 06 '25 21:11 bitdancer