Slow IDNA decoding with large strings
Bug report
Originally reported to the security address on September 9.
('xn--016c'+'a'*5000).encode('utf-8').decode('idna')
The execution time grows faster than linearly with the input size (roughly quadratically for the larger inputs below), so large inputs are very slow to decode:
- 10 chars: 0.016 seconds
- 100 chars: 0.047 seconds
- 1000 chars: 2.883 seconds
- 2500 chars: 17.724 seconds
- 5000 chars: 1 min 10 seconds
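A minimal timing sketch for reproducing measurements like these, assuming the payload shape from the report (`'xn--016c'` followed by `n` repetitions of `'a'`). The crafted label is not a valid IDNA label, so the codec ends up raising `UnicodeError` (possibly only after the expensive work); the sketch ignores that and reports only the elapsed time. The function name `time_idna_decode` is hypothetical, and absolute timings will differ by machine and by CPython version.

```python
import time


def time_idna_decode(n: int) -> float:
    """Return the seconds spent decoding the crafted label with n trailing 'a's."""
    payload = ("xn--016c" + "a" * n).encode("utf-8")
    start = time.perf_counter()
    try:
        payload.decode("idna")
    except UnicodeError:
        pass  # the label is rejected; we only care how long that took
    return time.perf_counter() - start


if __name__ == "__main__":
    # Kept to small sizes: on an affected interpreter larger inputs take minutes.
    for n in (10, 100, 1000):
        print(f"{n:>5} chars: {time_idna_decode(n):.3f} s")
```

Comparing how the time changes as `n` grows by a fixed factor is what shows the superlinear behaviour; a single measurement says little.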
Comment by @tiran:
According to the spec (https://unicode.org/reports/tr46/), an IDNA label must not be longer than 63 characters. Python's idna module enforces this restriction, but too late: only after the label has already been decoded.
This may be abused in some cases, for example by passing a crafted host name to asyncio create_connection:
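import asyncio

async def main():
    loop = asyncio.get_running_loop()
    # asyncio decodes a bytes host with the 'idna' codec while resolving it,
    # so this call stalls on the slow decode before any connection attempt.
    await loop.create_connection(
        lambda: [], ('xn--016c'+'a'*5000).encode('utf-8'), 443
    )

asyncio.run(main())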
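A small illustration of the 63-character label limit mentioned in the comment above. The idna codec checks the limit when encoding (the ToASCII direction); when decoding an xn-- label, the code before the fix only reaches that same check after Punycode decoding, when ToUnicode round-trips the result through ToASCII. The helper name `label_fits` is hypothetical and exists only for this sketch.

```python
def label_fits(label: str) -> bool:
    """Return True if the idna codec accepts the label when encoding it."""
    try:
        # Encoding (ToASCII) enforces the 63-character DNS label limit.
        label.encode("idna")
    except UnicodeError:
        return False
    return True


print(label_fits("a" * 63))  # True
print(label_fits("a" * 64))  # False
```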
Your environment
- CPython versions tested on: CPython repository 'main' branch checkout, version 3.8.12, version 2.7.18
- Operating system and architecture: Ubuntu Linux x64
Linked PRs
- PR: gh-99092
- PR: gh-99222
- PR: gh-99229
- PR: gh-99230
- PR: gh-99231
- PR: gh-99232
The problem is probably in ToUnicode and ToASCII in https://github.com/python/cpython/blob/main/Lib/encodings/idna.py and/or in https://github.com/python/cpython/blob/main/Lib/encodings/punycode.py itself. We could presumably add an up-front length check there and reject inputs that are obviously too long to decode into a label length that DNS standards will accept.
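A minimal sketch of such an up-front check, written as a caller-side wrapper rather than as a change to `idna.py` (the name `checked_idna_decode` and the 253-character overall cap are assumptions for illustration, not the actual fix). It relies on the observation that an xn-- label longer than 63 characters can never decode successfully, because ToUnicode requires the result to round-trip through ToASCII, which rejects labels of 64 or more characters.

```python
MAX_LABEL_CHARS = 63   # DNS label limit
MAX_NAME_CHARS = 253   # assumed cap on the whole host name, for illustration


def checked_idna_decode(host: bytes) -> str:
    """Hypothetical helper: reject oversized ASCII host names before any
    Punycode decoding, then decode with the 'idna' codec."""
    if len(host) > MAX_NAME_CHARS:
        raise UnicodeError("host name too long")
    # A trailing dot marks a fully qualified name; it does not start a label.
    for label in host.rstrip(b".").split(b"."):
        if len(label) > MAX_LABEL_CHARS:
            raise UnicodeError("label too long")
    return host.decode("idna")


if __name__ == "__main__":
    print(checked_idna_decode(b"xn--bcher-kva.example"))  # bücher.example
    try:
        checked_idna_decode(b"xn--016c" + b"a" * 5000)
    except UnicodeError as exc:
        print("rejected:", exc)  # rejected: host name too long
```

The check costs time linear in the input and runs before the codec does any work. It is deliberately stricter than the codec for plain ASCII labels, which is a policy choice a caller would need to accept.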
If there are libraries that let an attacker-controlled hostname without a reasonable length check reach a connect or similar call that performs IDNA decoding, this becomes remotely exploitable. Based solely on code inspection, the urllib.request.HTTPRedirectHandler class is probably vulnerable (https://github.com/python/cpython/blob/main/Lib/urllib/request.py#L652): the location or uri headers it consumes on an HTTP 302 redirect response to construct the new URL are not obviously limited, nor is the host that ultimately winds its way down into the socket module. (I didn't test this; I was just reading code.) A test case would be to point urllib at a malicious server that sends a 2000-byte IDNA hostname in a 302 redirect header...
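The concern above comes from code reading only. If that path does turn out to be reachable, one possible application-side mitigation is to cap the host name of a redirect target before following it. The sketch below subclasses `urllib.request.HTTPRedirectHandler` and overrides its `redirect_request()` hook; the class name, the 253/63 limits and the choice to refuse the redirect with `HTTPError` are assumptions for illustration.

```python
import urllib.error
import urllib.request
from urllib.parse import urlsplit

MAX_HOST_CHARS = 253   # assumed overall cap on the redirect host name
MAX_LABEL_CHARS = 63   # DNS label limit


class LengthCheckingRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Hypothetical handler: refuse redirects whose target host name is
    obviously too long, before it can reach name resolution."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        host = urlsplit(newurl).hostname or ""
        if len(host) > MAX_HOST_CHARS or any(
            len(label) > MAX_LABEL_CHARS for label in host.split(".")
        ):
            raise urllib.error.HTTPError(
                newurl, code, "redirect target host name too long", headers, fp
            )
        # Otherwise fall back to the standard redirect logic.
        return super().redirect_request(req, fp, code, msg, headers, newurl)


# Passing the subclass to build_opener replaces the default redirect handler.
opener = urllib.request.build_opener(LengthCheckingRedirectHandler())
```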
The issue #99083 was marked as a duplicate of this issue.
The PRs are either merged or will be merged before the next release (they are marked as release blockers), so I'm closing this.
CVE ID CVE-2022-45061 has been assigned for tracking purposes.
I created https://python-security.readthedocs.io/vuln/slow-idna-large-strings.html to track this vulnerability. The fix is not yet merged into the 3.8 and 3.9 branches.