aiohttp
On redirects, an intermediate URL containing ø is parsed incorrectly, leading to a 404
Describe the bug
Hello,
If I fetch https://cornelius-k.dk/synsproeve/ with aiohttp, the request is redirected several times and ends in a 404 when the client requests https://cornelius-k.dk/synspr\udcf8ve at the end of the chain.
It looks like the Location header is decoded incorrectly from b'https://cornelius-k.dk/synspr\xf8ve', which I found in Response._raw_headers. The byte \xf8 is ø in Latin-1 but is not valid UTF-8, which would explain the \udcf8 surrogate in the parsed value.
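To illustrate where \udcf8 could come from, here is a minimal sketch using only the standard library. It assumes the header bytes are decoded as UTF-8 with the surrogateescape error handler; that assumption is inferred from the output in this report, not confirmed from the aiohttp source.

```python
# Raw Location value as it appears in Response._raw_headers
raw = b"https://cornelius-k.dk/synspr\xf8ve"

# Strict UTF-8 rejects the lone 0xF8 byte
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# With surrogateescape, the undecodable byte 0xF8 becomes the lone surrogate U+DCF8
escaped = raw.decode("utf-8", "surrogateescape")

# Decoded as Latin-1, the same byte is the letter ø (U+00F8)
latin1 = raw.decode("latin-1")

print(strict_ok)                                       # False
print(ascii(escaped))                                  # shows \udcf8 in the path
print(latin1 == "https://cornelius-k.dk/synspr\u00f8ve")  # True
```

The surrogate form round-trips back to the original bytes, so no information is lost at this stage; the problem only appears once the string is used as a URL.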
To Reproduce
Code block:
import asyncio

import aiohttp

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36'
}

async def fetch_url(url):
    async with aiohttp.ClientSession(headers=headers) as session:
        async with session.get(url) as response:
            # Print every intermediate response in the redirect chain
            for i in response.history:
                print(i.url)
                print(i._headers)
                print(i._raw_headers)
            return response.status

# asyncio.run is needed when running this as a script rather than in an async REPL
print(asyncio.run(fetch_url("https://cornelius-k.dk/synsproeve/")))
The final URL in the redirect chain is https://cornelius-k.dk/synspr�ve instead of https://cornelius-k.dk/synsprøve, and the request returns a 404.
Expected behavior
The URLs in the redirect chain are parsed correctly and the correct final URL is fetched.
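As a rough sketch of what a correctly parsed redirect target could look like, the helper below reads the Location value from raw header pairs, decodes it as Latin-1, and percent-encodes the result as UTF-8. The function name location_from_raw_headers and the Latin-1 default are assumptions made for illustration; this is not how aiohttp handles redirects, and whether the server actually serves the resulting URL is not verified here.

```python
from urllib.parse import quote


def location_from_raw_headers(raw_headers, encoding="latin-1"):
    """Return a percent-encoded Location URL from (name, value) byte pairs, or None.

    Hypothetical helper: the Latin-1 default is an assumption about the server.
    """
    for name, value in raw_headers:
        if name.lower() == b"location":
            text = value.decode(encoding)
            # Percent-encode non-ASCII characters as UTF-8, leaving URL
            # delimiters and existing %XX escapes untouched
            return quote(text, safe=":/?#[]@!$&'()*+,;=%")
    return None


# Header pairs taken from the last redirect in this report (shortened)
raw_headers = (
    (b"Server", b"nginx"),
    (b"Location", b"https://cornelius-k.dk/synspr\xf8ve"),
)
print(location_from_raw_headers(raw_headers))
# https://cornelius-k.dk/synspr%C3%B8ve
```

One possible way to use it would be to request with allow_redirects=False, pass response.raw_headers to the helper, and follow the returned URL manually.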
Logs/tracebacks
Output of the code block:
https://cornelius-k.dk/synsproeve/
<CIMultiDictProxy('Server': 'nginx', 'Date': 'Tue, 26 Nov 2024 16:02:17 GMT', 'Content-Type': 'text/html', 'Content-Length': '162', 'd-cache': 'from-cache', 'Cache-Control': 'no-cache, no-store, must-revalidate', 'Expires': 'Thu, 01 Jan 1970 00:00:00 GMT', 'x-content-type-options': 'nosniff', 'strict-transport-security': 'max-age=31536000; preload', 'x-frame-options': 'SAMEORIGIN', 'content-security-policy': "frame-ancestors 'self'", 'Location': 'https://cornelius-k.dk/synsproeve', 'd-geo': 'US')>
((b'server', b'nginx'), (b'date', b'Tue, 26 Nov 2024 16:02:17 GMT'), (b'content-type', b'text/html'), (b'content-length', b'162'), (b'd-cache', b'from-cache'), (b'cache-control', b'no-cache, no-store, must-revalidate'), (b'expires', b'Thu, 01 Jan 1970 00:00:00 GMT'), (b'x-content-type-options', b'nosniff'), (b'strict-transport-security', b'max-age=31536000; preload'), (b'x-frame-options', b'SAMEORIGIN'), (b'content-security-policy', b"frame-ancestors 'self'"), (b'location', b'https://cornelius-k.dk/synsproeve'), (b'd-geo', b'US'))
https://cornelius-k.dk/synsproeve
<CIMultiDictProxy('Server': 'nginx', 'Date': 'Tue, 26 Nov 2024 16:02:18 GMT', 'Content-Type': 'text/html', 'Content-Length': '162', 'Location': 'http://cornelius-k.dk/synspr%C3%B8ve', 'd-cache': 'from-cache', 'Cache-Control': 'no-cache, no-store, must-revalidate', 'Expires': 'Thu, 01 Jan 1970 00:00:00 GMT', 'x-content-type-options': 'nosniff', 'strict-transport-security': 'max-age=31536000; preload', 'x-frame-options': 'SAMEORIGIN', 'content-security-policy': "frame-ancestors 'self'", 'd-geo': 'US')>
((b'server', b'nginx'), (b'date', b'Tue, 26 Nov 2024 16:02:18 GMT'), (b'content-type', b'text/html'), (b'content-length', b'162'), (b'location', b'http://cornelius-k.dk/synspr%C3%B8ve'), (b'd-cache', b'from-cache'), (b'cache-control', b'no-cache, no-store, must-revalidate'), (b'expires', b'Thu, 01 Jan 1970 00:00:00 GMT'), (b'x-content-type-options', b'nosniff'), (b'strict-transport-security', b'max-age=31536000; preload'), (b'x-frame-options', b'SAMEORIGIN'), (b'content-security-policy', b"frame-ancestors 'self'"), (b'd-geo', b'US'))
http://cornelius-k.dk/synspr%C3%B8ve
<CIMultiDictProxy('Server': 'nginx', 'Date': 'Tue, 26 Nov 2024 16:02:18 GMT', 'Content-Length': '0', 'Connection': 'keep-alive', 'Cache-Control': 'no-cache, no-store, must-revalidate', 'Expires': 'Thu, 01 Jan 1970 00:00:00 GMT', 'X-Content-Type-Options': 'nosniff', 'X-Frame-Options': 'SAMEORIGIN', 'Content-Security-Policy': "frame-ancestors 'self'", 'Location': 'https://cornelius-k.dk/synspr\udcf8ve', 'D-Geo': 'US')>
((b'Server', b'nginx'), (b'Date', b'Tue, 26 Nov 2024 16:02:18 GMT'), (b'Content-Length', b'0'), (b'Connection', b'keep-alive'), (b'Cache-Control', b'no-cache, no-store, must-revalidate'), (b'Expires', b'Thu, 01 Jan 1970 00:00:00 GMT'), (b'X-Content-Type-Options', b'nosniff'), (b'X-Frame-Options', b'SAMEORIGIN'), (b'Content-Security-Policy', b"frame-ancestors 'self'"), (b'Location', b'https://cornelius-k.dk/synspr\xf8ve'), (b'D-Geo', b'US'))
(404, URL('https://cornelius-k.dk/synspr�ve'))
Python Version
3.9.20
aiohttp Version
3.11.7
multidict Version
6.1.0
propcache Version
0.2.0
yarl Version
1.17.1
OS
macOS
Related component
Client
Additional context
No response
Which setting are you using for requoting of redirects: ClientSession(requote_redirect_url=True) or ClientSession(requote_redirect_url=False)?
@bdraco I've just tried both True/False, looks like it doesn't make any difference - same outcome
True:
http://cornelius-k.dk/synspr%C3%B8ve
<CIMultiDictProxy('Server': 'nginx', 'Date': 'Thu, 28 Nov 2024 11:22:18 GMT', 'Content-Length': '0', 'Connection': 'keep-alive', 'Cache-Control': 'no-cache, no-store, must-revalidate', 'Expires': 'Thu, 01 Jan 1970 00:00:00 GMT', 'X-Content-Type-Options': 'nosniff', 'X-Frame-Options': 'SAMEORIGIN', 'Content-Security-Policy': "frame-ancestors 'self'", 'Location': 'https://cornelius-k.dk/synspr\udcf8ve', 'D-Geo': 'US')>
((b'Server', b'nginx'), (b'Date', b'Thu, 28 Nov 2024 11:22:18 GMT'), (b'Content-Length', b'0'), (b'Connection', b'keep-alive'), (b'Cache-Control', b'no-cache, no-store, must-revalidate'), (b'Expires', b'Thu, 01 Jan 1970 00:00:00 GMT'), (b'X-Content-Type-Options', b'nosniff'), (b'X-Frame-Options', b'SAMEORIGIN'), (b'Content-Security-Policy', b"frame-ancestors 'self'"), (b'Location', b'https://cornelius-k.dk/synspr\xf8ve'), (b'D-Geo', b'US'))
404
False:
http://cornelius-k.dk/synspr%C3%B8ve
<CIMultiDictProxy('Server': 'nginx', 'Date': 'Thu, 28 Nov 2024 11:21:31 GMT', 'Content-Length': '0', 'Connection': 'keep-alive', 'Cache-Control': 'no-cache, no-store, must-revalidate', 'Expires': 'Thu, 01 Jan 1970 00:00:00 GMT', 'X-Content-Type-Options': 'nosniff', 'X-Frame-Options': 'SAMEORIGIN', 'Content-Security-Policy': "frame-ancestors 'self'", 'Location': 'https://cornelius-k.dk/synspr\udcf8ve', 'D-Geo': 'US')>
((b'Server', b'nginx'), (b'Date', b'Thu, 28 Nov 2024 11:21:31 GMT'), (b'Content-Length', b'0'), (b'Connection', b'keep-alive'), (b'Cache-Control', b'no-cache, no-store, must-revalidate'), (b'Expires', b'Thu, 01 Jan 1970 00:00:00 GMT'), (b'X-Content-Type-Options', b'nosniff'), (b'X-Frame-Options', b'SAMEORIGIN'), (b'Content-Security-Policy', b"frame-ancestors 'self'"), (b'Location', b'https://cornelius-k.dk/synspr\xf8ve'), (b'D-Geo', b'US'))
404