httpx
Relax parsing of invalid HTTP header names?
Checklist
- [ ] The bug is reproducible against the latest release and/or master.
- [ ] There are no similar issues or pull requests to fix it yet.
Describe the bug
When the server returns bad headers, a RemoteProtocolError is raised.
To reproduce
import asyncio
import httpx

headers = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,"
              "image/avif,image/webp,image/apng,*/*;q=0.8,application/"
              "signed-exchange;v=b3;q=0.9",
    "Accept-Encoding": "gzip, deflate, br",
    "Accept-Language": "zh-CN,zh;q=0.9",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                  "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/"
                  "86.0.4240.75 Safari/537.36",
}

async def main():
    # verify=False turns off TLS certificate verification
    async with httpx.AsyncClient(headers=headers, timeout=120, verify=False) as client:
        # This call fails with RemoteProtocolError
        await client.get("https://club.huawei.com/forum.html")

asyncio.run(main())
Expected behavior
Actual behavior
Debugging material
Environment
- OS: Windows 10
- Python version: 3.8.5
- HTTPX version: 0.16.1
- Async environment: asyncio
- HTTP proxy: no
- Custom certificates: no
Additional context
bad header: get-ban-to-cache-result/forum.php: userdata not support
Sure, so here's the simplest reproduction...
>>> import httpx
>>> httpx.get("https://club.huawei.com/")
Which is occurring because the server is returning an illegal HTTP header name...
HTTP/1.1 200 OK
Connection: keep-alive
Content-Encoding: gzip
Content-Security-Policy: base-uri
Content-Type: text/html; charset=utf-8
Date: Thu, 15 Oct 2020 13:19:33 GMT
Server: CloudWAF
Set-Cookie: HWWAFSESID=a74181602debc465809; path=/
Set-Cookie: HWWAFSESTIME=1602767969615; path=/
Set-Cookie: a3ps_2132_saltkey=yCXrVqdR06Nk5u2PrmLgs9eqlGIpQd9FogV2GL6bxGP3HH2XweRXIeCVny%2BrVDpoOYNLphTU9uVN1HP1%2Fav1bvV2Yrafq%2BXdJR%2BVAVPHizU92ISGAest0dKt7%2FIbdulNYXV0aGtleQ%3D%3D; path=/; secure; httponly
Set-Cookie: a3ps_2132_errorinfo=deleted; expires=Thu, 01-Jan-1970 00:00:01 GMT; Max-Age=0; path=/; secure; httponly
Set-Cookie: a3ps_2132_errorcode=deleted; expires=Thu, 01-Jan-1970 00:00:01 GMT; Max-Age=0; path=/; secure; httponly
Set-Cookie: a3ps_2132_auth=deleted; expires=Thu, 01-Jan-1970 00:00:01 GMT; Max-Age=0; path=/; secure; httponly
Set-Cookie: a3ps_2132_lastvisit=1602764373; expires=Sat, 14-Nov-2020 13:19:33 GMT; Max-Age=2592000; path=/; secure; httponly
Set-Cookie: a3ps_2132_lastact=1602767973%09portal.php%09; expires=Fri, 16-Oct-2020 13:19:33 GMT; Max-Age=86400; path=/; secure; httponly
Set-Cookie: a3ps_2132_currentHwLoginUrl=http%3A%2F%2Fcn.club.vmall.com%2F; expires=Thu, 15-Oct-2020 15:19:33 GMT; Max-Age=7200; path=/; secure; httponly
Transfer-Encoding: chunked
X-XSS-Protection: 1; mode=block
banlist-ip: 0
banlist-uri: 0
get-ban-to-cache-result/portal.php: userdata not support
get-ban-to-cache-result62.31.28.214: userdata not support
result-ip: 0
result-uri: 0
That get-ban-to-cache-result/portal.php header isn't legal HTTP.
However it's possible that we'd like h11 to be more lax on the validation, so that we can accept invalid header names so long as they're still parsable.
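For anyone wondering why that name is rejected: RFC 7230 defines a header field name as a token, and "/" is not in the allowed token character set. Here is a minimal sketch of that check using only the standard library. The names `TOKEN_RE` and `invalid_header_names` are made up for illustration and this is not h11's actual code.

```python
import re

# RFC 7230: field-name = token = 1*tchar
# tchar is "!#$%&'*+-.^_`|~" plus digits and ASCII letters; "/" is not included.
TOKEN_RE = re.compile(r"[-!#$%&'*+.^_`|~0-9A-Za-z]+")


def invalid_header_names(names):
    """Return the header names that are not valid RFC 7230 tokens."""
    return [name for name in names if TOKEN_RE.fullmatch(name) is None]


# A few header names taken from the response shown above.
names = [
    "banlist-ip",
    "get-ban-to-cache-result/portal.php",
    "get-ban-to-cache-result62.31.28.214",
    "result-uri",
]
print(invalid_header_names(names))
# prints ['get-ban-to-cache-result/portal.php']
```

Note that the second odd-looking header, the one with the IP address glued on, consists only of letters, digits, "-" and ".", so by this check it is a valid token. Only the name containing "/" is flagged.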
Hey, I found an imperfect but useful workaround. Execute the following code before using httpx:
h11.readers.header_field_re = re.compile(b"(?P<field_name>[-!#$%&'*+.^`/|~0-9a-zA-Z]+):[ \t](?P<field_value>([^\x00\s]+(?:[ \t]+[^\x00\s]+))?)[ \t]*")
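A slightly more careful variant of the same idea, as a sketch only. It rebuilds the header-line pattern from the RFC 7230 token characters (including "_"), keeps the "*" quantifiers on the optional whitespace, and adds the extra name characters explicitly. The overall pattern shape is an assumption modelled on the one quoted above, and `relaxed_header_field_re` is a hypothetical helper name. The block only builds and exercises the regex; it does not touch h11.

```python
import re

# RFC 7230 tchar set: the characters normally allowed in a header field name.
TCHAR = rb"-!#$%&'*+.^_`|~0-9A-Za-z"


def relaxed_header_field_re(extra_name_chars: bytes = b"/"):
    """Compile a header-line pattern that tolerates extra name characters."""
    # re.escape keeps characters such as "]" or "^" literal inside the class.
    name_class = TCHAR + re.escape(extra_name_chars)
    pattern = (
        rb"(?P<field_name>[" + name_class + rb"]+)"
        rb":[ \t]*"
        rb"(?P<field_value>([^\x00\s]+(?:[ \t]+[^\x00\s]+)*)?)"
        rb"[ \t]*"
    )
    return re.compile(pattern)


line = b"get-ban-to-cache-result/portal.php: userdata not support"
match = relaxed_header_field_re(b"/").fullmatch(line)
print(match.group("field_name"), match.group("field_value"))

# Assumption: h11 keeps its compiled pattern in an internal module attribute.
# That attribute is not public API and may move or disappear between releases:
# h11._readers.header_field_re = relaxed_header_field_re(b"/")
```

Passing `b"?"` instead of `b"/"` would cover a server that sends a "?" in a header name. Either way this relaxes validation for every connection in the process, not just for one misbehaving server.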
Opened https://github.com/python-hyper/h11/issues/113 to discuss this on the h11 side.
Is it possible to disable this check for a single request?
I'm rewriting some sync code that uses requests into async httpx, but currently can't go ahead because a server outside of my control sends a "?" in one of the headers.
So right now I'm weighing my options: abandoning this entirely, picking another library to use alongside httpx for the problematic server, or replacing httpx entirely.
I looked at @shimachao's solution, but I feel a little uneasy about applying untested patching across the board, especially considering that only one server misbehaves. Either way, that patch doesn't work verbatim for me, as it instead chokes on other "normal" headers. The pattern has also moved to h11._readers, which I suspect is a hint that they want to further discourage us from taking such measures.
Edit: I patched the library to add "?" to tchar for token in the ABNF instead. I can't do that at runtime though, so I think I need to hard-fork h11 for this to work.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
I think this probably still needs tracking. Thanks tho, @stale.
I’m still watching this :) I currently have to proxy bad servers and drop headers.
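For reference, the "drop headers" step of such a proxy could look roughly like this. It is a sketch of the filtering only, not a working proxy. `drop_invalid_headers` is a hypothetical helper, and it assumes the response head (status line plus header lines, CRLF-separated, without the final blank line) has already been read off the socket and contains no folded continuation lines.

```python
import re

# RFC 7230 token: the characters allowed in a header field name.
TOKEN_RE = re.compile(rb"[-!#$%&'*+.^_`|~0-9A-Za-z]+")


def drop_invalid_headers(head: bytes) -> bytes:
    """Remove header lines whose names are not valid RFC 7230 tokens."""
    status_line, *header_lines = head.split(b"\r\n")
    kept = [status_line]
    for line in header_lines:
        name, sep, _value = line.partition(b":")
        # Keep the line only if it has a colon and a legal field name.
        if sep and TOKEN_RE.fullmatch(name):
            kept.append(line)
    return b"\r\n".join(kept)


head = (
    b"HTTP/1.1 200 OK\r\n"
    b"Server: CloudWAF\r\n"
    b"get-ban-to-cache-result/portal.php: userdata not support\r\n"
    b"result-ip: 0"
)
print(drop_invalid_headers(head).decode("ascii"))
```

The cleaned head can then be forwarded to the client unchanged, so a strict parser downstream never sees the illegal name.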
This is a trick that web defenders can use to block scanning by tools like feroxbuster: adding non-standard HTTP response headers/values, lol.