
Relax parsing of invalid HTTP header names?

Open shimachao opened this issue 4 years ago • 10 comments

Checklist

  • [ ] The bug is reproducible against the latest release and/or master.
  • [ ] There are no similar issues or pull requests to fix it yet.

Describe the bug

When the server returns invalid headers, a RemoteProtocolError is raised.

To reproduce

headers = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,"
              "image/avif,image/webp,image/apng,*/*;q=0.8,application/"
              "signed-exchange;v=b3;q=0.9",
    "Accept-Encoding": "gzip, deflate, br",
    "Accept-Language": "zh-CN,zh;q=0.9",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                  "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/"
                  "86.0.4240.75 Safari/537.36",
}
client = httpx.AsyncClient(headers=headers, timeout=120, verify=False)
await client.get("https://club.huawei.com/forum.html")

Expected behavior

Actual behavior

Debugging material

Environment

  • OS: Windows 10
  • Python version: 3.8.5
  • HTTPX version: 0.16.1
  • Async environment: asyncio
  • HTTP proxy: no
  • Custom certificates: no

Additional context

bad header: get-ban-to-cache-result/forum.php: userdata not support

shimachao avatar Oct 15 '20 12:10 shimachao

Sure, so here's the simplest reproduction...

>>> import httpx
>>> httpx.get("https://club.huawei.com/")

Which is occurring because the server is returning an illegal HTTP header name...

HTTP/1.1 200 OK
Connection: keep-alive
Content-Encoding: gzip
Content-Security-Policy: base-uri
Content-Type: text/html; charset=utf-8
Date: Thu, 15 Oct 2020 13:19:33 GMT
Server: CloudWAF
Set-Cookie: HWWAFSESID=a74181602debc465809; path=/
Set-Cookie: HWWAFSESTIME=1602767969615; path=/
Set-Cookie: a3ps_2132_saltkey=yCXrVqdR06Nk5u2PrmLgs9eqlGIpQd9FogV2GL6bxGP3HH2XweRXIeCVny%2BrVDpoOYNLphTU9uVN1HP1%2Fav1bvV2Yrafq%2BXdJR%2BVAVPHizU92ISGAest0dKt7%2FIbdulNYXV0aGtleQ%3D%3D; path=/; secure; httponly
Set-Cookie: a3ps_2132_errorinfo=deleted; expires=Thu, 01-Jan-1970 00:00:01 GMT; Max-Age=0; path=/; secure; httponly
Set-Cookie: a3ps_2132_errorcode=deleted; expires=Thu, 01-Jan-1970 00:00:01 GMT; Max-Age=0; path=/; secure; httponly
Set-Cookie: a3ps_2132_auth=deleted; expires=Thu, 01-Jan-1970 00:00:01 GMT; Max-Age=0; path=/; secure; httponly
Set-Cookie: a3ps_2132_lastvisit=1602764373; expires=Sat, 14-Nov-2020 13:19:33 GMT; Max-Age=2592000; path=/; secure; httponly
Set-Cookie: a3ps_2132_lastact=1602767973%09portal.php%09; expires=Fri, 16-Oct-2020 13:19:33 GMT; Max-Age=86400; path=/; secure; httponly
Set-Cookie: a3ps_2132_currentHwLoginUrl=http%3A%2F%2Fcn.club.vmall.com%2F; expires=Thu, 15-Oct-2020 15:19:33 GMT; Max-Age=7200; path=/; secure; httponly
Transfer-Encoding: chunked
X-XSS-Protection: 1; mode=block
banlist-ip: 0
banlist-uri: 0
get-ban-to-cache-result/portal.php: userdata not support
get-ban-to-cache-result62.31.28.214: userdata not support
result-ip: 0
result-uri: 0

That get-ban-to-cache-result/portal.php header isn't legal HTTP: "/" is not a permitted character in a header field name.
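To make the rule concrete, here is a minimal sketch that checks header names against the "token" grammar from RFC 7230. The helper name `is_valid_field_name` is made up for illustration and is not part of httpx or h11; it only mirrors the kind of check a strict parser performs.

```python
import re

# RFC 7230 "token": one or more tchar characters. "/" is not among them.
TOKEN_RE = re.compile(r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+")


def is_valid_field_name(name: str) -> bool:
    """Return True if `name` is a legal HTTP/1.1 header field name."""
    return TOKEN_RE.fullmatch(name) is not None


# A few header names taken from the response shown above.
names = [
    "Content-Type",
    "banlist-ip",
    "get-ban-to-cache-result/portal.php",
    "get-ban-to-cache-result62.31.28.214",
]
invalid = [n for n in names if not is_valid_field_name(n)]
print(invalid)
```

Under this check only the name containing "/" is rejected; the one ending in an IP address consists solely of letters, digits, "-" and ".", all of which are valid tchar characters.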

However it's possible that we'd like h11 to be more lax on the validation, so that we can accept invalid header names so long as they're still parsable.
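As a rough illustration of what "invalid but still parsable" could mean, the sketch below splits a header line on its first colon without validating the name. The function `parse_header_line_lenient` is hypothetical and is not how h11 works; it still rejects lines that cannot be split unambiguously.

```python
def parse_header_line_lenient(line: bytes):
    """Split a raw header line on the first colon, without validating the name.

    Returns (name, value), or None if the line cannot be parsed at all.
    """
    name, sep, value = line.partition(b":")
    # Still reject lines with no colon, an empty name, or whitespace in the name.
    if not sep or not name or any(c in b" \t\r\n" for c in name):
        return None
    # Strip optional whitespace around the value.
    return name, value.strip(b" \t")


parsed = parse_header_line_lenient(
    b"get-ban-to-cache-result/portal.php: userdata not support"
)
print(parsed)
```

The trade-off is that anything before the first colon is accepted as a name, so a lenient mode like this would probably need to be opt-in.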

tomchristie avatar Oct 15 '20 13:10 tomchristie

Hi, I found an imperfect but useful solution. Execute the following code before using httpx:

h11.readers.header_field_re = re.compile(b"(?P<field_name>[-!#$%&'*+.^`/|~0-9a-zA-Z]+):[ \t](?P<field_value>([^\x00\s]+(?:[ \t]+[^\x00\s]+))?)[ \t]*")

shimachao avatar Oct 16 '20 03:10 shimachao

Opened https://github.com/python-hyper/h11/issues/113 to discuss this on the h11 side.

tomchristie avatar Oct 16 '20 13:10 tomchristie

Is it possible to disable this check for a single request?

I'm rewriting some sync code that uses requests as async httpx, but I currently can't proceed because a server outside of my control sends a "?" in one of its headers.

So right now I'm weighing my options between abandoning this entirely, picking another library to use alongside httpx for the problematic server or replacing httpx entirely.

I looked at @shimachao's solution, but I feel a little uneasy about applying an untested patch across the board, especially considering that only one server misbehaves. Either way, that patch doesn't work for me verbatim, as it instead chokes on other "normal" headers. The pattern has also moved to h11._readers, which I suspect is a hint that they want to further discourage us from taking such measures.

Edit: I patched the library to add "?" to tchar for token in the abnf instead. I can't do it at runtime though, so I think I need to hard-fork h11 for this to work.
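For anyone wanting to try the "extend tchar" idea outside of h11 first, here is a standalone sketch. It does not patch or import h11; `build_header_re` and its `extra_name_chars` parameter are made-up names. It builds a header-line pattern from the RFC 7230 tchar set plus whatever extra characters are allowed, so that ordinary headers can be checked against the relaxed pattern too.

```python
import re

# The RFC 7230 tchar set, spelled out in full.
TCHAR = (
    "!#$%&'*+-.^_`|~0123456789"
    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
)


def build_header_re(extra_name_chars: str = ""):
    """Compile a header-line pattern whose name accepts tchar plus extras."""
    # re.escape keeps "-", "^" and friends literal inside the character class.
    allowed = re.escape(TCHAR + extra_name_chars).encode("ascii")
    return re.compile(
        rb"(?P<field_name>[" + allowed + rb"]+):[ \t]*(?P<field_value>.*?)[ \t]*"
    )


strict = build_header_re()
relaxed = build_header_re("/?")  # assumption: only "/" and "?" need allowing

bad = b"get-ban-to-cache-result/portal.php: userdata not support"
good = b"Content-Type: text/html; charset=utf-8"

print(strict.fullmatch(bad))                        # strict pattern rejects it
print(relaxed.fullmatch(bad).group("field_name"))   # relaxed pattern accepts it
print(relaxed.fullmatch(good).group("field_value")) # normal headers still match
```

Adding characters to the name class one at a time keeps the relaxation narrow, which may be easier to reason about than replacing the whole pattern.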

Hultner avatar Nov 24 '21 16:11 Hultner

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Feb 20 '22 15:02 stale[bot]

I think this probably still needs tracking. Thanks tho, @stale.

tomchristie avatar Feb 21 '22 13:02 tomchristie

I think this probably still needs tracking. Thanks tho, @Stale.

I’m still watching this :) I currently have to proxy bad servers and drop headers.
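The header-dropping step of such a proxy might look roughly like the sketch below. This is an illustration only, not the setup described above: `drop_invalid_headers` is a hypothetical helper that takes the raw response head (status line plus CRLF-separated header lines, without the final blank line) and removes any line whose name is not a valid RFC 7230 token. Folded continuation lines are not handled and would be dropped as well.

```python
import re

# RFC 7230 "token" grammar for header field names, as a bytes pattern.
TOKEN_RE = re.compile(rb"[!#$%&'*+\-.^_`|~0-9A-Za-z]+")


def has_valid_name(line: bytes) -> bool:
    """True if the line has a colon and a legal field name before it."""
    name, sep, _ = line.partition(b":")
    return bool(sep) and TOKEN_RE.fullmatch(name) is not None


def drop_invalid_headers(head: bytes) -> bytes:
    """Remove header lines with illegal field names from a raw response head."""
    status_line, *header_lines = head.split(b"\r\n")
    kept = [line for line in header_lines if has_valid_name(line)]
    return b"\r\n".join([status_line, *kept])


head = (
    b"HTTP/1.1 200 OK\r\n"
    b"Server: CloudWAF\r\n"
    b"get-ban-to-cache-result/portal.php: userdata not support\r\n"
    b"result-ip: 0"
)
cleaned = drop_invalid_headers(head)
print(cleaned.decode("ascii"))
```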

Hultner avatar Feb 21 '22 15:02 Hultner

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Mar 25 '22 07:03 stale[bot]

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Oct 15 '22 19:10 stale[bot]

This is a technique that web defenders can use to prevent scanning by feroxbuster: adding non-standard HTTP response headers/values, lol.

pich4ya avatar Jul 17 '23 13:07 pich4ya