invalid character in header
🐞 Describe the bug
[2020-11-22 21:24:53 +0800] [55277] [ERROR] Error handling request
Traceback (most recent call last):
File "/opt/rh/rh-python36/root/usr/lib64/python3.6/site-packages/aiohttp/web_protocol.py", line 314, in data_received
messages, upgraded, tail = self._request_parser.feed_data(data)
File "aiohttp/_http_parser.pyx", line 546, in aiohttp._http_parser.HttpParser.feed_data
aiohttp.http_exceptions.BadHttpMessage: 400, message='invalid character in header'
💡 To Reproduce
I do not have any idea how to reproduce it; the CPU load was not high at the time.
aiohttp == 3.7.3
gunicorn == 20.0.4
uvloop == 0.14.0
📋 Your version of Python
Python == 3.6.9
📋 Your version of the aiohttp/yarl/multidict distributions
$ python -m pip show aiohttp
Name: aiohttp
Version: 3.7.3
Summary: Async http client/server framework (asyncio)
Home-page: https://github.com/aio-libs/aiohttp
Author: Nikolay Kim
Author-email: [email protected]
License: Apache 2
Location: /opt/rh/rh-python36/root/usr/lib64/python3.6/site-packages
Requires: multidict, typing-extensions, idna-ssl, chardet, attrs, yarl, async-timeout
Required-by: aiohttp-devtools
$ python -m pip show multidict
Name: multidict
Version: 4.7.6
Summary: multidict implementation
Home-page: https://github.com/aio-libs/multidict
Author: Andrew Svetlov
Author-email: [email protected]
License: Apache 2
Location: /opt/rh/rh-python36/root/usr/lib64/python3.6/site-packages
Requires:
Required-by: yarl, aiohttp
$ python -m pip show yarl
Name: yarl
Version: 1.5.1
Summary: Yet another URL library
Home-page: https://github.com/aio-libs/yarl/
Author: Andrew Svetlov
Author-email: [email protected]
License: Apache 2
Location: /opt/rh/rh-python36/root/usr/lib64/python3.6/site-packages
Requires: multidict, typing-extensions, idna
Required-by: aiohttp
At a glance, this looks like it may be a duplicate of https://github.com/aio-libs/aiohttp/issues/5220
@zhaoxuan this needs a reproducer. Without it, we'll have no idea how to trigger this behavior and won't even be able to confirm whether it's a bug.
Agree with @webknjaz. If a reproducer exists, we can check it and figure out what's going on. Otherwise, we should close the issue after some period and pray that #5220 fixes this problem too.
@webknjaz it was weird: I added some logging in aiohttp/web_protocol.py:314, but I got nothing. So how can I get the error message?
@zhaoxuan we need to have a simple aiohttp app that reproduces the issue posted here. Even if it's a "hello world". The simpler the better, with no other deps than aiohttp. Also, we need to know what HTTP query triggers this behavior.
P.S. You cannot log what feed_data() returns, because it raises an exception, meaning that the control flow never reaches the return at all. And you cannot debug the .pyx module as easily because it would need rebuilding.
@webknjaz maybe we could add debug-level logging to the feed_data() method, to make debugging such issues easier in the future?
I'm curious to see the concrete proposal for debug logging.
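For anyone who needs this right now, here is a rough sketch of what such logging could look like from outside the library, by wrapping the internal RequestHandler.data_received(). This is an internal API that may change between releases, and it only helps when the exception propagates out of data_received() as in the traceback above:

import logging

from aiohttp import web_protocol
from aiohttp.http_exceptions import BadHttpMessage

_orig_data_received = web_protocol.RequestHandler.data_received

def data_received(self, data: bytes) -> None:
    try:
        _orig_data_received(self, data)
    except BadHttpMessage:
        # Dump the raw bytes the parser choked on, then re-raise.
        logging.error("unparseable request bytes: %r", data)
        raise

web_protocol.RequestHandler.data_received = data_received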
@webknjaz My app contains I/O operations (MySQL, Redis, Memcache), so a simple aiohttp app cannot reproduce this issue.
from aiohttp import web
from aiocache import Cache
from aiocache.serializers import PickleSerializer

async def test_handle(request: web.Request):
    text = 'test'
    mc = request.app['config']['memcache']
    value = await mc.get(text)
    if bool(value):
        print("in memcache: {}".format(value))
    else:
        await mc.set(text, '123', 60)
    return web.Response(text=text)

app = web.Application()
app['config'] = {}
mc = Cache(Cache.MEMCACHED, serializer=PickleSerializer(), endpoint='127.0.0.1', port=11211, pool_size=64)
app['config']['memcache'] = mc
app.router.add_get('/test', test_handle)

if __name__ == '__main__':
    web.run_app(app)
I am not sure it would work.
This error still occurs and remains unresolved; here are more recent tracebacks:
[2020-12-08 14:47:30 +0800] [36619] [ERROR] Error handling request
Traceback (most recent call last):
File "/opt/rh/rh-python38/root/usr/local/lib64/python3.8/site-packages/aiohttp/web_protocol.py", line 314, in data_received
messages, upgraded, tail = self._request_parser.feed_data(data)
File "aiohttp/_http_parser.pyx", line 546, in aiohttp._http_parser.HttpParser.feed_data
aiohttp.http_exceptions.BadHttpMessage: 400, message='invalid character in header'
[2020-12-08 14:47:30 +0800] [36602] [ERROR] Error handling request
Traceback (most recent call last):
File "/opt/rh/rh-python38/root/usr/local/lib64/python3.8/site-packages/aiohttp/web_protocol.py", line 314, in data_received
messages, upgraded, tail = self._request_parser.feed_data(data)
File "aiohttp/_http_parser.pyx", line 546, in aiohttp._http_parser.HttpParser.feed_data
aiohttp.http_exceptions.BadHttpMessage: 400, message='invalid character in header'
[2020-12-08 14:47:30 +0800] [36608] [ERROR] Error handling request
Traceback (most recent call last):
File "/opt/rh/rh-python38/root/usr/local/lib64/python3.8/site-packages/aiohttp/web_protocol.py", line 314, in data_received
messages, upgraded, tail = self._request_parser.feed_data(data)
File "aiohttp/_http_parser.pyx", line 546, in aiohttp._http_parser.HttpParser.feed_data
aiohttp.http_exceptions.BadHttpMessage: 400, message='invalid character in header'
@zhaoxuan we need a reproducer without aiocache.
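In the meantime, here is a synthetic server-side reproducer (my own sketch, not from the reporter) that triggers the same traceback by sending a request header containing a 0x01 control byte to a plain aiohttp app; the hypothetical X-Broken header stands in for whatever the real client sent:

import asyncio

from aiohttp import web

async def hello(request):
    return web.Response(text="ok")

async def main():
    app = web.Application()
    app.router.add_get("/", hello)
    runner = web.AppRunner(app)
    await runner.setup()
    await web.TCPSite(runner, "127.0.0.1", 8081).start()

    # Raw socket client sending a header value with a 0x01 control byte.
    reader, writer = await asyncio.open_connection("127.0.0.1", 8081)
    writer.write(
        b"GET / HTTP/1.1\r\n"
        b"Host: 127.0.0.1\r\n"
        b"X-Broken: pKt\x01hMfL\r\n"
        b"\r\n"
    )
    await writer.drain()
    # A strict parser should reject the request with a 400 Bad Request.
    print(await reader.read(1024))
    writer.close()
    await runner.cleanup()

asyncio.run(main())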
I do not know if it is related, but I got the same error with a public server. I want to download all images as part of a MyHeritage family tree migration. After extracting all the URLs, I want to use the aiohttp client to download them. Here is a simplified example with only one (real) URL:
import asyncio
import aiohttp

async def adown(session, url):
    async with session.get(url) as resp:
        if resp.status == 200:
            await resp.read()  # was: await response.read() (NameError)
        return url  # the 'destf' from the full script is omitted here

async def main(urls):
    tasks = []
    async with aiohttp.ClientSession() as http:
        for url in urls:
            tasks.append(asyncio.create_task(adown(http, url)))
        for result in await asyncio.gather(*tasks, return_exceptions=True):
            print(result, type(result))

URL = "https://www.myheritageimages.com/P/storage/site181607832/files/50/10/16/501016_030952g968apad0ff5615z.jpg"
asyncio.run(main([URL]))
With result (original in one line):
python3 "aioheritage.py"
400, message='invalid character in header',
url=URL('https://www.myheritageimages.com/P/storage/site181607832/files/50/10/16/501016_030952g968apad0ff5615z.jpg')
<class 'aiohttp.client_exceptions.ClientResponseError'>
I can download all these URLs with wget; it reports these headers (for this one):
wget -qSO /dev/null https://www.myheritageimages.com/P/storage/site181607832/files/50/10/16/501016_030952g968apad0ff5615z.jpg
HTTP/1.1 200 OK
Etag: "8389ecdb6-20beb-52ab876d5a8f3"
Last-Modified: Mon, 01 Feb 2016 17:10:39 GMT
Content-Type: image/jpeg
Content-Length: 127209
Cache-Control: max-age=315261854, public
Expires: Tue, 18 Mar 2031 08:44:33 GMT
Date: Sun, 21 Mar 2021 12:00:19 GMT
Set-Cookie: visid_incap_319037=T+vq1wp2TdO9gAVTYLyL+FM1V2AAAAAAQUIPAAAAAABtOQePeUjSDNNJKchrmEJ/; expires=Sun, 20 Mar 2022 21:39:11 GMT; HttpOnly; path=/; Domain=.myheritageimages.com
Set-Cookie: incap_ses_727_319037=o0vJdI0mfF9lb7jP8tIWClM1V2AAAAAAdJOjF9EvO1s6QdSnjUNI4Q==; path=/; Domain=.myheritageimages.com
Set-Cookie: ___utmvmzyufoSw=GntjHtylUBq; path=/; Max-Age=900
Set-Cookie: ___utmvazyufoSw=pKt\001hMfL; path=/; Max-Age=900
Set-Cookie: ___utmvbzyufoSw=tZg XdgOSalF: KtB; path=/; Max-Age=900
X-CDN: Imperva
X-Iinfo: 11-98177944-0 0CNN RT(1616328018934 107) q(0 -1 -1 -1) r(0 -1)
Debian testing, Python 3.9.2, aiohttp 3.7.4
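A quick way to pinpoint which byte a strict parser objects to is to check each header value against the RFC 7230 field-value grammar. A minimal sketch, with the two suspicious Set-Cookie values from the wget dump above as raw bytes (\001 in the dump is a literal 0x01 byte); note that the stray spaces and colon in the second cookie are actually legal in a field value, so the 0x01 byte is the likely culprit:

import re

# RFC 7230 field values may contain HTAB, SP, visible ASCII (0x21-0x7E) and,
# as obsolete but tolerated "obs-text", 0x80-0xFF. Control bytes like 0x01
# are what strict parsers report as "invalid character in header".
FIELD_VALUE_RE = re.compile(rb"^[\t\x20-\x7e\x80-\xff]*$")

def invalid_headers(raw_headers):
    return [h for h in raw_headers
            if not FIELD_VALUE_RE.match(h.split(b":", 1)[-1])]

headers = [
    b"Set-Cookie: ___utmvazyufoSw=pKt\x01hMfL; path=/; Max-Age=900",
    b"Set-Cookie: ___utmvbzyufoSw=tZg XdgOSalF: KtB; path=/; Max-Age=900",
]
print(invalid_headers(headers))  # only the cookie with the \x01 byte fails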
Any news on this? Having the same issues.
Same here
I had the same thing when a header line started with \x00. With httpx it raised: httpx.RemoteProtocolError: illegal header line: bytearray(b'\x00Date: Sun, 30 May 2021 18:54:19 GMT')
Can confirm that the solution stated here works: https://github.com/aio-libs/aiohttp/issues/5355#issuecomment-750175864
Basically:
set AIOHTTP_NO_EXTENSIONS=1
Seems that the underlying C http parsing library might be broken.
Can confirm that the solution stated here works: #5355 (comment)
Basically:
set AIOHTTP_NO_EXTENSIONS=1
Seems that the underlying C http parsing library might be broken.
I've seen this solution too and tried looking into it. However, I haven't been able to find out which other extensions are disabled and what functionality is lost by doing so. I'm using aiohttp in production, and this is not an option I can properly consider without knowing exactly what the implications are. Does anyone know?
As far as I can tell, no functionality is lost. Nonetheless, performance is degraded as mentioned here: https://github.com/aio-libs/aiohttp/blob/09ac1cbb4c07b4b3e1872374808512c9bc52f2ab/CHANGES/3828.feature
Using super dumb timeit tests I've seen a drop of about 30% in performance.
import asyncio
import os

# NB: aiohttp only checks whether this variable is non-empty, so even '0'
# selects the pure-Python parser; leave it unset to benchmark the C one.
os.environ.setdefault('AIOHTTP_NO_EXTENSIONS', '0')

from aiohttp import ClientSession

url = 'https://www.java.com/'

async def dorequest(url):
    async with ClientSession() as session:
        async with session.get(url) as response:
            return await response.text()

def main():
    result = asyncio.run(dorequest(url))

if __name__ == '__main__':
    import timeit
    print(timeit.timeit("main()", setup="from __main__ import main", number=150))
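Because the variable is read once at import time, the cleanest way to compare both parsers is to run the script in two subprocesses. A sketch, where bench.py is a placeholder name for the script above saved to a file with the os.environ.setdefault() line removed:

import os
import subprocess
import sys

for label, value in (("C extensions", None), ("pure Python", "1")):
    env = dict(os.environ)
    env.pop("AIOHTTP_NO_EXTENSIONS", None)  # start from a clean slate
    if value is not None:
        env["AIOHTTP_NO_EXTENSIONS"] = value
    result = subprocess.run(
        [sys.executable, "bench.py"], env=env, capture_output=True, text=True
    )
    print(label, result.stdout.strip())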
Can confirm that the solution stated here works: #5355 (comment)
Basically:
set AIOHTTP_NO_EXTENSIONS=1
Seems that the underlying C http parsing library might be broken.
This works for me if I set it as an environment variable. However, it does not matter whether I set AIOHTTP_NO_EXTENSIONS=1 or AIOHTTP_NO_EXTENSIONS=0, or even AIOHTTP_NO_EXTENSIONS=whatever text; it works as long as I have it set as an environment variable at all.
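The likely reason any value works: aiohttp only tests whether the variable is a non-empty string, roughly like this (paraphrased from aiohttp's helpers module; the exact line may differ between versions):

import os

# Any non-empty value, including "0", selects the pure-Python parser.
NO_EXTENSIONS = bool(os.environ.get("AIOHTTP_NO_EXTENSIONS"))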
Using super dumb timeit tests I've seen a drop of about 30% in performance.
The HTTP parser is the bottleneck in web apps, so it's implemented as an optional C extension, most of which comes from http_parser, a lib that used to be the default in NodeJS. It is now deprecated and should be replaced by its successor llhttp, whose integration is not finished yet (https://github.com/aio-libs/aiohttp/pull/5364). This is the reason for the performance drop.
Has anyone found a workaround for this? I am making a request to an API that I cannot fix myself, but it works fine with the requests lib. How could the "invalid character in header" error be fixed?
Set AIOHTTP_NO_EXTENSIONS=1
I read somewhere that it can lead to a 30% performance drop, which is a significant drop, and thus not a solution at all, only a workaround for simple cases.
Man, you have a broken HTTP server. Slow processing is better than nothing in such a case, isn't it?
Encountered this at work. There are a few things which make this particularly frustrating.
First, the 400 in the exception has nothing to do with the response and is misleading: it suggests the server sent a 400 HTTP response, when it was actually a 200 response.
Second, I had to hack aiohttp to even be able to print out the raw response which was causing the issue.
Third, even after getting the raw response, it was very difficult to find what was actually wrong in it. After quite some time, I found a Set-Cookie header with a \x01 byte in the middle of it.
Figuring out this information should be much quicker, so we can report the problems to the server owner.
For servers where this is not possible, there should probably also be a way to discard any headers with invalid characters and continue parsing the rest of the response.
I think changing the 400 output of the error message should be trivially fixed by us, but everything else may need some changes in llhttp.
Are you setting the lenient headers flag?
I'm rather short on time, so if anybody wants to pick this up and try setting the lenient option, that would be great. https://github.com/nodejs/llhttp/issues/174#issuecomment-1209929363
I simply want to share that adding AIOHTTP_NO_EXTENSIONS=1 as an environment variable before running the code also fixed this issue for me. Thanks.
I simply want to share that adding AIOHTTP_NO_EXTENSIONS=1 as an environment variable before running the code also fixed this issue for me. Thanks.
Because it then doesn't use llhttp, which means it is a lot slower. So, not recommended for production.
I've added some information to the exceptions to highlight what character is causing a parse error (in 3.8.5). However, we won't be enabling lenient headers currently as this will reintroduce the GHSA-45c4-8wx5-qw6w vulnerability. I'll follow this up with llhttp, and if the situation changes we'll make any recommended changes. Currently, it appears there is nothing we can do.
After reviewing this again, it should be safe to enable lenient headers in the response parser (i.e. client side only). This should be enabled in the next release (#7490). The parser will revert to being strict when using dev mode (python -X dev), which should help developers detect mistakes in their servers.
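For reference, dev mode is a runtime flag that code can inspect, so you can verify which mode your process is in; presumably the parser keys off something like this (a sketch, not aiohttp's confirmed implementation):

import sys

# True when started as `python -X dev app.py` or with PYTHONDEVMODE=1;
# under dev mode the client response parser stays strict.
print("strict parsing" if sys.flags.dev_mode else "lenient parsing")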