
invalid character in header

Open zhaoxuan opened this issue 4 years ago • 28 comments

🐞 Describe the bug

[2020-11-22 21:24:53 +0800] [55277] [ERROR] Error handling request
Traceback (most recent call last):
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/site-packages/aiohttp/web_protocol.py", line 314, in data_received
    messages, upgraded, tail = self._request_parser.feed_data(data)
  File "aiohttp/_http_parser.pyx", line 546, in aiohttp._http_parser.HttpParser.feed_data
aiohttp.http_exceptions.BadHttpMessage: 400, message='invalid character in header'

💡 To Reproduce

I do not have any idea how to reproduce it; the CPU load was not high.

aiohttp == 3.7.3
gunicorn == 20.0.4
uvloop == 0.14.0

📋 Your version of Python

Python == 3.6.9

📋 Your version of the aiohttp/yarl/multidict distributions

$ python -m pip show aiohttp
Name: aiohttp
Version: 3.7.3
Summary: Async http client/server framework (asyncio)
Home-page: https://github.com/aio-libs/aiohttp
Author: Nikolay Kim
Author-email: [email protected]
License: Apache 2
Location: /opt/rh/rh-python36/root/usr/lib64/python3.6/site-packages
Requires: multidict, typing-extensions, idna-ssl, chardet, attrs, yarl, async-timeout
Required-by: aiohttp-devtools
$ python -m pip show multidict
Name: multidict
Version: 4.7.6
Summary: multidict implementation
Home-page: https://github.com/aio-libs/multidict
Author: Andrew Svetlov
Author-email: [email protected]
License: Apache 2
Location: /opt/rh/rh-python36/root/usr/lib64/python3.6/site-packages
Requires:
Required-by: yarl, aiohttp
$ python -m pip show yarl
Name: yarl
Version: 1.5.1
Summary: Yet another URL library
Home-page: https://github.com/aio-libs/yarl/
Author: Andrew Svetlov
Author-email: [email protected]
License: Apache 2
Location: /opt/rh/rh-python36/root/usr/lib64/python3.6/site-packages
Requires: multidict, typing-extensions, idna
Required-by: aiohttp

zhaoxuan avatar Nov 22 '20 13:11 zhaoxuan

At a glance, this looks like it may be a duplicate of https://github.com/aio-libs/aiohttp/issues/5220

webknjaz avatar Nov 22 '20 18:11 webknjaz

@zhaoxuan this needs a reproducer. Without it, we'll have no idea how to trigger this behavior and won't even be able to confirm whether it's a bug.

webknjaz avatar Nov 22 '20 18:11 webknjaz

Agree with @webknjaz. If a reproducer exists, we can check it and figure out what's going on. Otherwise, we should close the issue after some period and pray that #5220 fixes this problem too.

asvetlov avatar Nov 22 '20 20:11 asvetlov

@webknjaz it was weird; I added some logging at aiohttp/web_protocol.py:314 but got nothing, so how can I get the error message?

zhaoxuan avatar Nov 23 '20 06:11 zhaoxuan

@zhaoxuan we need to have a simple aiohttp app that reproduces the issue posted here. Even if it's a "hello world". The simpler the better, with no other deps than aiohttp. Also, we need to know what HTTP query triggers this behavior.

P.S. You cannot log what feed_data() returns because it raises an exception, meaning the control flow never reaches a return at all. And you cannot debug the .pyx module as easily because it would need rebuilding.
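
For quick debugging, something along these lines could work: a rough sketch of a hypothetical helper (not an aiohttp API) that logs the offending bytes before re-raising:

import logging

logger = logging.getLogger("aiohttp.server")

def feed_with_logging(parser, data: bytes):
    """Hypothetical debugging helper: feed raw bytes to the request parser
    and, if parsing fails, log the exact bytes before re-raising."""
    try:
        return parser.feed_data(data)
    except Exception:
        logger.exception("HTTP parser failed on raw data: %r", data)
        raise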

webknjaz avatar Nov 23 '20 10:11 webknjaz

@webknjaz maybe we could add debug-level logging to the feed_data() method to make debugging such issues easier in the future?

derlih avatar Nov 24 '20 09:11 derlih

I'm curious to see the concrete proposal for debug logging.

asvetlov avatar Nov 24 '20 09:11 asvetlov

@webknjaz My app performs I/O operations (MySQL, Redis, Memcache), so a simple aiohttp app could not reproduce this issue.

from aiohttp import web

from aiocache import Cache
from aiocache.serializers import PickleSerializer

async def test_handle(request: web.Request):
    text = 'test'
    mc = request.app['config']['memcache']

    value = await mc.get(text)

    if bool(value):
        print("in memcache: {}".format(value))
    else:
        await mc.set(text, '123', 60)

    return web.Response(text=text)

app = web.Application()
app['config'] = {}

mc = Cache(Cache.MEMCACHED, serializer=PickleSerializer(), endpoint='127.0.0.1', port=11211, pool_size=64)
app['config']['memcache'] = mc

app.router.add_get('/test', test_handle)

if __name__ == '__main__':
    web.run_app(app)

I am not sure whether this will reproduce it.

zhaoxuan avatar Nov 24 '20 09:11 zhaoxuan

This error is still occurring:

[2020-12-08 14:47:30 +0800] [36619] [ERROR] Error handling request
Traceback (most recent call last):
  File "/opt/rh/rh-python38/root/usr/local/lib64/python3.8/site-packages/aiohttp/web_protocol.py", line 314, in data_received
    messages, upgraded, tail = self._request_parser.feed_data(data)
  File "aiohttp/_http_parser.pyx", line 546, in aiohttp._http_parser.HttpParser.feed_data
aiohttp.http_exceptions.BadHttpMessage: 400, message='invalid character in header'
[2020-12-08 14:47:30 +0800] [36602] [ERROR] Error handling request
Traceback (most recent call last):
  File "/opt/rh/rh-python38/root/usr/local/lib64/python3.8/site-packages/aiohttp/web_protocol.py", line 314, in data_received
    messages, upgraded, tail = self._request_parser.feed_data(data)
  File "aiohttp/_http_parser.pyx", line 546, in aiohttp._http_parser.HttpParser.feed_data
aiohttp.http_exceptions.BadHttpMessage: 400, message='invalid character in header'
[2020-12-08 14:47:30 +0800] [36608] [ERROR] Error handling request
Traceback (most recent call last):
  File "/opt/rh/rh-python38/root/usr/local/lib64/python3.8/site-packages/aiohttp/web_protocol.py", line 314, in data_received
    messages, upgraded, tail = self._request_parser.feed_data(data)
  File "aiohttp/_http_parser.pyx", line 546, in aiohttp._http_parser.HttpParser.feed_data
aiohttp.http_exceptions.BadHttpMessage: 400, message='invalid character in header'

zhaoxuan avatar Dec 08 '20 06:12 zhaoxuan

@zhaoxuan we need a reproducer without aiocache.

webknjaz avatar Mar 19 '21 23:03 webknjaz

I do not know if it is related, but I got the same error with a public server. I want to download all images as part of a MyHeritage family tree migration. After I extract all the URLs, I want to use the aiohttp client to download them. Here is a simplified example with only one (real) URL:

import asyncio

import aiohttp

async def adown(session, url):
    async with session.get(url) as resp:
        if resp.status == 200:
            await resp.read()

    return url

async def main(urls):
    tasks = []
    async with aiohttp.ClientSession() as http:
        for url in urls:
            tasks.append(asyncio.create_task(adown(http, url)))

        for result in await asyncio.gather(*tasks, return_exceptions=True):
            print(result, type(result))

URL = "https://www.myheritageimages.com/P/storage/site181607832/files/50/10/16/501016_030952g968apad0ff5615z.jpg"

asyncio.run(main([URL]))

With this result (originally on one line):

python3 "aioheritage.py"
400, message='invalid character in header', 
    url=URL('https://www.myheritageimages.com/P/storage/site181607832/files/50/10/16/501016_030952g968apad0ff5615z.jpg')
    <class 'aiohttp.client_exceptions.ClientResponseError'>

I can download all of these URLs with wget; it reports these headers (for this one):

wget -qSO /dev/null https://www.myheritageimages.com/P/storage/site181607832/files/50/10/16/501016_030952g968apad0ff5615z.jpg
  HTTP/1.1 200 OK
  Etag: "8389ecdb6-20beb-52ab876d5a8f3"
  Last-Modified: Mon, 01 Feb 2016 17:10:39 GMT
  Content-Type: image/jpeg
  Content-Length: 127209
  Cache-Control: max-age=315261854, public
  Expires: Tue, 18 Mar 2031 08:44:33 GMT
  Date: Sun, 21 Mar 2021 12:00:19 GMT
  Set-Cookie: visid_incap_319037=T+vq1wp2TdO9gAVTYLyL+FM1V2AAAAAAQUIPAAAAAABtOQePeUjSDNNJKchrmEJ/; expires=Sun, 20 Mar 2022 21:39:11 GMT; HttpOnly; path=/; Domain=.myheritageimages.com
  Set-Cookie: incap_ses_727_319037=o0vJdI0mfF9lb7jP8tIWClM1V2AAAAAAdJOjF9EvO1s6QdSnjUNI4Q==; path=/; Domain=.myheritageimages.com
  Set-Cookie: ___utmvmzyufoSw=GntjHtylUBq; path=/; Max-Age=900
  Set-Cookie: ___utmvazyufoSw=pKt\001hMfL; path=/; Max-Age=900
  Set-Cookie: ___utmvbzyufoSw=tZg      XdgOSalF: KtB; path=/; Max-Age=900
  X-CDN: Imperva
  X-Iinfo: 11-98177944-0 0CNN RT(1616328018934 107) q(0 -1 -1 -1) r(0 -1)

Debian testing, Python 3.9.2, aiohttp 3.7.4

slavkoja avatar Mar 21 '21 12:03 slavkoja

Any news on this? Having the same issues.

yotamgod avatar May 23 '21 11:05 yotamgod

Same here

ppmdo avatar May 28 '21 10:05 ppmdo

I had this when a header line was \x00Date: .... With httpx it fails too: httpx.RemoteProtocolError: illegal header line: bytearray(b'\x00Date: Sun, 30 May 2021 18:54:19 GMT')

kolamor avatar May 30 '21 18:05 kolamor

Can confirm that the solution stated here works: https://github.com/aio-libs/aiohttp/issues/5355#issuecomment-750175864

Basically: set AIOHTTP_NO_EXTENSIONS=1

Seems that the underlying C http parsing library might be broken.
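
Note that the variable has to be visible before aiohttp is imported, since the choice between the C and pure-Python parsers appears to be made at import time. A minimal sketch:

import os

# Assumption: aiohttp reads AIOHTTP_NO_EXTENSIONS once, when it is first
# imported, so the variable must be set before `import aiohttp`.
os.environ["AIOHTTP_NO_EXTENSIONS"] = "1"

import aiohttp  # noqa: E402  (deliberately imported after the env tweak)

print(aiohttp.__version__)  # now running with the pure-Python HTTP parser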

ppmdo avatar May 31 '21 09:05 ppmdo

Can confirm that the solution stated here works: #5355 (comment)

Basically: set AIOHTTP_NO_EXTENSIONS=1

Seems that the underlying C http parsing library might be broken.

I've seen this solution too and tried looking into it. However, I haven't been able to find out which other extensions are disabled and what functionality is lost by doing so. Personally, I'm using aiohttp in production, and this is not an option I can properly consider without knowing exactly what the implications are. Does anyone know?

yotamgod avatar May 31 '21 09:05 yotamgod

As far as I can tell, no functionality is lost. Nonetheless, performance is degraded as mentioned here: https://github.com/aio-libs/aiohttp/blob/09ac1cbb4c07b4b3e1872374808512c9bc52f2ab/CHANGES/3828.feature

Using super dumb timeit tests I've seen a drop of about 30% in performance.

import asyncio
import os

# Caution: aiohttp seems to treat any non-empty value (even '0') as truthy,
# so unset the variable entirely to make sure the C extensions are used.
os.environ.setdefault('AIOHTTP_NO_EXTENSIONS', '0')

from aiohttp import ClientSession


url = 'https://www.java.com/'

async def dorequest(url):
    async with ClientSession() as session:
        async with session.get(url) as response:
            return await response.text()

def main():
    result = asyncio.run(dorequest(url))

if __name__ == '__main__':
    import timeit
    print(timeit.timeit("main()", setup="from __main__ import main", number=150))

ppmdo avatar May 31 '21 09:05 ppmdo

Can confirm that the solution stated here works: #5355 (comment)

Basically: set AIOHTTP_NO_EXTENSIONS=1

Seems that the underlying C http parsing library might be broken.

This works for me if I set it as an environment variable. However, it does not matter whether I set AIOHTTP_NO_EXTENSIONS=1 or AIOHTTP_NO_EXTENSIONS=0; even AIOHTTP_NO_EXTENSIONS=whatever works. It only takes effect if I have it set as an environment variable.
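
That behavior would be consistent with the flag being read as a plain truthiness check on the string. A sketch of the assumed check:

import os

# Assumed check inside aiohttp: any non-empty string is truthy, so
# AIOHTTP_NO_EXTENSIONS=0 (or any other text) also disables the C extensions.
NO_EXTENSIONS = bool(os.environ.get("AIOHTTP_NO_EXTENSIONS"))

print(bool("0"))         # True  -> extensions disabled
print(bool("whatever"))  # True  -> extensions disabled
print(bool(None))        # False -> variable unset, extensions enabled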

g66-gopanear avatar Jun 14 '21 06:06 g66-gopanear

Using super dumb timeit tests I've seen a drop of about 30% in performance.

The HTTP parser is a bottleneck in web apps, so it's implemented as an optional C extension, most of which comes from http-parser, a library that used to be the default in Node.js. That library is now deprecated and should be replaced by its successor, llhttp, which is not yet integrated (https://github.com/aio-libs/aiohttp/pull/5364).

This is the reason for the performance drop.

webknjaz avatar Jun 19 '21 08:06 webknjaz

Has anyone found a workaround for this? I am making requests to an API that I cannot fix myself, but it works fine with the requests lib. How could the "invalid character in header" error be fixed?

Wolfhound905 avatar Sep 28 '21 14:09 Wolfhound905

Set AIOHTTP_NO_EXTENSIONS=1

asvetlov avatar Oct 30 '21 10:10 asvetlov

I read somewhere that it can lead to a 30% performance drop, and that is a significant drop, so this is not a solution at all, only a workaround for simple cases.

slavkoja avatar Oct 30 '21 11:10 slavkoja

Man, you have a broken HTTP server. Slow processing is better than nothing in such a case, isn't it?

asvetlov avatar Oct 30 '21 13:10 asvetlov

Encountered this at work. There are a few things which make this particularly frustrating.

First, the 400 in the exception has nothing to do with the response, which is confusing: it makes it look like the server sent a 400 HTTP response, when it was actually a 200 response.

Second, I had to hack aiohttp to even be able to print out the raw response that was causing the issue.

Third, even after getting the raw response, it was very difficult to find what was actually wrong in the response. After quite some time, I found a Set-Cookie header with a \x01 byte in the middle of it.
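
For reference, one way to dump the raw bytes for inspection without patching aiohttp is to skip the HTTP parser entirely. A minimal sketch with plain asyncio (assumes an HTTPS server on port 443; the host below is just the example URL from earlier in this thread):

import asyncio
import ssl

async def raw_response(host: str, path: str = "/") -> bytes:
    """Fetch a URL with a bare connection so the unparsed response bytes,
    including any invalid header characters, can be inspected directly."""
    reader, writer = await asyncio.open_connection(
        host, 443, ssl=ssl.create_default_context()
    )
    request = f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
    writer.write(request.encode())
    await writer.drain()
    data = await reader.read()  # server closes the connection, so read to EOF
    writer.close()
    await writer.wait_closed()
    return data

# Example: dump the start of the raw response for manual inspection.
print(asyncio.run(raw_response("www.java.com"))[:1000])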

Figuring out this information should be much quicker, so we can report the problems to the server owner.

For servers where this is not possible, there should probably also be a way to discard any headers with invalid characters and continue parsing the rest of the response.

I think the 400 in the error message is something we can fix trivially on our side, but everything else may need some changes in llhttp.

Dreamsorcerer avatar Aug 08 '22 19:08 Dreamsorcerer

Are you setting the lenient headers flag?

pallas avatar Aug 08 '22 22:08 pallas

I'm rather short on time, so if anybody wants to pick this up and try setting the lenient option, that would be great. https://github.com/nodejs/llhttp/issues/174#issuecomment-1209929363

Dreamsorcerer avatar Aug 15 '22 19:08 Dreamsorcerer

I simply want to share that adding AIOHTTP_NO_EXTENSIONS=1 as an environment variable before running the code also fixed this issue for me. Thanks.

EnriqueOrtiz27 avatar Sep 05 '22 16:09 EnriqueOrtiz27

I simply want to share that adding AIOHTTP_NO_EXTENSIONS=1 as an environment variable before running the code also fixed this issue for me. Thanks.

Because it then doesn't use llhttp, which means it is a lot slower. So, not recommended for production.

Dreamsorcerer avatar Sep 05 '22 16:09 Dreamsorcerer

I've added some information to the exceptions to highlight what character is causing a parse error (in 3.8.5). However, we won't be enabling lenient headers currently as this will reintroduce the GHSA-45c4-8wx5-qw6w vulnerability. I'll follow this up with llhttp, and if the situation changes we'll make any recommended changes. Currently, it appears there is nothing we can do.

Dreamsorcerer avatar Jul 22 '23 14:07 Dreamsorcerer

After reviewing this again, it should be safe to enable lenient headers in the response parser (i.e. client side only). This should be enabled in the next release (#7490). The parser will revert to being strict when using dev mode (python -X dev), which should help developers detect mistakes in their servers.
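
As a quick check, dev mode can be confirmed from inside the process (a small sketch; sys.flags.dev_mode is standard Python):

import sys

# True when the interpreter was started as `python -X dev app.py`,
# which (per the comment above) makes the response parser strict again.
print(sys.flags.dev_mode)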

Dreamsorcerer avatar Aug 06 '23 20:08 Dreamsorcerer