
Python int too large to convert to C ssize_t

Open Dobatymo opened this issue 7 years ago • 13 comments

I'm submitting a ...

  • [X] bug report
  • [ ] feature request
  • [ ] question about the decisions made in the repository

Describe the bug. What is the current behavior?

I found this in my error logs:

ENGINE OverflowError('Python int too large to convert to C ssize_t',)
Traceback (most recent call last):
  File "C:\Program Files\Python35\lib\site-packages\cheroot\server.py", line 1152, in communicate
    req.respond()
  File "C:\Program Files\Python35\lib\site-packages\cheroot\server.py", line 974, in respond
    self.server.gateway(self).respond()
  File "C:\Program Files\Python35\lib\site-packages\cheroot\wsgi.py", line 136, in respond
    self.write(chunk)
  File "C:\Program Files\Python35\lib\site-packages\cheroot\wsgi.py", line 215, in write
    self.req.send_headers()
  File "C:\Program Files\Python35\lib\site-packages\cheroot\server.py", line 1083, in send_headers
    self.rfile.read(remaining)
  File "C:\Program Files\Python35\lib\site-packages\cheroot\server.py", line 315, in read
    data = self.rfile.read(size)
  File "C:\Program Files\Python35\lib\_pyio.py", line 1000, in read
    return self._read_unlocked(size)
  File "C:\Program Files\Python35\lib\_pyio.py", line 1040, in _read_unlocked
    chunk = self.raw.read(wanted)
OverflowError: Python int too large to convert to C ssize_t

To Reproduce

Simply send b"GET / HTTP/1.1\r\nContent-Length: 9999999999999999999\r\n\r\n" to the server, with server.max_request_body_size set to 0. One 9 fewer (b"GET / HTTP/1.1\r\nContent-Length: 999999999999999999\r\n\r\n") yields a MemoryError instead, but that is to be expected, I guess.

Expected behavior

This exception should be handled gracefully.

Environment

  • Cheroot version: 6.3.2.post0 (my log entry is 3 weeks old, but I don't think I updated since then...)
  • Python version: Python 3.5.4 (v3.5.4:3f56838, Aug 8 2017, 02:17:05) [MSC v.1900 64 bit (AMD64)] on win32
  • OS: Windows Server 2012 R2 x64

Dobatymo avatar Jul 19 '18 08:07 Dobatymo
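For context on the two failure modes in the report, here is a minimal stdlib-only sketch (no Cheroot involved, written by the editor, not by the reporter) showing why the 19-nine Content-Length raises OverflowError while one nine fewer only exhausts memory: the first value exceeds the C ssize_t range (sys.maxsize on a 64-bit CPython), so read() rejects it before any allocation is even attempted.

```python
import sys

# Content-Length values from the reproduction strings above.
overflowing = int("9999999999999999999")    # 19 nines -> OverflowError
huge_but_valid = int("999999999999999999")  # 18 nines -> MemoryError

# On a 64-bit CPython, sys.maxsize == 2**63 - 1, the largest value
# representable as a C ssize_t. read(n) converts n to ssize_t first,
# so anything larger fails with OverflowError before allocation;
# anything that fits is handed to the allocator, which gives up
# (MemoryError) on a request of roughly 888 PiB.
print(overflowing > sys.maxsize)      # True on 64-bit builds
print(huge_but_valid <= sys.maxsize)  # True on 64-bit builds
```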

@Dobatymo how big was the payload of the HTTP request? Any useful info in that regard?

webknjaz avatar Jul 19 '18 08:07 webknjaz

Also, in what way are you running Cheroot? Please include at least a small snippet.

webknjaz avatar Jul 19 '18 08:07 webknjaz

I am hosting a simple Flask application with CherryPy. The actual CherryPy-related code is just a few lines. Tomorrow I can check whether I can correlate my other logs to find any info about that request. But I cannot imagine that any of my request sizes would overflow ssize_t (which is a 64-bit signed int on a 64-bit system, isn't it?)

Dobatymo avatar Jul 19 '18 15:07 Dobatymo

To reproduce send b"GET / HTTP/1.1\r\nContent-Length: 9999999999999999999\r\n\r\n" to the server, with server.max_request_body_size: set to 0.

Dobatymo avatar Jul 20 '18 03:07 Dobatymo

Well, that would be a malicious request.

webknjaz avatar Jul 20 '18 04:07 webknjaz

Also, now you can use WSGI server directly from Cheroot, without involving whole CherryPy dependency :)

webknjaz avatar Jul 20 '18 04:07 webknjaz

I'm seeing this in my logs as well.

calebrob6 avatar Oct 24 '19 05:10 calebrob6

@calebrob6 any chance you could send a PR with regression tests/reproducer and maybe a fix?

webknjaz avatar Oct 24 '19 13:10 webknjaz

You can reproduce by running a server (e.g. from https://docs.cherrypy.org/projects/cheroot/en/latest/pkg/cheroot.wsgi.html) with:

from cheroot import wsgi

def my_crazy_app(environ, start_response):
    status = '200 OK'
    response_headers = [('Content-type', 'text/plain')]
    start_response(status, response_headers)
    return [b'Hello world!']

addr = '0.0.0.0', 8070
server = wsgi.Server(addr, my_crazy_app)
server.start()

Then, to trigger the error mentioned in this thread:

import requests
headers = {'Content-Length': str(2**63)}
requests.get("http://localhost:8070/", headers=headers)

If I use a Content-Length of 2^62 (down to 2^35) I instead get a MemoryError with the following traceback:

Traceback (most recent call last):
  File "/home/caleb/code/cheroot/cheroot/server.py", line 1281, in communicate
    req.respond()
  File "/home/caleb/code/cheroot/cheroot/server.py", line 1084, in respond
    self.server.gateway(self).respond()
  File "/home/caleb/code/cheroot/cheroot/wsgi.py", line 148, in respond
    self.write(chunk)
  File "/home/caleb/code/cheroot/cheroot/wsgi.py", line 232, in write
    self.req.ensure_headers_sent()
  File "/home/caleb/code/cheroot/cheroot/server.py", line 1131, in ensure_headers_sent
    self.send_headers()
  File "/home/caleb/code/cheroot/cheroot/server.py", line 1197, in send_headers
    self.rfile.read(remaining)
  File "/home/caleb/code/cheroot/cheroot/server.py", line 383, in read
    data = self.rfile.read(size)
  File "/home/caleb/code/cheroot/cheroot/makefile.py", line 420, in read
    val = super().read(*args, **kwargs)
  File "/usr/lib/python3.5/_pyio.py", line 1000, in read
    return self._read_unlocked(size)
  File "/usr/lib/python3.5/_pyio.py", line 1040, in _read_unlocked
    chunk = self.raw.read(wanted)
MemoryError

Do you have suggestions for where a fix should go? Perhaps setting max_request_body_size to a reasonable default (instead of unlimited)?

calebrob6 avatar Oct 25 '19 15:10 calebrob6
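The suggested mitigation is a one-line change on the reproduction server above. The sketch below (editor's illustration, not confirmed project guidance) assumes Cheroot's server exposes the max_request_body_size attribute, the same knob the reporter disabled with 0 (meaning unlimited); the 100 MiB figure is an arbitrary example, not a recommended default.

```python
from cheroot import wsgi

def my_crazy_app(environ, start_response):
    start_response('200 OK', [('Content-type', 'text/plain')])
    return [b'Hello world!']

server = wsgi.Server(('0.0.0.0', 8070), my_crazy_app)
# Cap the acceptable request body size instead of leaving it
# unlimited, so an oversized Content-Length header is rejected
# up front rather than funneled into a single huge read().
server.max_request_body_size = 100 * 1024 * 1024  # example: 100 MiB
server.start()
```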

Thanks, @calebrob6!

Do you think you could come up with an actual test case for this? Even adding a failing test is great (we can mark it as xfail with a link to the issue) because this way we'd have something actionable to fix. The second step would be removing the xfail marker and adding the fix implementation.

Adjusting max_request_body_size sounds like a good idea, and maybe we could add a warning in case a user explicitly sets it to "unlimited" and a MemoryError still happens.

webknjaz avatar Oct 26 '19 19:10 webknjaz

Also, take into account that MemoryError is recoverable, so we may want to implement some way of processing whatever is already in memory, or of reading the data in chunks and disposing of the previous chunk each time we read a new one.

webknjaz avatar Oct 26 '19 19:10 webknjaz
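The chunked-read idea can be sketched with the stdlib alone. drain() below is a hypothetical helper (editor's sketch, not Cheroot code) standing in for the single rfile.read(remaining) call in send_headers() that both tracebacks above point at.

```python
import io

CHUNK = 64 * 1024  # never ask the raw stream for more than this at once

def drain(rfile, remaining):
    """Discard up to `remaining` bytes from rfile in bounded chunks.

    Because each read() is capped at CHUNK, this never passes a value
    larger than C ssize_t down to the raw stream and never tries to
    buffer a (possibly fabricated) Content-Length's worth of data in
    memory. Returns how many promised bytes never arrived.
    """
    while remaining > 0:
        data = rfile.read(min(remaining, CHUNK))
        if not data:
            break  # client closed early or lied about Content-Length
        remaining -= len(data)
    return remaining

# A client that promises ~1e19 bytes but sends only 1000:
body = io.BytesIO(b"x" * 1000)
print(drain(body, int("9999999999999999999")))  # 9999999999999998999
```

As webknjaz notes, looping like this trades the crash for time and bandwidth on a sufficiently large real payload, so it complements rather than replaces a sane default limit.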

Reproducible @ https://app.circleci.com/pipelines/github/cherrypy/cheroot/241/workflows/bb64e1f6-69d1-40f3-95c9-7ee7271213c5/jobs/2454/parallel-runs/2/steps/2-103

webknjaz avatar Jul 08 '20 14:07 webknjaz

I've found out that using sys.maxsize+1 reproduces the OverflowError and sys.maxsize reproduces the MemoryError. The underlying read() call explodes in C-land, so one thing we could do is attempt reading smaller chunks in a loop, but this will probably cause timeouts for a sufficiently big payload and will still consume that much memory. So we should shift the default from unlimited to something sane. We also need some try/except to catch the cases where users reset it to "unlimited" and hit this problem anyway.

webknjaz avatar Nov 30 '20 21:11 webknjaz
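The sys.maxsize boundary described in the last comment can be confirmed with a plain buffered stream, no Cheroot required (editor's sketch): the conversion to C ssize_t fails in exactly the way the tracebacks show.

```python
import io
import sys

rfile = io.BufferedReader(io.BytesIO(b"tiny body"))

# Anything above sys.maxsize cannot be converted to C ssize_t, so
# read() raises OverflowError before touching the stream at all.
try:
    rfile.read(sys.maxsize + 1)
    outcome = "no error"
except OverflowError:
    outcome = "OverflowError"
print(outcome)  # OverflowError
```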