EDDN icon indicating copy to clipboard operation
EDDN copied to clipboard

Ensure we can cope with feasible incoming `/upload/` rates

Open Athanasius opened this issue 2 years ago • 4 comments

  1. We are seeing some:

    [Wed Feb 09 18:11:49.137147 2022] [proxy:error] [pid 29546] (70007)The timeout specified has expired: [client <client ip>:61619] AH01095: prefetch request body failed to 127.0.0.1:8081 (127.0.0.1) from <client IP> ()
    
  2. I had such triggered by EDMC development code yesterday. EDMC has a 10 second requests timeout on EDDN uploads.

  3. It's possible that the default Apache config (mpm pre_fork worker, and the max/spare children, plus default timeout(s)) is causing some of this.

  4. But also it's possible that the Gateway,py bottle defaults are, e.g. not setting the TCP accept() queue large enough and/or it's not working through things fast enough (it does use gevent 'greenlets' for passing messages off after initial checks, but perhaps it needs to pass off to a thread pool ASAP ?).

  5. In order to definitely reconfigure things as the actual use requires we first need to know what our actual peak rate of /upload/ messages is. To that end we need a script to process through Apache access.log files and find the largest sliding to the fraction of a second window count of such messages. Then we have a starting point and can add safety margin.

  6. We might want to implement some Reverse Proxy rate limiting per originating IP (or perhaps uploaderID, or tuple of both).

  7. We should probably switch Reverse Proxy to whatever has the necessary features and is known to perform best. Apache was the default choice given perceived future other uses for the host and experience of the people involved.

Athanasius avatar Feb 10 '22 11:02 Athanasius

Re: no. 1&2 above.

I just checked to be sure, and this won't have been a case of me pausing EDMC in the debugger. Under normal debugging the request would be "connect, send request, get response back" all in one function call.

I did just drill down in the debugger so I could stay stopped after the connection was opened, but before any request was sent and the eddn.edcd.io access.log showed:

<client ip> - - [10/Feb/2022:17:28:35 +0000] "-" 408 597 - "-" "-"

and there was nothing in the error.log, never mind the same error as shown in this issue's OP.

Athanasius avatar Feb 10 '22 17:02 Athanasius

Still trying to track down what that error.log line means I've now been looking at apache2 source code. Searching on 1095 leads you to modules/proxy/proxy_util.c:ap_proxy_prefetch_input(). I now need to puzzle out if this is reading from the upstream client or from the downstream target server.

Athanasius avatar Feb 10 '22 18:02 Athanasius

Some more information.

Apache error.log:

[Sat Feb 12 19:57:08.340124 2022] [proxy:error] [pid 5060] (70007)The timeout specified has expired: [client <client ip1>:65288] AH02608: read request body failed to 127.0.0.1:8081 (127.0.0.1) from <client ip1> ()
[Sat Feb 12 19:57:08.340159 2022] [proxy_http:error] [pid 5060] [client 179.48.249.236:65288] AH10154: pass request body failed to 127.0.0.1:8081 (127.0.0.1) from <client ip1> () with status 408
[Sat Feb 12 20:09:49.740131 2022] [proxy:error] [pid 9673] (70007)The timeout specified has expired: [client <client ip2>:49622] AH02608: read request body failed to 127.0.0.1:8081 (127.0.0.1) from <client ip2> ()

And the Live gateway.log:

127.0.0.1 - - [2022-02-12 19:55:01] "POST /upload/ HTTP/1.1" 400 367 0.046442
Traceback (most recent call last):
  File "/home/eddn/live/python2.7-venv/bin/bottle.py", line 862, in _handle
    return route.call(**args)
  File "/home/eddn/live/python2.7-venv/bin/bottle.py", line 1740, in wrapper
    rv = callback(*a, **ka)
  File "build/bdist.linux-x86_64/egg/eddn/Gateway.py", line 364, in _enable_cors
    return fn(*args, **kwargs)
  File "build/bdist.linux-x86_64/egg/eddn/Gateway.py", line 290, in upload
    message_body = get_decompressed_message()
  File "build/bdist.linux-x86_64/egg/eddn/Gateway.py", line 196, in get_decompre
ssed_message
    data_key = request.forms.get('data')
  File "/home/eddn/live/python2.7-venv/bin/bottle.py", line 166, in __get__
    if key not in storage: storage[key] = self.getter(obj)
  File "/home/eddn/live/python2.7-venv/bin/bottle.py", line 1084, in forms
    for name, item in self.POST.allitems():
  File "/home/eddn/live/python2.7-venv/bin/bottle.py", line 166, in __get__
    if key not in storage: storage[key] = self.getter(obj)
  File "/home/eddn/live/python2.7-venv/bin/bottle.py", line 1218, in POST
    pairs = _parse_qsl(tonat(self._get_body_string(), 'latin1'))
  File "/home/eddn/live/python2.7-venv/bin/bottle.py", line 1185, in _get_body_s
tring
    data = self.body.read(clen)
  File "/home/eddn/live/python2.7-venv/bin/bottle.py", line 1197, in body
    self._body.seek(0)
  File "/home/eddn/live/python2.7-venv/bin/bottle.py", line 166, in __get__
    if key not in storage: storage[key] = self.getter(obj)
  File "/home/eddn/live/python2.7-venv/bin/bottle.py", line 1166, in _body
    for part in body_iter(read_func, self.MEMFILE_MAX):
  File "/home/eddn/live/python2.7-venv/bin/bottle.py", line 1129, in _iter_body
    part = read(min(maxread, bufsize))
  File "/home/eddn/live/python2.7-venv/local/lib/python2.7/site-packages/gevent/
pywsgi.py", line 316, in read
    return self._do_read(length)
  File "/home/eddn/live/python2.7-venv/local/lib/python2.7/site-packages/gevent/
pywsgi.py", line 195, in _do_read
    raise IOError("unexpected end of file while reading request at position %s"
% (self.position,))
IOError: unexpected end of file while reading request at position 0
127.0.0.1 - - [2022-02-12 19:57:08] "POST /upload/ HTTP/1.1" 500 136 10.008121
2022-02-12 19:57:25,663 - ERROR - Gateway:274: Failed Validation "[<ValidationEr
ror: '[] is too short'>]" (322, "Raxiel Silverpath", "EDDiscovery", "12.1.7.0",
"https://eddn.edcd.io/schemas/outfitting/2", "-") from <unrelated ip>
...
127.0.0.1 - - [2022-02-12 20:08:53] "POST /upload/ HTTP/1.1" 400 367 0.200139
Traceback (most recent call last):
  File "/home/eddn/live/python2.7-venv/bin/bottle.py", line 862, in _handle
    return route.call(**args)
  File "/home/eddn/live/python2.7-venv/bin/bottle.py", line 1740, in wrapper
    rv = callback(*a, **ka)
  File "build/bdist.linux-x86_64/egg/eddn/Gateway.py", line 364, in _enable_cors
    return fn(*args, **kwargs)
  File "build/bdist.linux-x86_64/egg/eddn/Gateway.py", line 290, in upload
    message_body = get_decompressed_message()
  File "build/bdist.linux-x86_64/egg/eddn/Gateway.py", line 196, in get_decompressed_message
    data_key = request.forms.get('data')
  File "/home/eddn/live/python2.7-venv/bin/bottle.py", line 166, in __get__
    if key not in storage: storage[key] = self.getter(obj)
  File "/home/eddn/live/python2.7-venv/bin/bottle.py", line 1084, in forms
    for name, item in self.POST.allitems():
  File "/home/eddn/live/python2.7-venv/bin/bottle.py", line 166, in __get__
    if key not in storage: storage[key] = self.getter(obj)
  File "/home/eddn/live/python2.7-venv/bin/bottle.py", line 1218, in POST
    pairs = _parse_qsl(tonat(self._get_body_string(), 'latin1'))
  File "/home/eddn/live/python2.7-venv/bin/bottle.py", line 1185, in _get_body_string
    data = self.body.read(clen)
  File "/home/eddn/live/python2.7-venv/bin/bottle.py", line 1197, in body
    self._body.seek(0)
  File "/home/eddn/live/python2.7-venv/bin/bottle.py", line 166, in __get__
    if key not in storage: storage[key] = self.getter(obj)
  File "/home/eddn/live/python2.7-venv/bin/bottle.py", line 1166, in _body
    for part in body_iter(read_func, self.MEMFILE_MAX):
  File "/home/eddn/live/python2.7-venv/bin/bottle.py", line 1129, in _iter_body
    part = read(min(maxread, bufsize))
  File "/home/eddn/live/python2.7-venv/local/lib/python2.7/site-packages/gevent/pywsgi.py", line 316, in read
    return self._do_read(length)
  File "/home/eddn/live/python2.7-venv/local/lib/python2.7/site-packages/gevent/pywsgi.py", line 195, in _do_read
    raise IOError("unexpected end of file while reading request at position %s" % (self.position,))
IOError: unexpected end of file while reading request at position 0
127.0.0.1 - - [2022-02-12 20:09:49] "POST /upload/ HTTP/1.1" 500 136 10.008273
2022-02-12 20:11:02,873 - ERROR - Gateway:274: Failed Validation "[<ValidationError: '[] is too short'>]" (310, "Insane McManiac", "EDDiscovery", "12.1.7.0", "https://eddn.edcd.io/schemas/shipyard/2", "-") from <unrelated ip>

So, bottle thinks it's seeing EOF on body read, and Apache is saying ... it couldn't read from the remote client ? These could literally be a Sender not sending the body in time, or disconnecting before they do ?

Athanasius avatar Feb 12 '22 21:02 Athanasius

Reading through https://tech.affirm.com/profiling-gevent-in-python-8c719cba6561 it's possible we're falling foul of the Python GIL / this not being full-fat multi-threading ? Worth putting together a VM testbench (i.e. not on the EDCD host) to see if it can be reproduced.

In particular we have the navroute schema events now, which can not only be much larger in size than was typical before their addition (and remember there's compression possibly in play versus the configured limit), but also will have a "lot of structure" for the validation to check through.

Athanasius avatar Oct 03 '22 14:10 Athanasius