
local polis-file-server sometimes fails to serve files; polis-server crashes

Open midgleyc opened this issue 4 years ago • 5 comments

Expected behavior: Polis-file-server always serves static files, or polis-server can handle a single connection drop. Polis-server does not crash.

Actual behavior: Polis-file-server sometimes drops the connection; polis-server crashes

To Reproduce: Deploy both polis-server and polis-file-server in production, running polis-file-server as 'local' upload. Wait several hours, then access a page on polis-server.

Screenshots: image

Device information:

  • AWS t2.small running docker-compose

Additional context: Logs from polis-server:

part2
{}
{
  'x-forwarded-proto': 'https',
  host: 'polis.client.newredo.com',
  connection: 'close',
  'user-agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:86.0) Gecko/20100101 Firefox/86.0',
  accept: 'image/webp,*/*',
  'accept-language': 'en-GB,en;q=0.7,en-US;q=0.3',
  'accept-encoding': 'gzip, deflate, br',
  referer: 'https://polis.client.newredo.com/',
  cookie: 'REDACTED'
}
/app/node_modules/http-proxy/lib/http-proxy/index.js:120
    throw err;
    ^

Error: socket hang up
    at connResetException (internal/errors.js:617:14)
    at Socket.socketCloseListener (_http_client.js:443:25)
    at Socket.emit (events.js:327:22)
    at Socket.EventEmitter.emit (domain.js:486:12)
    at TCP.<anonymous> (net.js:673:12)
    at TCP.callbackTrampoline (internal/async_hooks.js:129:14) {
  code: 'ECONNRESET'
}

After this, polis-server crashes. I have the Docker container set to restart: unless-stopped, so it comes back up after that, and the next bit is just init 1. The page loads on refresh.

midgleyc avatar Mar 09 '21 09:03 midgleyc

The cause of this is that routingProxy sometimes emits an ECONNRESET error. Because no 'error' listener is registered on it, e.g.:

routingProxy.on('error', (e) => {
...
})

this error propagates and crashes the application.
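To illustrate the mechanism, here is a minimal sketch using Node's built-in EventEmitter as a stand-in for the proxy object. It does not use http-proxy or any Polis code, and the helper name emitReset is made up for the example. It only shows that an 'error' event with no listener is thrown, while the same event with a listener is handled.

```javascript
const { EventEmitter } = require('events');

// Hypothetical helper: emits an error shaped like the one in the log above.
function emitReset(emitter) {
  const err = new Error('socket hang up');
  err.code = 'ECONNRESET';
  emitter.emit('error', err);
}

// Case 1: no 'error' listener is registered, so emit() throws the error.
// Outside a try/catch this would take the whole process down.
const unguarded = new EventEmitter();
let thrownCode = null;
try {
  emitReset(unguarded);
} catch (e) {
  thrownCode = e.code;
}

// Case 2: a listener is registered, so the error is delivered to it
// and nothing is thrown.
const guarded = new EventEmitter();
let handledCode = null;
guarded.on('error', (e) => {
  handledCode = e.code;
});
emitReset(guarded);

console.log({ thrownCode, handledCode });
```

With http-proxy itself, the 'error' listener is called with (err, req, res), so a real handler could also send an error response to the client; what it should do in Polis is left open here.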

I think this is https://github.com/http-party/node-http-proxy/issues/1455

midgleyc avatar Mar 17 '21 07:03 midgleyc

This is great sleuthing @midgleyc! Much appreciated! Sorry, there are not many outside people trying to host Polis in production-like environments.

@joshsmith2 are you hitting something like this as well?

patcon avatar Mar 17 '21 14:03 patcon

@patcon The application does not terminate as such; instead it logs the error, keeps running with the broken socket, and stops responding to further requests. It's unclear why the process does not terminate, but the use of setInterval and long setTimeout calls may be related. Because the process does not terminate, application supervisors (like Docker) don't know to restart it.

I'd like to fix this by making sure the application terminates properly, with a non-zero exit code, since that would also cover other failure scenarios. Any objections or other ideas?
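One possible shape for this, as a rough sketch only: the helper name installFatalHandlers and its parameters are hypothetical and not part of Polis. It logs a fatal error and then exits with a non-zero code, so that a supervisor such as Docker sees the failure and restarts the container.

```javascript
// Hypothetical helper (not Polis code): on a fatal error, log it and exit
// with a non-zero code so the supervisor can restart the process.
// `proc` is expected to behave like Node's `process` (on, exit);
// `log` is any logging function, e.g. console.error.
function installFatalHandlers(proc, log) {
  const fail = (kind) => (err) => {
    log(`${kind}:`, err);
    // Exit explicitly: pending setInterval/setTimeout timers would
    // otherwise keep the event loop, and so the process, alive.
    proc.exit(1);
  };
  proc.on('uncaughtException', fail('uncaughtException'));
  proc.on('unhandledRejection', fail('unhandledRejection'));
}
```

At the server entry point this would be called as installFatalHandlers(process, console.error). Taking the process object as a parameter is only there so the behaviour can be exercised with a fake in tests.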

pluby avatar Feb 08 '22 11:02 pluby

No objection from me, but to be clear, I don't have merge access directly. I can't imagine the polis team would have an issue so long as the solution is scoped and as minimal as feasible :) Feel free to talk it out loud here

patcon avatar Feb 09 '22 06:02 patcon

I don't have much to add [other than my appreciation to those who have discussed beforehand], but wanted to confirm that we are hitting this issue on our polis deployment in what appears to be a similar configuration to @midgleyc. Happy to contribute to some possible fixes.

edenist avatar Aug 31 '22 06:08 edenist