"Rejected connection: Transport(i/o error: unexpected end of file" in logs after v0.9.28 upgrade

Open • juliajang opened this issue 1 year ago • 5 comments

Hi team! I run validators for Polkadot and Kusama, and after upgrading to v0.9.28 I'm seeing this error in the logs on both networks:

2022-09-01 15:42:10 Accepting new connection 4/100
2022-09-01 15:42:10 Rejected connection: Transport(i/o error: unexpected end of file

Caused by:
    unexpected end of file)
2022-09-01 15:42:12 ✨ Imported #14263391 (0x2a5f…6fbd)

And these are the flags I pass when I start my validator:

polkadot --base-path /chain/data --chain  kusama --rpc-cors=all --unsafe-rpc-external --unsafe-ws-external --port "40333" --pruning=archive

I'm wondering whether I'm missing a required flag or some other change, since this only happens after the upgrade to v0.9.28 and not in the previous version (i.e. v0.9.27 does not show these logs).

juliajang, Sep 01 '22 16:09

Cc @niklasad1

bkchr, Sep 02 '22 07:09

Hey,

It indeed seems like a bug.

Can you explain how you run your node, for example behind an nginx proxy/load balancer or something similar? I have seen something similar on a few nodes but haven't been able to reproduce it.

This is a socket error that occurs while trying to complete the WebSocket handshake. As far as I can see, nothing in this release changed in that area, but it could be a regression in jsonrpsee v0.15.1.
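For context on what completing the WS handshake involves: per RFC 6455, the server first has to read the client's HTTP upgrade request, then answer with `101 Switching Protocols` and a `Sec-WebSocket-Accept` header derived from the client's `Sec-WebSocket-Key`. The sketch below computes that header for the RFC's own sample key; it is generic RFC 6455 logic, not jsonrpsee's implementation, and `websocket_accept` is a hypothetical name. If the peer closes the TCP connection before its request arrives, the server has nothing to read, which is consistent with an "unexpected end of file" error.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455; servers append it to the client's key.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def websocket_accept(sec_websocket_key: str) -> str:
    """Return the Sec-WebSocket-Accept value for a client's Sec-WebSocket-Key."""
    digest = hashlib.sha1((sec_websocket_key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


# Sample key from the RFC 6455 handshake example.
print(websocket_accept("dGhlIHNhbXBsZSBub25jZQ=="))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```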

niklasad1, Sep 02 '22 07:09

@niklasad1 We run our servers behind a fleet of proxy servers, which sit behind a cloud-hosted Layer 4 load balancer.

juliajang, Sep 02 '22 16:09

Right, it will be hard for me to reproduce that locally.

Do you have any idea how to reproduce this, or any additional logs to share?
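One way to attempt a local reproduction, assuming the trigger is a client that opens a TCP connection and closes it before sending any handshake bytes (an assumption, not a confirmed cause). The helper names are hypothetical, and a throwaway local listener stands in for the node; to try it against a real node, pass the node's WebSocket port instead (9944 was the default at the time, but check your `--ws-port`).

```python
import socket


def open_and_drop(host: str, port: int) -> None:
    """Hypothetical helper: connect over TCP, then close without sending anything.

    To the server this looks like a client that disappeared before sending
    its HTTP upgrade request.
    """
    with socket.create_connection((host, port), timeout=5):
        pass  # leaving the block closes the socket immediately


def read_first_chunk(conn: socket.socket) -> bytes:
    """Read the start of the request; b"" means the peer closed first (EOF)."""
    return conn.recv(4096)


if __name__ == "__main__":
    # Throwaway listener on an ephemeral local port, standing in for the node.
    with socket.create_server(("127.0.0.1", 0)) as server:
        port = server.getsockname()[1]
        open_and_drop("127.0.0.1", port)
        conn, _ = server.accept()
        with conn:
            data = read_first_chunk(conn)
            print("EOF before handshake" if data == b"" else f"got {len(data)} bytes")
```

A node that tries to read the upgrade request from such a connection gets end-of-file instead of an HTTP request.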

niklasad1, Sep 03 '22 10:09

Hey again, I looked at the code again: in polkadot versions <= v0.9.27 we never logged when a connection request failed, so the underlying behavior is probably the same in v0.9.28; it is just logged now. It could simply be that the client dropped the connection directly after opening it (but I'm not sure; I'm trying to reproduce that myself).

For instance, you may have a health check hitting the WebSocket server; since jsonrpsee v0.15.1, we reply with HTTP status code 403 to any request that isn't an HTTP upgrade request.

See https://github.com/paritytech/jsonrpsee/issues/818 for further information.
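The 403 behaviour described above can be illustrated with a simplified sketch of how a server might tell a WebSocket upgrade request apart from a plain HTTP request such as a health-check `GET`. It follows the client-handshake header requirements of RFC 6455 but is not jsonrpsee's code; `is_websocket_upgrade` and `status_for` are hypothetical names, the `node:9944` host is made up, and real servers check more than this.

```python
from typing import Mapping


def is_websocket_upgrade(method: str, headers: Mapping[str, str]) -> bool:
    """Check the RFC 6455 client-handshake headers (names compared case-insensitively)."""
    h = {name.lower(): value.strip() for name, value in headers.items()}
    # Connection may list several tokens, e.g. "keep-alive, Upgrade".
    connection_tokens = {t.strip().lower() for t in h.get("connection", "").split(",")}
    return (
        method == "GET"
        and h.get("upgrade", "").lower() == "websocket"
        and "upgrade" in connection_tokens
        and "sec-websocket-key" in h
        and h.get("sec-websocket-version", "") == "13"
    )


def status_for(method: str, headers: Mapping[str, str]) -> int:
    """Hypothetical policy: 101 for a WebSocket upgrade, 403 Forbidden otherwise."""
    return 101 if is_websocket_upgrade(method, headers) else 403


# A bare GET, as a naive HTTP health check might send it.
print(status_for("GET", {"Host": "node:9944"}))  # 403
# A proper upgrade request (key taken from the RFC 6455 example).
print(status_for("GET", {
    "Host": "node:9944",
    "Upgrade": "websocket",
    "Connection": "keep-alive, Upgrade",
    "Sec-WebSocket-Key": "dGhlIHNhbXBsZSBub25jZQ==",
    "Sec-WebSocket-Version": "13",
}))  # 101
```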

niklasad1, Sep 06 '22 15:09

We are seeing this very consistently on our Burnin machines on Westend (burnin for v0.9.30-rc3).

  • raw: https://grafana.parity-mgmt.parity.io/goto/CWeiFKIVk?orgId=1
  • rate: https://grafana.parity-mgmt.parity.io/goto/-viSKKS4z?orgId=1

chevdor, Oct 13 '22 11:10

Yeah, but the hypothesis is that these are HTTP health checks against the WebSocket server, since they happen periodically (every 10 seconds or so).

These will go away anyway in the next jsonrpsee release, which will ship a server that supports WS and HTTP on the same socket.
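If the periodic entries do come from a load-balancer or proxy health check, one possible mitigation is a check that performs a real WebSocket handshake instead of opening and dropping a connection or sending a plain `GET`. This is a minimal sketch under that assumption: `build_upgrade_request`, `status_code` and `probe` are hypothetical helpers, and whether your load balancer can run a custom check like this depends on your setup.

```python
import base64
import os
import socket


def build_upgrade_request(host: str, path: str = "/") -> bytes:
    """Build an RFC 6455 client opening handshake with a fresh random key."""
    key = base64.b64encode(os.urandom(16)).decode("ascii")  # 16 random bytes, base64-encoded
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Upgrade: websocket",
        "Connection: Upgrade",
        f"Sec-WebSocket-Key: {key}",
        "Sec-WebSocket-Version: 13",
        "",  # these two empty entries make the request end with CRLF CRLF,
        "",  # which terminates the header section
    ]
    return "\r\n".join(lines).encode("ascii")


def status_code(response: bytes) -> int:
    """Extract the numeric status code from the response's first line."""
    status_line = response.split(b"\r\n", 1)[0]
    return int(status_line.split()[1])


def probe(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if the server answers the handshake with 101 Switching Protocols."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_upgrade_request(f"{host}:{port}"))
        # Simplification: assumes the status line arrives in the first read.
        response = sock.recv(4096)
    return bool(response) and status_code(response) == 101
```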

niklasad1, Oct 13 '22 11:10

Mmm, looking at a single validator from the logs above, you can see exactly one message every 10 seconds, which is very periodic and does suggest to me some misconfigured automated check.
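A quick way to check that periodicity on your own node is to measure the gaps between consecutive `Rejected connection` entries. The sketch assumes each log line starts with a `YYYY-MM-DD HH:MM:SS` timestamp, as in the excerpt at the top of this issue; `rejection_intervals` is a hypothetical helper and the sample lines are made up for illustration.

```python
from datetime import datetime
from typing import Iterable, List

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # e.g. "2022-09-01 15:42:10"
TIMESTAMP_LEN = 19  # length of a timestamp in that format


def rejection_intervals(lines: Iterable[str]) -> List[float]:
    """Seconds between consecutive 'Rejected connection' log entries."""
    times = [
        datetime.strptime(line[:TIMESTAMP_LEN], TIMESTAMP_FORMAT)
        for line in lines
        if "Rejected connection" in line
    ]
    return [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]


# Made-up sample lines in the node's log format.
sample = [
    "2022-09-01 15:42:10 Rejected connection: Transport(i/o error: unexpected end of file",
    "2022-09-01 15:42:12 ✨ Imported #14263391 (0x2a5f…6fbd)",
    "2022-09-01 15:42:20 Rejected connection: Transport(i/o error: unexpected end of file",
    "2022-09-01 15:42:30 Rejected connection: Transport(i/o error: unexpected end of file",
]
print(rejection_intervals(sample))  # [10.0, 10.0]
```

Continuation lines such as `Caused by:` are skipped because they don't contain the marker text, so multi-line entries don't break the parsing.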

jsdw, Oct 13 '22 12:10

@juliajang could you give us a few hints about your infra? Are you using K8s?

chevdor, Oct 13 '22 12:10

I opened this issue https://github.com/rerun-io/ewebsock/issues/5 because I see this error when I try to connect with the ewebsock WebSocket library. I'm running polkadot 0.9.29-94078b44fb6 with the following command: polkadot --chain westend-dev --alice --tmp --rpc-cors all --unsafe-ws-external. I am able to connect to the same node using a JavaScript WebSocket instance.

danforbes, Oct 21 '22 21:10