Extend idle timeout to exceed that of NLB (or whatever LB we are using)

Open: mystenmark opened this issue 3 years ago • 1 comment

I believe we are currently experiencing the following issue:

  • LB opens connection to backend to service request
  • LB keeps connection open for some period of time even if idle
  • backend also keeps connection open for period of time even if idle
  • if the backend closes the connection first, the LB's next request reuses a connection it still believes is open and gets a TCP RST in response
  • on web stacks this usually shows up as a 502
  • solution is to make sure the backend keeps idle connections open at least as long as the LB expects it to

According to these docs (https://docs.aws.amazon.com/elasticloadbalancing/latest/network/network-load-balancers.html), the idle timeout used by NLBs is 350s, and this cannot be changed.

Setting our own idle timeout to higher than 350s should resolve the issue. I am not sure where in hyper/h2/tonic this behavior can be controlled, nor do I know what the current default is.
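To make the reasoning concrete, here is a minimal sketch of the race as a pure function. It is a simplified model of the behavior described above, not code from hyper/h2/tonic; the names `Reuse`, `reuse_after_idle`, and `backend_timeout_is_safe` are hypothetical, and the only number taken from this issue is the 350s NLB timeout.

```rust
use std::time::Duration;

/// What happens when the LB sends a request on a connection that has
/// been silent for `idle`. Simplified model; all names are hypothetical.
#[derive(Debug, PartialEq)]
enum Reuse {
    /// Both sides still consider the connection open.
    Ok,
    /// The LB's own idle timeout fired first, so it uses a new connection.
    LbReconnects,
    /// The backend closed first; the LB reuses a dead connection and
    /// gets a TCP RST (typically surfaced as a 502 on web stacks).
    Reset,
}

fn reuse_after_idle(idle: Duration, lb_timeout: Duration, backend_timeout: Duration) -> Reuse {
    if idle >= lb_timeout {
        // The LB no longer believes the connection is open.
        Reuse::LbReconnects
    } else if idle >= backend_timeout {
        // The backend gave up while the LB still trusts the connection.
        Reuse::Reset
    } else {
        Reuse::Ok
    }
}

/// The fix proposed in this issue: the backend must outlast the LB.
fn backend_timeout_is_safe(lb_timeout: Duration, backend_timeout: Duration) -> bool {
    backend_timeout > lb_timeout
}
```

In this model, with the LB at 350s and a request arriving after 120s of silence, an assumed backend timeout of 60s yields `Reuse::Reset`, while an assumed 400s yields `Reuse::Ok`. Whenever `backend_timeout_is_safe` holds, the `Reset` branch is unreachable for any idle duration.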

An alternative would be to enable active HTTP keepalives, although that may have an unintended consequence: connections that are not actually being used for requests could be kept alive indefinitely.
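The trade-off with keepalives can be sketched the same way. This is a rough model that assumes the LB counts keepalive traffic as activity and resets its idle timer on each ping; `unused_connection_lifetime` is a hypothetical name, not an API from any of the libraries mentioned.

```rust
use std::time::Duration;

/// How long a connection carrying no requests survives at the LB.
/// `None` means it is never reaped. Assumes every keepalive ping
/// resets the LB's idle timer (an assumption of this sketch).
fn unused_connection_lifetime(
    lb_idle_timeout: Duration,
    keepalive_interval: Option<Duration>,
) -> Option<Duration> {
    match keepalive_interval {
        // Pings arrive before the idle timer fires, so it never expires.
        Some(interval) if interval < lb_idle_timeout => None,
        // No pings, or pings too infrequent to beat the idle timer.
        _ => Some(lb_idle_timeout),
    }
}
```

Under this model, any keepalive interval shorter than 350s keeps an otherwise unused connection open forever, which is the concern raised above. Without keepalives, the connection is dropped once the 350s idle timeout elapses.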

mystenmark • Aug 19 '22 16:08

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot] • Oct 19 '22 02:10