
Check what happens to non-overlayed connections during migration

Open · olegbbtr opened this issue 7 months ago · 2 comments

Storage is currently accessed over the default network, so connections are expected to break when the switch happens. We need to check how compute, pageserver, and safekeeper handle it.

DoD

We need numbers showing how quickly compute can re-establish its connections and how much throughput is impacted.
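One way to turn raw test output into these two numbers is sketched below. It assumes, hypothetically, that the load script logs every query as a `(timestamp_seconds, ok)` pair; the function names `recovery_time` and `successes_per_bucket` are illustrative and not taken from any existing tool.

```python
from typing import List, Tuple

# Hypothetical log format: (timestamp in seconds, whether the query succeeded).
Sample = Tuple[float, bool]


def recovery_time(samples: List[Sample]) -> float:
    """Longest interval from the first failed query of an outage to the next success.

    Returns 0.0 if no query failed. An outage that never recovers is measured
    up to the last recorded sample.
    """
    worst = 0.0
    outage_start = None
    for ts, ok in sorted(samples):
        if not ok and outage_start is None:
            outage_start = ts  # first failure of a new outage
        elif ok and outage_start is not None:
            worst = max(worst, ts - outage_start)  # connection re-established
            outage_start = None
    if outage_start is not None:
        worst = max(worst, max(ts for ts, _ in samples) - outage_start)
    return worst


def successes_per_bucket(samples: List[Sample], bucket_seconds: float) -> List[int]:
    """Count successful queries in consecutive fixed-width time buckets.

    A dip in the counts around the switch shows how much throughput is impacted.
    """
    if not samples:
        return []
    t0 = min(ts for ts, _ in samples)
    t_last = max(ts for ts, _ in samples)
    counts = [0] * (int((t_last - t0) // bucket_seconds) + 1)
    for ts, ok in samples:
        if ok:
            counts[int((ts - t0) // bucket_seconds)] += 1
    return counts
```

For example, with samples `[(0, True), (1, True), (2, False), (3, False), (4.5, True), (5, True)]`, the outage runs from t=2 to the first success at t=4.5, so `recovery_time` reports 2.5 seconds, and 2-second buckets give `[2, 0, 2]`.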

More info

If we wanted to access the pageserver through the overlay network, we would need either:

  • To bring the overlay network to each PS host, so pageservers would have to run in k8s.
  • To set up bridges that would decapsulate/encapsulate packets on the route between pageserver and compute.

olegbbtr · Apr 22 '25 16:04

I'm currently writing a script that restarts computes and sends a lot of queries. The script uses branches of a single project, which makes it easy to test overlay on staging without enabling it for everyone. The idea is that the script will give us a good reproduction of the issues we're seeing in other tests after enabling the overlay network.

The source code is here: https://github.com/neondatabase-labs/overlay-testing
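A minimal sketch of the kind of probing loop such a script could run while computes restart. It does not reproduce the overlay-testing code: `run_query` is a hypothetical callable (in practice it would wrap a Postgres client call) that raises an exception when the connection is broken, and `probe` is an illustrative name.

```python
import time
from typing import Callable, List, Tuple


def probe(run_query: Callable[[], None], duration_s: float,
          retry_delay_s: float = 0.1) -> List[Tuple[float, bool]]:
    """Call run_query repeatedly for duration_s seconds, recording (timestamp, ok).

    Any exception counts as a failed query; the loop then waits retry_delay_s
    before retrying, so a broken connection shows up as a run of failures
    followed by the first success after reconnect.
    """
    samples: List[Tuple[float, bool]] = []
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        ts = time.monotonic()
        try:
            run_query()
            samples.append((ts, True))
        except Exception:
            samples.append((ts, False))
            time.sleep(retry_delay_s)  # back off briefly before reconnecting
    return samples
```

Using a monotonic clock keeps the timestamps immune to wall-clock adjustments, which matters when the measured gaps are only a few hundred milliseconds.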

petuhovskiy · Apr 28 '25 16:04

This issue was moved to Jira: LKB-1018

zenithdb-bot-dev[bot] · Jul 18 '25 22:07