autoscaling
Check what happens to non-overlayed connections during migration
Storage is currently accessed over the default network, so connections are expected to break when the switch happens. We need to check how compute, pageserver, and safekeeper handle this.
DoD
We need numbers showing how quickly compute can reestablish a connection and how much throughput is affected.
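One way to get both numbers is to run a tight query loop across the migration and post-process the timestamps. The sketch below is only an illustration, not the code from the testing script: `ProbeResult`, `run_probes`, `outages`, and `throughput` are hypothetical names, and `run_query` stands for any callable that executes one query against compute and raises on failure.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class ProbeResult:
    started_at: float  # seconds on a monotonic clock
    ok: bool           # True if the query succeeded


def run_probes(
    run_query: Callable[[], object],
    duration_s: float,
    interval_s: float = 0.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> List[ProbeResult]:
    """Call run_query repeatedly for duration_s and record success or failure."""
    results: List[ProbeResult] = []
    deadline = clock() + duration_s
    while clock() < deadline:
        started = clock()
        try:
            run_query()
            ok = True
        except Exception:
            # Any error (broken connection, timeout) counts as a failed probe.
            ok = False
        results.append(ProbeResult(started, ok))
        if interval_s > 0:
            sleep(interval_s)
    return results


def outages(results: List[ProbeResult]) -> List[Tuple[float, Optional[float]]]:
    """Return (start, end) windows from the first failed probe to the next
    successful one. end is None if the run finished before recovery."""
    windows: List[Tuple[float, Optional[float]]] = []
    start: Optional[float] = None
    for r in results:
        if not r.ok and start is None:
            start = r.started_at
        elif r.ok and start is not None:
            windows.append((start, r.started_at))
            start = None
    if start is not None:
        windows.append((start, None))
    return windows


def throughput(results: List[ProbeResult], bucket_s: float = 1.0) -> Dict[int, int]:
    """Count successful queries per time bucket, relative to the first probe."""
    counts: Dict[int, int] = {}
    if not results:
        return counts
    t0 = results[0].started_at
    for r in results:
        idx = int((r.started_at - t0) // bucket_s)
        counts.setdefault(idx, 0)
        if r.ok:
            counts[idx] += 1
    return counts
```

The length of each window from `outages` approximates the time to reestablish a connection (bounded below by the probe interval), and comparing `throughput` buckets before, during, and after the migration shows the impact on query rate. If `run_query` reuses a single connection, it would also need to reconnect on failure, otherwise the outage never ends from the probe's point of view.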
More info
If we wanted to access the pageserver through the overlay network, we would need to either:
- Bring the overlay network to each PS host, which means pageservers would have to run in k8s.
- Set up bridges that decap/encap packets on the route between pageserver and compute.
I'm currently writing a script that restarts computes and sends a lot of queries. The script uses branches of a single project, which makes it easy to test overlay on staging without enabling it for everyone. The idea is that the script will give us a good reproduction of the issues we're seeing in other tests after enabling the overlay network.
The source code is here: https://github.com/neondatabase-labs/overlay-testing
This issue was moved to Jira: LKB-1018