Memory-scaling a cluster with a single node causes downtime
Given
- Any simple Fly app.
- That runs on a cluster with a single instance. In other words: scale count == 1.
- No volumes.
- No special scaling settings in the fly.toml nor by calling fly scale .... Everything is at stock settings.
These preconditions are trivially satisfied by running fly launch on any of the example apps in the Fly documentation, for instance the Rails example app.
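A minimal reproduction of these preconditions, as a sketch (the app directory is a placeholder; fly scale show is only used to confirm the single-instance count):

```sh
# Sketch: reproduce the stock, single-instance setup (names are placeholders).
cd my-example-app        # e.g. the Rails example app from the Fly docs

# Create and deploy the app with all defaults; no volumes,
# no scaling settings changed afterwards.
fly launch

# Confirm the precondition: a single instance at the stock VM size.
fly scale show
```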
When
Run fly scale memory some_other_amount (for example, fly scale memory 512).
Expected behaviour
A 'canary'-style deploy is done. That is:
- A new VM instance is started in the background.
- Fly waits until it is fully started and its health checks pass.
- Only then is the current VM instance stopped and traffic redirected to the new VM instance.
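For reference, flyctl already exposes a canary strategy for regular deploys; whether fly scale memory can reuse the same code path is an assumption on my part, but it illustrates the behaviour this issue expects:

```sh
# Sketch: the canary strategy as it exists for regular deploys.
# (Assumption: "fly scale memory" should follow the same new-VM-first sequence.)
fly deploy --strategy canary
```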
Actual behaviour
An 'immediate'-style deploy is done. That is:
- The current VM instance is stopped immediately, before the new VM instance is operational.
- This causes a period of downtime. (The downtime takes roughly as long as it takes the new VM instance to boot and pass its health checks.)
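One way to observe the downtime window, as a sketch (the hostname is a placeholder for the deployed app):

```sh
# Terminal 1: request the app once per second and log the HTTP status.
while true; do
  curl -s -o /dev/null -w '%{http_code}\n' https://my-example-app.fly.dev
  sleep 1
done

# Terminal 2: trigger the memory change; the loop above starts failing
# as soon as the current VM is stopped, until the new VM passes health checks.
fly scale memory 512
```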
After @davissp14 confirmed in the Forum topic about this problem that this is not intended behaviour, formalizing it in an issue seemed like the right thing to do. I hope this aligns with Fly's dev guidelines :blush:.
@davissp14 are you still working on this internally? Can I help here?
More than a year and no updates on a bug that causes downtime for your users?