flyctl icon indicating copy to clipboard operation
flyctl copied to clipboard

Memory-scaling a cluster with a single node causes downtime

Open Qqwy opened this issue 3 years ago • 2 comments

Given

  • Any simple Fly app.
  • That runs on a cluster with a single instance. In other words: scale count == 1.
  • No volumes
  • No special scaling settings in the fly.toml nor by calling fly scale .... Everything is at stock settings.

These preconditions are trivially satisfied by fly launching any of the example apps in the Fly documentation. For instance, with the Rails example app.

When

When running fly scale memory some_other_amount (such as fly scale memory 512)

Expected behaviour

A 'canary'-style deploy is done. That is:

  • A new VM instance is started in the background.
  • Fly waits until it is fully started and its health checks are OK
  • Only now is the current VM instance stopped and traffic redirected to the new VM instance.

Actual behaviour

An 'immediate'-style deploy is done. That is:

  • The current VM instance is stopped immediately, before the new VM instance is operational.
  • This causes a period of downtime. (The downtime takes roughly as long as it takes the new VM instance to boot and pass its health checks.)

After @davissp14 confirmed that this is not intended behaviour in the Forum topic about this problem, formalizing it in an issue seemed like the right thing to do. I hope this aligns with Fly's dev guidelines :blush: .

Qqwy avatar Aug 16 '22 11:08 Qqwy

@davissp14 are you still working on this internally? Can I help here?

redjonzaci avatar Sep 03 '23 14:09 redjonzaci

More than a year and no updates about a bug that causes downtime to your users?

rubn-g avatar Nov 04 '23 09:11 rubn-g