scylla-manager

Fix restore schema procedure

Status: Open. Michal-Leszczynski opened this issue 1 year ago (4 comments).

Restoring schema doesn't work (or at least isn't stable) with Scylla 5.4.0 when consistent_cluster_management: true is set. This resulted in creating this issue, which exposed that SM restores schema in a dangerous way. The restore schema procedure should be changed to:

PREREQUISITE:

  • backup
  • fresh cluster (can have different topology than the backup)
  • ALL nodes have access to ALL backup locations. This means that in a multi-location backup setup, each node needs access to more than one location

PROCEDURE:

  • add cluster to SM (user)
  • stop the whole cluster - all nodes are down, but agents are up (user)
  • download schema SSTables from ALL locations into the data directory of ALL nodes (rather than uploading them as before), so that nodetool refresh can be skipped (SM)
  • rolling-start the cluster (user)
  • ALTER the cluster schema to match the new topology (e.g. if the new cluster contains an additional DC, add it to the keyspace replication strategy)

The only potential problem is extending the agent configuration so that an agent can access more than one location from a given provider, and making sure this works well with rclone.

Michal-Leszczynski, Dec 14 '23

https://manager.docs.scylladb.com/stable/sctool/backup.html has an option to restore a single table or a single keyspace. How does the procedure above support this option?

kostja, Jan 03 '24

The option of restoring a single table/keyspace is only available when restoring user data. Schema restoration always targets the whole backed-up schema (because we just re-upload all system_schema SSTables).

So implementing schema restoration via DESC SCHEMA would also allow SM to perform a partial schema restore.

Michal-Leszczynski, Jan 05 '24

@Michal-Leszczynski, to summarize, we have two tracks for fixing this issue in Scylla Manager:

  • Short term: for 5.4 and 2024.1, disable Raft on cluster init, restore it, and then enable it.
  • Long-term: using a new DESCRIBE schema

Both should be automated, tested, and documented.
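Automating the short-term track means toggling consistent_cluster_management in scylla.yaml around the restore (off on cluster init, on afterwards). A minimal Go sketch of just the config edit, assuming the key is a top-level line in scylla.yaml; `setConsistentClusterManagement` is a hypothetical helper, and the surrounding restarts and restore steps are not modeled here.

```go
package main

import (
	"fmt"
	"strings"
)

const cfgKey = "consistent_cluster_management:"

// setConsistentClusterManagement rewrites (or appends) the top-level
// consistent_cluster_management entry in scylla.yaml text. Commented-out
// or indented occurrences are left untouched.
func setConsistentClusterManagement(yaml string, enabled bool) string {
	want := fmt.Sprintf("%s %t", cfgKey, enabled)
	lines := strings.Split(yaml, "\n")
	for i, l := range lines {
		if strings.HasPrefix(l, cfgKey) {
			lines[i] = want
			return strings.Join(lines, "\n")
		}
	}
	// Key absent: append it as a new top-level entry.
	return strings.TrimRight(yaml, "\n") + "\n" + want + "\n"
}

func main() {
	cfg := "cluster_name: test\nconsistent_cluster_management: true\n"
	// Disable Raft before restoring the schema on a fresh cluster...
	off := setConsistentClusterManagement(cfg, false)
	fmt.Print(off)
	// ...and re-enable it once the restore is done.
	fmt.Print(setConsistentClusterManagement(off, true))
}
```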

tzach, Jan 23 '24

> Short term: for 5.4 and 2024.1, disable Raft on cluster init, restore it, and then enable it.

Restoring schema from a cluster with Raft to a cluster without Raft is now part of the SM GitHub Actions tests, but enabling Raft afterwards isn't, so that should be added.

> Long-term: using a new DESCRIBE schema

Correct, but first we need a DESCRIBE SCHEMA that can be safely used for restoring data (DESCRIBE SCHEMA WITH INTERNALS doesn't work in its current form). Once it's available (e.g. as a gocql driver method), we can use it so that:

  • restoring schema of a single keyspace / table is possible
  • restoring schema doesn't require a cluster restart afterwards

Michal-Leszczynski, Jan 23 '24

Refinement notes

The workaround is documented here: https://manager.docs.scylladb.com/stable/restore/restore-schema.html#restoring-schema-into-a-cluster-with-scylladb-5-4-x-or-2024-1-x-with-consistent-cluster-management

This issue does not cover 6.0 or schema restoration done with DESCRIBE SCHEMA. Schema restoration with 6.0 is tracked here: https://github.com/scylladb/scylla-manager/issues/3868

This issue is only about the workaround, which is already documented, so we can close it.

karol-kokoszka, Jun 10 '24