scylla-manager
Fix restore schema procedure
Restoring schema doesn't work (or at least isn't stable) with Scylla 5.4.0 with consistent_cluster_management: true. This resulted in an issue being filed, which exposed that SM restores schema in a dangerous way. The restore schema procedure should be changed to:
PREREQUISITES:
- backup
- fresh cluster (can have different topology than the backup)
- ALL nodes have access to ALL backup locations. This means that in a multi-location backup setting, nodes need to have access to more than one location
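The last prerequisite can be checked up front. Below is a minimal sketch, assuming we already know which locations each node's agent can reach; the node names, location strings, and the `missing_access` helper are hypothetical and not part of SM.

```python
def missing_access(node_locations, backup_locations):
    """Return (node, location) pairs where a node cannot reach a backup location."""
    required = set(backup_locations)
    missing = []
    for node in sorted(node_locations):
        reachable = set(node_locations[node])
        # Every location in the backup must be reachable from every node.
        for location in sorted(required - reachable):
            missing.append((node, location))
    return missing


# Hypothetical inventory: locations reachable by each node's agent.
node_locations = {
    "node-1": {"s3:backup-dc1", "s3:backup-dc2"},
    "node-2": {"s3:backup-dc1"},
}
backup_locations = ["s3:backup-dc1", "s3:backup-dc2"]

gaps = missing_access(node_locations, backup_locations)
print(gaps)  # [('node-2', 's3:backup-dc2')]
```

An empty result means the prerequisite holds; any returned pair names a node that would miss part of the schema SSTables.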
PROCEDURE:
- add cluster to SM (user)
- stop the whole cluster - all nodes are down, but agents are up (user)
- download schema SSTables from ALL locations to ALL nodes DATA dir - not upload like before, so we can skip nodetool refresh (SM)
- rolling-start the cluster (user)
- ALTER cluster schema to match the new topology (e.g. if the new cluster contains an additional DC, add it to the keyspace replication strategy)
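The final ALTER step can be illustrated with a small helper that builds the CQL statement for a keyspace from a DC-to-replication-factor mapping. This is only a sketch: the `alter_keyspace_replication` helper, the keyspace name, and the DC names are hypothetical, and it assumes NetworkTopologyStrategy and unquoted identifiers.

```python
import re

# Only simple unquoted identifiers are accepted in this sketch.
_IDENT = re.compile(r"^[A-Za-z][A-Za-z0-9_]*$")


def alter_keyspace_replication(keyspace, dc_replication):
    """Build an ALTER KEYSPACE statement for the given DC -> RF mapping."""
    if not _IDENT.match(keyspace):
        raise ValueError("unsupported keyspace name: %r" % keyspace)
    if not dc_replication:
        raise ValueError("at least one DC is required")
    options = ["'class': 'NetworkTopologyStrategy'"]
    for dc in sorted(dc_replication):
        rf = dc_replication[dc]
        if "'" in dc or not isinstance(rf, int) or rf < 1:
            raise ValueError("invalid replication entry: %r -> %r" % (dc, rf))
        options.append("'%s': %d" % (dc, rf))
    return "ALTER KEYSPACE %s WITH replication = {%s};" % (keyspace, ", ".join(options))


# The restored keyspace knew only dc1; the new cluster also has dc2.
stmt = alter_keyspace_replication("ks", {"dc1": 3, "dc2": 3})
print(stmt)
```

The generated statement would then be executed against the restored cluster, once per keyspace whose replication needs to change.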
The only potential problem is that the agent configuration needs to be extended so that an agent can access more than one location from a given provider, and we need to make sure that this works well with rclone.
https://manager.docs.scylladb.com/stable/sctool/backup.html has an option to restore a single table or a single keyspace. How does the procedure above implement restore for this option?
The option of restoring a single table/keyspace is only available when restoring user data. Schema restoration always targets the whole backed-up schema (because we just re-upload all system_schema SSTables).
So implementing schema restoration via DESC SCHEMA would also allow SM to perform a partial schema restore.
@Michal-Leszczynski, to summarize, we have two tracks for fixing this issue in Scylla Manager
- Short term: for 5.4 and 2024.1, disable Raft on cluster init, restore the schema, and then enable Raft.
- Long-term: using a new DESCRIBE schema
Both should be automated, tested, and documented.
> Short term: for 5.4 and 2024.1, disable Raft on cluster init, restore the schema, and then enable Raft.
Restoring schema from a cluster with Raft into a cluster without Raft is now part of the SM GitHub Actions tests, but enabling Raft afterwards isn't, so it should be added.
> Long-term: using a new DESCRIBE schema
Correct, but first we need to have a DESCRIBE SCHEMA that can be safely used for restoring data (DESCRIBE SCHEMA WITH INTERNALS doesn't work in its current form). When it's available (e.g. as a gocql driver method), we can use it so that:
- restoring schema of a single keyspace / table is possible
- restoring schema doesn't require cluster restart afterwards
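To illustrate why a DESCRIBE SCHEMA based approach would make partial schema restore possible, here is a rough sketch that filters a list of CQL statements down to a single keyspace. It assumes the DESCRIBE SCHEMA output has already been split into individual statements with unquoted, keyspace-qualified names; the `statement_keyspace` and `filter_schema` helpers and the sample schema are hypothetical, not SM code.

```python
import re

_CREATE_KEYSPACE = re.compile(
    r"^\s*CREATE\s+KEYSPACE\s+(?:IF\s+NOT\s+EXISTS\s+)?(\w+)", re.IGNORECASE
)
# First keyspace-qualified name in a statement, e.g. ks1.users.
_QUALIFIED_NAME = re.compile(r"\b(\w+)\.\w+")


def statement_keyspace(statement):
    """Guess which keyspace a CQL schema statement belongs to."""
    match = _CREATE_KEYSPACE.match(statement)
    if match:
        return match.group(1)
    match = _QUALIFIED_NAME.search(statement)
    return match.group(1) if match else None


def filter_schema(statements, keyspace):
    """Keep only the statements for one keyspace, preserving their order."""
    return [s for s in statements if statement_keyspace(s) == keyspace]


# Hypothetical statements, as they might appear in a described schema.
schema = [
    "CREATE KEYSPACE ks1 WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3};",
    "CREATE TABLE ks1.users (id uuid PRIMARY KEY, name text);",
    "CREATE KEYSPACE ks2 WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3};",
    "CREATE TABLE ks2.events (id uuid PRIMARY KEY);",
    "CREATE INDEX users_name_idx ON ks1.users (name);",
]

ks1_only = filter_schema(schema, "ks1")
for statement in ks1_only:
    print(statement)
```

Because the schema is applied as regular CQL statements rather than by replacing system_schema SSTables, a subset like this could be applied to a running cluster without a restart.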
refinement notes
The workaround is documented in the docs here https://manager.docs.scylladb.com/stable/restore/restore-schema.html#restoring-schema-into-a-cluster-with-scylladb-5-4-x-or-2024-1-x-with-consistent-cluster-management
This issue does not cover 6.0 or schema restoration done with DESCRIBE SCHEMA. The issue that tracks schema restoration with 6.0 is here https://github.com/scylladb/scylla-manager/issues/3868
The current one is just about the workaround which is already documented, so we can close this one.