scylla-manager
Missing API to fetch schema used during the backup creation
Version: 2.6
Description: We are missing an API to get the schema used during the creation of the backup. Schema is required in order to execute the restore procedure described here: https://docs.scylladb.com/operating-scylla/procedures/backup-restore/restore/#procedure
@vladzcloudius
We are missing an API to get the schema used during the creation of the API.
I don't understand this requirement. By "missing an API to get schema", do you mean another sctool command to get the schema in CQL?
The schema should be dumped whenever the backup is started, so that we have it from the exact point in time.
Schema is added to the backup if, and only if, the user provides credentials to the cluster added to the scylla-manager. See https://manager.docs.scylladb.com/stable/backup/specification.html#schema
Even if we add the API to dump the schema, there is still a need for credentials to be provided for the cluster.
@karol-kokoszka schema is a requirement for the restoration. A backup is meaningless if you can't use it. And if credentials are a requirement for getting the schema (which they are if authentication is enabled!), then you must require them when the backup task is created.
IMO it's very obvious.
Please let me know if that makes sense.
SM always backs up the schema in the form of system_schema sstables, and this does not require cluster credentials.
I believe this was introduced in SM 2.4 because we can't always trust the output of DESCRIBE SCHEMA:
// Always backup system_schema.
//
// Some schema changes, like dropping columns, are applied lazily to
// sstables during compaction. Information about those schema changes is
// recorded in the system schema tables, but not in the output of "desc schema".
// Using output of "desc schema" is not enough to restore all schema changes.
// As a result, writes in sstables may be incorrectly interpreted.
// For example, writes of deleted columns which were later recreated may be
// resurrected.
Both the Ansible script and the SM 3.1 restore task restore the schema from system_schema sstables, not from the DESCRIBE SCHEMA CQL file.
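To make the code comment above more concrete, here is a toy Python model of the problem. It is only a sketch and not Scylla's actual read path: the `Cell` and `Schema` classes, the `read_row` function, the column names, and the timestamps are all made up for illustration. The `dropped_columns` mapping stands in for the drop information kept in the system schema tables, which a schema rebuilt from "desc schema" output would not have.

```python
from dataclasses import dataclass, field


@dataclass
class Cell:
    column: str
    value: str
    write_ts: int  # write timestamp, arbitrary units (hypothetical)


@dataclass
class Schema:
    columns: set
    # column name -> drop timestamp; stands in for the drop records
    # kept in the system schema tables (illustrative only)
    dropped_columns: dict = field(default_factory=dict)


def read_row(cells, schema):
    """Return visible values, skipping cells written before a recorded drop."""
    row = {}
    for cell in cells:
        if cell.column not in schema.columns:
            continue
        dropped_at = schema.dropped_columns.get(cell.column)
        if dropped_at is not None and cell.write_ts <= dropped_at:
            continue  # stale write from before the column was dropped
        row[cell.column] = cell.value
    return row


# The sstable still holds a write to "email" from ts=10, because the drop
# at ts=20 has not yet been applied by compaction. The column was later re-added.
sstable_cells = [Cell("name", "alice", 5), Cell("email", "old@example.com", 10)]

# Schema restored from system schema sstables: drop record is present.
full_schema = Schema(columns={"name", "email"}, dropped_columns={"email": 20})
# Schema recreated from "desc schema" output only: drop record is lost.
cql_only_schema = Schema(columns={"name", "email"})

with_drop_info = read_row(sstable_cells, full_schema)
without_drop_info = read_row(sstable_cells, cql_only_schema)
print(with_drop_info)
print(without_drop_info)
```

In this model the second read returns the old `email` value, which is the "resurrected" write the comment warns about.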
@vladzcloudius We don't want to invest effort into the Ansible way of restoring the cluster with manager >= 3.1.
@Michal-Leszczynski I think it was always the CQL that was used to restore the schema with ansible.
Here is the scylla documentation describing the restore. It mentions Scylla-Manager as well. https://opensource.docs.scylladb.com/stable/operating-scylla/procedures/backup-restore/restore.html
Here is a fragment of the restore Ansible script:
- name: Load system_schema tables data from the upload directory
  shell: |
    nodetool refresh {{ item.split('.') | join(' ') }}
  with_items: "{{ system_schema_tables }}"
which shows that the schema is restored via sstables.
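For readers less familiar with Jinja filters, the expression in that task can be mirrored in a few lines of Python. This is only a sketch of what the template expands to: the `refresh_commands` helper and the example table list are assumptions for illustration, not part of the Ansible role.

```python
def refresh_commands(system_schema_tables):
    """Build one 'nodetool refresh <keyspace> <table>' command per entry."""
    commands = []
    for item in system_schema_tables:
        # Same effect as the Jinja expression: item.split('.') | join(' ')
        commands.append("nodetool refresh " + " ".join(item.split(".")))
    return commands


# Example input; the real list comes from the playbook's variables.
tables = ["system_schema.tables", "system_schema.columns"]
commands = refresh_commands(tables)
for command in commands:
    print(command)
```

Each entry of the form `keyspace.table` becomes a separate `nodetool refresh` call against the system_schema keyspace, so the schema is loaded from sstables rather than from a CQL file.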
I'm well aware of that, @Michal-Leszczynski.
And this GH issue is about adding an API to fetch the schema in the DESCRIBE SCHEMA form.
@vladzcloudius We don't want to invest effort into the Ansible way of restoring the cluster with manager >= 3.1.
How is it relevant to this GH issue, @karol-kokoszka?
@Michal-Leszczynski I think it was always the CQL that was used to restore the schema with ansible.
No, it was always the system_schema sstables, @karol-kokoszka.
Here is the scylla documentation describing the restore. It mentions Scylla-Manager as well. https://opensource.docs.scylladb.com/stable/operating-scylla/procedures/backup-restore/restore.html
@karol-kokoszka @Michal-Leszczynski the context of this GH issue is not full cluster cloning (as you both seem to imply) but restoration of one or more tables.
The doc above describes this process, but it requires the actual schema, Michal, not system_schema sstables.
Ref https://github.com/scylladb/scylladb/issues/16482