scylla-manager

Missing API to fetch schema used during the backup creation

Open vladzcloudius opened this issue 3 years ago • 8 comments

Version: 2.6

Description: We are missing an API to get the schema used during the creation of the backup. The schema is required in order to execute the restore procedure described here: https://docs.scylladb.com/operating-scylla/procedures/backup-restore/restore/#procedure

vladzcloudius, Feb 06 '22 20:02

@vladzcloudius

We are missing an API to get the schema used during the creation of the API.

I don't understand this requirement. By "missing an API to get the schema", do you mean another sctool command to get the schema in CQL? The schema should be dumped whenever the backup starts, so that it is captured at that exact point in time.

Schema is added to the backup if, and only if, the user provides credentials to the cluster added to the scylla-manager. See https://manager.docs.scylladb.com/stable/backup/specification.html#schema

Even if we add an API to dump the schema, credentials would still have to be provided for the cluster.

karol-kokoszka, Jul 05 '23 07:07

@vladzcloudius

We are missing an API to get the schema used during the creation of the API.

I don't understand this requirement. By "missing an API to get the schema", do you mean another sctool command to get the schema in CQL? The schema should be dumped whenever the backup starts, so that it is captured at that exact point in time.

Schema is added to the backup if, and only if, the user provides credentials to the cluster added to the scylla-manager. See https://manager.docs.scylladb.com/stable/backup/specification.html#schema

Even if we add an API to dump the schema, credentials would still have to be provided for the cluster.

@karol-kokoszka the schema is a requirement for the restoration. A backup is meaningless if you can't use it. And if credentials are a requirement for getting the schema (which they are if authentication is enabled!), then you must require them when the backup task is created.

IMO it's very obvious.

Please, let me know if that makes sense?

vladzcloudius, Jul 05 '23 22:07

SM always backs up the schema in the form of system_schema sstables, and this does not require cluster credentials. I believe this was introduced in SM 2.4 because we can't always trust the output of DESCRIBE SCHEMA:

	// Always backup system_schema.
	//
	// Some schema changes, like dropping columns, are applied lazily to
	// sstables during compaction. Information about those schema changes is
	// recorded in the system schema tables, but not in the output of "desc schema".
	// Using output of "desc schema" is not enough to restore all schema changes.
	// As a result, writes in sstables may be incorrectly interpreted.
	// For example, writes of deleted columns which were later recreated may be
	// resurrected.

Both the Ansible script and the SM 3.1 restore task restore the schema from system_schema sstables, not from the DESCRIBE SCHEMA CQL file.
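To make the code comment above concrete, here is a toy model of the resurrection problem. It is only a sketch: the `Cell` type, the `visible_cells` helper, the timestamps, and the `email` column are all made up for illustration, and the real read path is far more involved. The single idea it borrows from the comment is that the drop record lives in the system schema tables and is what lets a reader ignore stale cells that compaction has not yet removed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    column: str
    value: str
    write_ts: int  # write timestamp of the cell (illustrative units)


def visible_cells(cells, dropped_at):
    """Return the cells a reader should see.

    dropped_at maps column name -> drop timestamp and stands in for the
    drop information recorded in the system schema tables. Cells written
    at or before the drop time belong to the dropped column and are hidden.
    """
    return [c for c in cells if c.write_ts > dropped_at.get(c.column, -1)]


# Hypothetical sstable that has not been compacted since column "email"
# was dropped at ts=20 and later recreated: the old cell is still on disk.
sstable = [
    Cell("email", "old@example.com", write_ts=10),
    Cell("email", "new@example.com", write_ts=30),
]

# Schema restored from the system schema tables: the drop record is known.
with_drop_record = visible_cells(sstable, {"email": 20})

# Schema recreated from "desc schema" output only: no drop record exists.
without_drop_record = visible_cells(sstable, {})

print([c.value for c in with_drop_record])     # ['new@example.com']
print([c.value for c in without_drop_record])  # ['old@example.com', 'new@example.com']
```

In the second case the old value reappears, which is the "resurrected" write the comment warns about.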

Michal-Leszczynski, Jul 06 '23 09:07

@vladzcloudius We don't want to invest effort in the Ansible way of restoring the cluster with manager >= 3.1.

@Michal-Leszczynski I think it was always the CQL that was used to restore the schema with ansible.

Here is the scylla documentation describing the restore. It mentions Scylla-Manager as well. https://opensource.docs.scylladb.com/stable/operating-scylla/procedures/backup-restore/restore.html

karol-kokoszka, Jul 07 '23 14:07

Here is a fragment of the restore Ansible script:

- name: Load system_schema tables data from the upload directory
  shell: |
    nodetool refresh {{ item.split('.') | join(' ') }}
  with_items: "{{ system_schema_tables }}"

which shows that the schema is restored via sstables.
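For readers less familiar with Jinja filters, the template expression in the fragment above turns a dotted `keyspace.table` item into the two space-separated arguments passed to `nodetool refresh`. A minimal Python equivalent follows; the `refresh_command` helper and the sample entries are assumptions for illustration, since the actual contents of `system_schema_tables` are not shown in this thread.

```python
def refresh_command(item: str) -> str:
    """Mirror the template expression {{ item.split('.') | join(' ') }}."""
    return "nodetool refresh " + " ".join(item.split("."))


# Hypothetical entries; the real list comes from the playbook variables.
for item in ["system_schema.tables", "system_schema.columns"]:
    print(refresh_command(item))
# nodetool refresh system_schema tables
# nodetool refresh system_schema columns
```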

Michal-Leszczynski, Jul 10 '23 08:07

SM always backs up the schema in the form of system_schema sstables, and this does not require cluster credentials. I believe this was introduced in SM 2.4 because we can't always trust the output of DESCRIBE SCHEMA:

	// Always backup system_schema.
	//
	// Some schema changes, like dropping columns, are applied lazily to
	// sstables during compaction. Information about those schema changes is
	// recorded in the system schema tables, but not in the output of "desc schema".
	// Using output of "desc schema" is not enough to restore all schema changes.
	// As a result, writes in sstables may be incorrectly interpreted.
	// For example, writes of deleted columns which were later recreated may be
	// resurrected.

Both the Ansible script and the SM 3.1 restore task restore the schema from system_schema sstables, not from the DESCRIBE SCHEMA CQL file.

I'm well aware of that, @Michal-Leszczynski. And this GH issue is about adding an API to fetch the schema in the DESCRIBE SCHEMA form.

vladzcloudius, Jul 10 '23 21:07

@vladzcloudius We don't want to invest effort in the Ansible way of restoring the cluster with manager >= 3.1.

How is it relevant to this GH issue, @karol-kokoszka?

@Michal-Leszczynski I think it was always the CQL that was used to restore the schema with ansible.

No, it was always the system_schema sstables, @karol-kokoszka.

Here is the scylla documentation describing the restore. It mentions Scylla-Manager as well. https://opensource.docs.scylladb.com/stable/operating-scylla/procedures/backup-restore/restore.html

@karol-kokoszka @Michal-Leszczynski the context of this GH issue is not a full cluster cloning (as you both seem to imply) but a restoration of a single/multiple table(s).

The doc above describes this process, but it requires the actual schema, Michal, not system_schema sstables.
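Until such an API exists, one possible workaround is to read the CQL text from the schema file that is added to the backup when credentials are provided (see the specification link earlier in this thread). The sketch below is not a Scylla Manager API. It assumes the schema file is a gzip-compressed tar archive containing `.cql` text files; that layout, the `ks.cql` name, and both helper functions are assumptions to verify against the backup specification. The sample archive is built in memory so the snippet runs without a real backup.

```python
import io
import tarfile


def read_cql_files(archive_bytes: bytes) -> dict:
    """Return {member name: CQL text} for every .cql file in a tar.gz archive."""
    result = {}
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:gz") as tar:
        for member in tar.getmembers():
            if member.isfile() and member.name.endswith(".cql"):
                result[member.name] = tar.extractfile(member).read().decode("utf-8")
    return result


def make_sample_archive() -> bytes:
    """Build a stand-in archive (hypothetical layout) for demonstration."""
    data = (
        b"CREATE KEYSPACE ks WITH replication = "
        b"{'class': 'SimpleStrategy', 'replication_factor': 1};\n"
    )
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        info = tarfile.TarInfo(name="ks.cql")  # assumed member name
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


schemas = read_cql_files(make_sample_archive())
for name, cql in schemas.items():
    print(name, "->", cql.strip())
```

With a real backup, the bytes would come from the schema file downloaded from the backup location instead of `make_sample_archive()`.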

vladzcloudius, Jul 10 '23 21:07

Ref https://github.com/scylladb/scylladb/issues/16482

mykaul, Jan 08 '24 15:01