scylla-manager

Do not compare SSTable checksums for 2024.1 backups

Open: karol-kokoszka opened this issue

Scylla Enterprise 2024.1 introduced UUID-based SSTable names as the default: https://github.com/scylladb/scylladb/pull/13932

As a result, Scylla Manager no longer needs to compare checksums when copying files from local disk to backup storage. The UUID guarantees SSTable uniqueness, so the situation where two SSTables have the same name and size but different content is avoided.

This holds only for backups made on Scylla >= 2024.1. We still need to support the old approach for older Scylla versions.

Issue on QA side https://github.com/scylladb/qa-tasks/issues/1397

More details on why we need to add this support: https://github.com/scylladb/scylla-enterprise/issues/4126#issuecomment-2075049181

karol-kokoszka, Apr 25 '24 13:04

Grooming notes:

We must check the version of Scylla that the backup task runs against. If that version uses UUIDs in SSTable names (>= 2024.1), we must disable checksum comparison during the move operation performed by RClone.
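The version check above could look roughly like the sketch below. It assumes Scylla Enterprise version strings of the form `YEAR.MINOR[.PATCH][~suffix]` (for example `2024.1.3`); the names `canSkipChecksum`, `parseMajorMinor` and `minUUIDRelease` are hypothetical. Open-source numbering such as `6.0.1`, and anything unparsable, conservatively falls back to comparing checksums, because this sketch does not encode which open-source releases use UUID names.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// minUUIDRelease is the first Scylla Enterprise release (major, minor) that
// uses UUID-based SSTable names by default, per this issue.
var minUUIDRelease = [2]int{2024, 1}

// leadingDigits returns the longest prefix of s made of ASCII digits.
func leadingDigits(s string) string {
	i := 0
	for i < len(s) && s[i] >= '0' && s[i] <= '9' {
		i++
	}
	return s[:i]
}

// parseMajorMinor extracts "major.minor" from strings like "2024.1.3" or
// "2024.1.0~rc1". ok is false if no two leading integers are found.
func parseMajorMinor(version string) (major, minor int, ok bool) {
	parts := strings.SplitN(version, ".", 3)
	if len(parts) < 2 {
		return 0, 0, false
	}
	var err error
	if major, err = strconv.Atoi(parts[0]); err != nil {
		return 0, 0, false
	}
	if minor, err = strconv.Atoi(leadingDigits(parts[1])); err != nil {
		return 0, 0, false
	}
	return major, minor, true
}

// canSkipChecksum reports whether checksum comparison may be disabled for a
// node running the given Enterprise version. Unparsable versions, versions
// older than 2024.1 and open-source numbering (e.g. "6.0.1") return false,
// i.e. keep the old checksum-based behavior.
func canSkipChecksum(version string) bool {
	major, minor, ok := parseMajorMinor(version)
	if !ok {
		return false
	}
	if major != minUUIDRelease[0] {
		return major > minUUIDRelease[0]
	}
	return minor >= minUUIDRelease[1]
}

func main() {
	for _, v := range []string{"2023.1.10", "2024.1.3", "2024.2.0~rc1", "6.0.1"} {
		fmt.Printf("%-14s skip checksum: %v\n", v, canSkipChecksum(v))
	}
}
```

Defaulting to "keep checksums" on any doubt mirrors the requirement that older Scylla versions must still use the old approach.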

(per @Michal-Leszczynski) We can also check at runtime whether the SSTable name includes a UUID. If it does not, checksum comparison must stay enabled; if it does, it must be disabled.
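A runtime check on the file name might be sketched as follows. It assumes names laid out as `<version>-<generation>-<format>-<component>` (for example `me-5-big-Data.db`). It also assumes that a UUID generation looks like base36 groups joined by `_`. The UUID-style name in the code is illustrative, and `classifyGeneration` and `needsChecksum` are hypothetical helpers. Any other layout is treated as unknown, and checksums stay enabled.

```go
package main

import (
	"fmt"
	"path/filepath"
	"regexp"
	"strings"
)

// generationKind classifies the generation field of an SSTable file name.
type generationKind int

const (
	generationUnknown generationKind = iota
	generationInteger                // e.g. "me-5-big-Data.db"
	generationUUID                   // e.g. "me-3gdq_0bki_2cvk01yl83nj0tp5gh-big-Data.db"
)

var (
	integerGen = regexp.MustCompile(`^[0-9]+$`)
	// Assumed shape of UUID generations: base36 groups joined by '_'.
	uuidGen = regexp.MustCompile(`^[0-9a-z]+(_[0-9a-z]+)+$`)
)

// classifyGeneration inspects names laid out as
// <version>-<generation>-<format>-<component>. Other layouts (e.g. the
// legacy "ks-table-ka-1-Data.db") are reported as unknown.
func classifyGeneration(path string) generationKind {
	parts := strings.Split(filepath.Base(path), "-")
	if len(parts) != 4 {
		return generationUnknown
	}
	switch gen := parts[1]; {
	case integerGen.MatchString(gen):
		return generationInteger
	case uuidGen.MatchString(gen):
		return generationUUID
	default:
		return generationUnknown
	}
}

// needsChecksum keeps checksum comparison enabled unless the SSTable name
// clearly carries a UUID generation.
func needsChecksum(path string) bool {
	return classifyGeneration(path) != generationUUID
}

func main() {
	for _, name := range []string{
		"me-5-big-Data.db",
		"me-3gdq_0bki_2cvk01yl83nj0tp5gh-big-Data.db",
	} {
		fmt.Printf("%s needs checksum: %v\n", name, needsChecksum(name))
	}
}
```

Requiring exactly four dash-separated fields keeps table names that contain `_` from being mistaken for UUID generations.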

karol-kokoszka, Apr 29 '24 08:04

Update:

The SSTable specification includes a `.crc32` file (the `Digest.crc32` component), which holds a checksum computed by the Scylla server. For every SSTable in a newly created snapshot, it is enough to compare the content of its `.crc32` file with the content of its remote equivalent. If they are equal, the SSTable can simply be removed from the snapshot directory, since an identical copy is already in backup storage.

This solution does not require Scylla to support UUIDs in SSTable names; it is a general solution.

SSTable specification https://opensource.docs.scylladb.com/stable/architecture/sstable/sstable3/sstables-3-data-file-format.html

karol-kokoszka, Jun 19 '24 12:06

We need to test the scenario described in the previous comment against a snapshot that contains hundreds of SSTables to measure the performance of this operation, since it involves reading the content of every `.crc32` file in the current snapshot.

karol-kokoszka, Jun 24 '24 09:06

RFC for the new deduplication stage: https://docs.google.com/document/d/1EtGlF6UGNy34D_7QsnCheaukp3UwVObZU56PBdd0CQ8/edit?usp=sharing

karol-kokoszka, Jul 04 '24 14:07