scylla-manager
Snapshot parallelism should have per-table granularity
Consider an overprovisioned cluster with 9 nodes and a single user keyspace to back up. In particular, the data set is small enough to fit in memory, but large enough to cause I/O pressure when memtable flushes occur, because flushes happen rarely.
The --snapshot-parallel option effectively triggers flushes at keyspace granularity, which is enough to cause a severe latency spike on actively used tables that are not flushed very often. In one observed case, this caused write latencies to increase by more than 10x.
The ideal solution would be for Scylla to flush memtables slowly enough to avoid the latency impact seen here. In addition, Scylla Manager should have an option to flush per table rather than per keyspace, so that the database knows how to properly handle such flushes.
Manager currently takes the snapshot per keyspace. All tables are included in a single call to the Scylla API.
https://github.com/scylladb/scylla-manager/blob/e9829c2535355268cc5a1cc8b19d304b63a8583e/pkg/service/backup/worker_snapshot.go#L81-L88
We can change it to work at the single-table level.