
Snapshot parallelism should have per-table granularity

Open fee-mendes opened this issue 2 years ago • 1 comments

Consider an overprovisioned cluster with 9 nodes and a single user keyspace to back up. In particular, the data set is small enough to fit in memory, but large enough to cause I/O pressure when memtable flushes occur, because flushes do not happen often.

The --snapshot-parallel option essentially flushes with keyspace granularity, which is enough to cause a severe latency spike for actively used tables that are not flushed very often. For example, in one observed case write latencies rose by more than 10x:

image

The ideal solution would be for Scylla to flush memtables slowly enough not to hurt latencies as seen here. In addition, Scylla Manager should have an option to flush per table rather than per keyspace, so that the database knows how to properly handle such flushes.

fee-mendes avatar Sep 02 '22 14:09 fee-mendes

Manager takes the snapshot per keyspace. All tables are included in the call to the Scylla API.

https://github.com/scylladb/scylla-manager/blob/e9829c2535355268cc5a1cc8b19d304b63a8583e/pkg/service/backup/worker_snapshot.go#L81-L88

We can change it to work at the single-table level.

karol-kokoszka avatar Jul 10 '23 15:07 karol-kokoszka