scylla-manager
Increase intensity over limit
This is an experimental change requested in https://github.com/scylladb/scylla-enterprise/issues/4006#issuecomment-1999914876
```go
// Set intensity to max of:
// - 10 * max_ranges_in_parallel
// - 10% of replica set ranges
cnt = max(10*cnt, NewIntensity(len(allRanges)/10))
```
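For context, here is a minimal, self-contained sketch of that computation; the Intensity type, the maxIntensity helper and the example numbers are assumptions made for illustration, not the actual SM code:

```go
package main

import "fmt"

// Intensity stands in for SM's repair intensity type in this sketch.
type Intensity int

// maxIntensity returns the larger of two intensities.
func maxIntensity(a, b Intensity) Intensity {
	if a > b {
		return a
	}
	return b
}

func main() {
	maxRangesInParallel := Intensity(1) // old per-request cap on our test env (illustrative)
	replicaSetRanges := 170             // ranges owned by a given replica set (illustrative)

	// New lower bound on intensity: max(10 * cap, 10% of replica set ranges).
	cnt := maxIntensity(10*maxRangesInParallel, Intensity(replicaSetRanges/10))
	fmt.Println(cnt) // 17 with these example numbers
}
```

With these illustrative numbers the result lands in the same ballpark as the 10/17 ranges per request mentioned below.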
Previously we were sending 1 range per request on our test env setup; now it goes up to 10 or 17 ranges depending on keyspace replication.
Note that this is 10% of replica set ranges, not 10% of all table ranges, as I thought the latter could be too much and it wasn't clear from the mentioned issue comment which of the two should be used. That's why it is also maxed against 10 * max_ranges_in_parallel, so that it is always noticeably bigger than before.
Let me know if I should use 10% of all table ranges instead, or whether this is good enough for testing the performance improvement.
CC: @karol-kokoszka @vladzcloudius
@vladzcloudius do you still need the version of Manager that extends the intensity? If so, I can bake it with Jenkins using https://jenkins.scylladb.com/view/scylla-manager/job/manager-master/job/manager-build/
I decided to go ahead and implement #3790 as a part of this PR (add the --max-jobs-per-host flag).
I did it because I strongly suspect that #3790 is the main reason why repair is so slow on bigger clusters.
I'm not pushing for this exact implementation, or even for introducing it as a separate flag (it was mentioned that it could be integrated into the parallel flag). I decided to implement it like this because it was the simplest from the SM perspective.
The --max-jobs-per-host flag works only when --parallel=0, as it doesn't make much sense to have many repair jobs on a single node while the others are idle.
So only the first commit (a879b87e) from this branch does what was requested - it removes the capping of intensity to max_repair_ranges_in_parallel. The rest of this PR adds the --max-jobs-per-host flag. The meaning of this flag is rather obvious, but I will describe how it interacts with intensity.
When intensity is set to X, SM will try to saturate all nodes in the cluster by repairing X token ranges in parallel on each of them. SM will try to achieve this saturation with as few jobs as possible, e.g. (see the sketch after these examples):
- replica set {node1, node2, node3}, intensity=10, max-jobs-per-host=10, number of not yet repaired token ranges owned by the replica set = 100 -> SM will schedule just a single repair job with 10 ranges
- replica set {node1, node2, node3}, intensity=10, max-jobs-per-host=10, number of not yet repaired token ranges owned by the replica set = 5 -> SM will schedule a repair job with 5 ranges and will try to saturate the remaining 5 ranges on each of the nodes from the replica set by scheduling other jobs overlapping with this replica set (e.g. a job on {node2, node3, node4} with 5 ranges and a job on {node1, node5, node6} with 5 ranges as well)
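To make the heuristic above concrete, here is a minimal sketch of the per-job range calculation it describes; the function name, signature and the simple leftover accounting are assumptions made for illustration, not the actual SM scheduler:

```go
package main

import "fmt"

// rangesForJob returns how many token ranges a single repair job on a replica
// set should get, and how much per-host range "budget" is left to be filled
// by jobs on overlapping replica sets. All names here are illustrative only.
func rangesForJob(intensity, maxJobsPerHost, pendingRanges, jobsOnHost int) (take, leftover int) {
	if jobsOnHost >= maxJobsPerHost {
		// Host already runs the allowed number of jobs.
		return 0, 0
	}
	if pendingRanges >= intensity {
		// One job is enough to saturate each replica with `intensity` ranges.
		return intensity, 0
	}
	// The replica set owns fewer ranges than the intensity, so the job takes
	// everything and the remaining budget can be covered by jobs on
	// overlapping replica sets (if max-jobs-per-host allows more jobs).
	return pendingRanges, intensity - pendingRanges
}

func main() {
	// First example above: 100 pending ranges -> a single job with 10 ranges.
	fmt.Println(rangesForJob(10, 10, 100, 0)) // 10 0

	// Second example: 5 pending ranges -> a job with 5 ranges, and a per-host
	// budget of 5 ranges left for overlapping replica sets.
	fmt.Println(rangesForJob(10, 10, 5, 0)) // 5 5
}
```

The second call corresponds to the second example: the job takes all 5 pending ranges and leaves a budget of 5 ranges per host to be filled by jobs on overlapping replica sets, as long as max-jobs-per-host allows more jobs on those hosts.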
So --max-jobs-per-host helps only when the average number of ranges owned by a replica set is smaller than the intensity.
The default value for this flag is 1, so the default behavior is exactly the same as without this flag.
This PR does not make any changes to SM DB schema, so there is no problem with rolling back from this SM build to the proper version.
One can check the --max-jobs-per-host value of a running repair task in its progress display (note that it will always show 1 for an already finished task).
Also, this flag has been added to the repair control command, so it's possible to play with it on a running repair (although note that, just like for intensity/parallel, changes made via repair control are not persistent on repair task retry).
@karol-kokoszka PR is ready for review!
@vladzcloudius please take a look at the longer comment above and decide whether this idea works for you. The build for this branch can be obtained by running the build job on the Scylla Manager Jenkins.
@Michal-Leszczynski I'm happy to review, but let's first have @vladzcloudius share his opinion about the proposal. Maybe it's not worth it.
@karol-kokoszka even if @vladzcloudius does not want to use this version, I would still be interested to see how it behaves in a testing env with a big cluster (2 DCs, 10 nodes each), as it may be the root cause of slow SM repair.
@Michal-Leszczynski I think we must use SCT for that purpose. We can discuss it on today's grooming session.
Closing this PR as it appears to be stalled.