
Increase intensity over limit

Michal-Leszczynski opened this issue 11 months ago • 8 comments

This is an experimental change requested in https://github.com/scylladb/scylla-enterprise/issues/4006#issuecomment-1999914876

	// Set intensity to max of:
	// - 10 * max_ranges_in_parallel
	// - 10% of replica set ranges
	cnt = max(10*cnt, NewIntensity(len(allRanges)/10))
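For illustration, here is a minimal self-contained sketch of the same computation on plain ints (not SM source; it assumes `cnt` holds max_ranges_in_parallel and `allRanges` holds the replica set's token ranges, the function and parameter names are made up, and Go 1.21+ is assumed for the built-in max):

    package main

    import "fmt"

    // Illustrative sketch only: the new lower bound applied to intensity,
    // i.e. max(10 * max_ranges_in_parallel, 10% of replica set ranges).
    func intensityFloor(maxRangesInParallel, replicaSetRanges int) int {
        return max(10*maxRangesInParallel, replicaSetRanges/10)
    }

    func main() {
        fmt.Println(intensityFloor(1, 100)) // 10 -> the 10*max_ranges_in_parallel term wins
        fmt.Println(intensityFloor(1, 170)) // 17 -> the 10% of replica set ranges term wins
        fmt.Println(intensityFloor(3, 200)) // 30
    }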

Michal-Leszczynski · Mar 18 '24 12:03

Previously we were sending 1 range per request on our test env setup; now it goes up to 10 or 17 ranges depending on keyspace replication.

Michal-Leszczynski · Mar 18 '24 12:03

Note that this is 10% of replica set ranges, not 10% of all table ranges, as I thought the latter could be too much and it wasn't clear from the mentioned issue comment which of the two was meant. That's why the value is also maxed against 10 * max_ranges_in_parallel, so that it is always noticeably bigger than before.

Let me know if I should use 10% of all table ranges instead, or whether this is good enough for testing the performance improvement.

Michal-Leszczynski · Mar 18 '24 12:03

CC: @karol-kokoszka @vladzcloudius

Michal-Leszczynski · Mar 18 '24 12:03

@vladzcloudius do you still need the Manager build that extends the intensity? If so, I can build it with Jenkins using https://jenkins.scylladb.com/view/scylla-manager/job/manager-master/job/manager-build/

karol-kokoszka · Mar 19 '24 15:03

I decided to go ahead and implement #3790 as a part of this PR (add the --max-jobs-per-host flag). I did it because I strongly suspect that #3790 is the main reason why repair is so slow on bigger clusters. I'm not pushing for this exact implementation or even for introducing it as a separate flag (it was mentioned that it could be integrated into the parallel flag). I implemented it like this because it was the simplest option from the SM perspective. The --max-jobs-per-host flag works only when --parallel=0, as it doesn't make much sense to have many repair jobs on a single node while the others are idle.
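A minimal sketch of that constraint (not SM code; the function name and the exact way it is enforced are assumptions on my part):

    package main

    import "fmt"

    // Sketch only: max-jobs-per-host greater than 1 is honored only with
    // unlimited parallelism (--parallel=0); otherwise the previous behavior
    // of a single job per host is kept.
    func effectiveMaxJobsPerHost(maxJobsPerHost, parallel int) int {
        if parallel != 0 {
            return 1
        }
        return maxJobsPerHost
    }

    func main() {
        fmt.Println(effectiveMaxJobsPerHost(3, 0)) // 3
        fmt.Println(effectiveMaxJobsPerHost(3, 2)) // 1
    }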

So only the first commit (a879b87e) from this branch does what was requested: it removes capping the intensity to max_repair_ranges_in_parallel. The rest of this PR adds the --max-jobs-per-host flag. The meaning of this flag is rather obvious, but I will describe how it interacts with intensity.

When intensity is set to X, SM will try to saturate all nodes in the cluster by repairing X token ranges in parallel on each of them. SM will try to achieve this saturation with as few jobs as possible, e.g.:

  1. replica set {node1, node2, node3}, intensity=10, max-jobs-per-host=10, number of not yet repaired token ranges owned by the replica set = 100 -> SM will schedule just a single repair job with 10 ranges
  2. replica set {node1, node2, node3}, intensity=10, max-jobs-per-host=10, number of not yet repaired token ranges owned by the replica set = 5 -> SM will schedule a repair job with 5 ranges and will try to fill the remaining 5 intensity slots on each node from the replica set by scheduling other jobs overlapping with this replica set (e.g. a job on {node2, node3, node4} with 5 ranges and a job on {node1, node5, node6} with 5 ranges as well)

So --max-jobs-per-host helps only when the average number of ranges owned by a replica set is smaller than the intensity (see the sketch below).
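To make the two cases above concrete, here is a small sketch (not SM code; names and structure are illustrative only) of how a single job for a replica set could be sized, and where max-jobs-per-host comes in:

    package main

    import "fmt"

    // Sketch only: size of one repair job for a replica set, given the target
    // intensity (ranges repaired in parallel per node) and the number of not
    // yet repaired ranges owned by this replica set.
    func rangesForNextJob(intensity, pending int) int {
        if pending >= intensity {
            // Case 1: one job with `intensity` ranges already saturates every
            // node of the replica set, so max-jobs-per-host does not matter.
            return intensity
        }
        // Case 2: the job takes all pending ranges; the remaining
        // (intensity - pending) slots per node can only be filled by additional,
        // overlapping jobs, which is what max-jobs-per-host > 1 allows.
        return pending
    }

    func main() {
        fmt.Println(rangesForNextJob(10, 100)) // 10 (case 1)
        fmt.Println(rangesForNextJob(10, 5))   // 5  (case 2)
    }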

The default behavior is exactly the same as without this flag, since the default value for this flag is 1. This PR does not make any changes to the SM DB schema, so there is no problem with rolling back from this SM build to the proper version. One can check the --max-jobs-per-host of a running repair task in its progress display (note that it will always show 1 for an already finished task). Also, this flag has been added to the repair control command, so it's possible to play with it on a running repair (although note that, as with intensity/parallel, changes made via repair control are not persisted across repair task retries).

Michal-Leszczynski · Apr 29 '24 12:04

@karol-kokoszka PR is ready for review!

Michal-Leszczynski · Apr 29 '24 13:04

@vladzcloudius please take a look at the longer comment above and decide whether this approach works for you. The build for this branch can be obtained by running the build job on the Scylla Manager Jenkins.

Michal-Leszczynski · Apr 30 '24 12:04

@Michal-Leszczynski I'm happy to review, but let's have @vladzcloudius share his opinion about the proposal first. Maybe it's not worth it.

karol-kokoszka · Apr 30 '24 13:04

@karol-kokoszka even if @vladzcloudius does not want to use this version, I would still be interested to see how it behaves in a test env with a big cluster (2 DCs, 10 nodes each), as it may be the root cause of slow SM repair.

Michal-Leszczynski · May 06 '24 07:05

@Michal-Leszczynski I think we must use SCT for that purpose. We can discuss it during today's grooming session.

karol-kokoszka · May 06 '24 07:05

Closing this PR as it appears to be stalled.

karol-kokoszka · Jul 17 '24 17:07