[Feature] Concurrency Control
Current Situation
Gemini has a flag, `--concurrency`, which controls, as the name implies, the number of workers spawned to do the work. This single number is used for both READERS and WRITERS (i.e. validations and mutations).
The problem
The problem arises because reads are slower than writes, roughly in a 5-to-1 ratio (for every read, we can do 5 writes). Because of this ratio, some values that should be validated never will be: writes are very fast and reads are not (this has many reasons, some in Scylla, others in Gemini, since we have to compare the results from both the Oracle and Test clusters). Since those values are dropped (they cannot be put into the already full validation buffer), the corresponding rows are never validated in either cluster.
Solution
There are two options:

1. Add two flags, `--mutation-threads` and `--validation-threads`, and remove `--concurrency`, to allow better control. BREAKING CHANGE, BUT EASY TO IMPLEMENT.
2. Keep `--concurrency` but parse it a bit differently, with ratio parsing (syntax like `M(10)/V(30)`).

The second solution is much better, as it keeps 100% backwards compatibility while adding extra syntax for finer control. It also leaves room for a non-fixed number of threads (a feature that can be added in the future).
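The proposed ratio syntax could be parsed along these lines. This is a minimal sketch: the function name, the exact grammar, and the fallback to the legacy single-integer form are assumptions for illustration, not Gemini's actual implementation.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// parseConcurrency accepts either the legacy plain-integer form ("50"),
// which keeps full backwards compatibility by using the same count for
// both pools, or the proposed ratio form ("M(10)/V(30)").
func parseConcurrency(s string) (mutation, validation int, err error) {
	// Legacy form: one integer shared by mutation and validation workers.
	if n, convErr := strconv.Atoi(s); convErr == nil {
		return n, n, nil
	}
	// Proposed form: M(<mutation threads>)/V(<validation threads>).
	re := regexp.MustCompile(`^M\((\d+)\)/V\((\d+)\)$`)
	m := re.FindStringSubmatch(s)
	if m == nil {
		return 0, 0, fmt.Errorf("invalid --concurrency value: %q", s)
	}
	mutation, _ = strconv.Atoi(m[1])
	validation, _ = strconv.Atoi(m[2])
	return mutation, validation, nil
}

func main() {
	m, v, _ := parseConcurrency("M(10)/V(30)")
	fmt.Println(m, v) // 10 30
	m, v, _ = parseConcurrency("50")
	fmt.Println(m, v) // 50 50
}
```

Because a plain integer still parses, existing invocations of `--concurrency` would keep working unchanged.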
I'm not sure I understand the problem here:
- Is it that we don't read everything that was written? How does the suggested fix help guarantee this?
- Is it that we don't control the rate of each type of operation? Where does the demand for such control come from? To improve the throughput of what Gemini is doing?

We'll need to explain a bit more; what the problem is isn't completely clear.
> I'm not sure I understand the problem here:
> - Is it that we don't read everything that was written? How does the suggested fix help guarantee this?
> - Is it that we don't control the rate of each type of operation? Where does the demand for such control come from? To improve the throughput of what Gemini is doing?
>
> We'll need to explain a bit more; what the problem is isn't completely clear.
When a mutation is done, the resulting values are pushed to the `oldValues` channel, which the validation logic consumes later. When that channel is full, the new value is simply dropped. Since mutations are 5x faster than validations, the channel fills up quickly and values are dropped constantly. Most of them get dropped during warmup: validation does not run during warmup, so at that point the `oldValues` channel is permanently full.

When we proceed to mutation and validation, we start validating and mutating at the same time with `--concurrency` threads each. Since the validation channel is constantly full, we drop values that should have been used for validation; mutation is so fast that validation cannot pull values quickly enough.

So, for better control over how much we insert versus validate, we can tune the concurrency: we will achieve the same overall throughput if we lower the number of mutation threads and increase the number of validation threads, and at the same time we will not drop a large number of values.
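The drop behavior described above matches Go's non-blocking channel send. A minimal, self-contained sketch (the `oldValues` name comes from the description; the buffer size and loop are illustrative, not Gemini's real code):

```go
package main

import "fmt"

func main() {
	// A fixed-size buffer standing in for the oldValues channel.
	oldValues := make(chan int, 3)
	dropped := 0

	// Mutations produce values faster than validation drains them.
	// A select with a default branch makes the send non-blocking:
	// once the buffer is full, the value is silently lost.
	for v := 0; v < 10; v++ {
		select {
		case oldValues <- v:
			// Value queued; validation may pick it up later.
		default:
			// Buffer full: this value will never be validated.
			dropped++
		}
	}
	fmt.Println("queued:", len(oldValues), "dropped:", dropped) // queued: 3 dropped: 7
}
```

With no consumer running (as during warmup), only the first `cap(oldValues)` values survive; everything after that is dropped, which is exactly the failure mode described.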
Again, you are suggesting splitting the rate control, but not stating what the goal/purpose of this change is:
- Buffers full: what's the problem with that?
- What do we gain from having separate concurrency control?
> - Buffers full: what's the problem with that?

When the buffer is full we start dropping new values, so we never validate them. The buffer is a fixed size; if it weren't, memory would grow unboundedly.
> - What do we gain from having separate concurrency control?

We can control the validation and mutation rates separately. Basically, more read threads will pull data from the buffer faster, so there are no drops (or at least a negligible number of dropped values).

For example, with 10 write threads and 40 read threads, the read buffer stays empty or holds only a small number of values filled by writes. Throughput will be the same 10k req/s, but it will be split into 5k writes and 5k reads. Compare that with `concurrency=50`, i.e. 50 write and 50 read threads, which gets the same throughput on a 4-core machine (10-15k req/s) but leaves no time to validate everything.
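The proposed split into independently sized pools can be sketched as below. This is an illustrative toy, not Gemini's code: `startWorkers`, the buffer size, and the synthetic "jobs" are all assumptions; it only demonstrates that a small mutation pool feeding a larger validation pool lets the buffer drain as fast as it fills.

```go
package main

import (
	"fmt"
	"sync"
)

// startWorkers runs mutationThreads producers and validationThreads
// consumers around a shared fixed-size buffer, and reports how many
// values were validated.
func startWorkers(mutationThreads, validationThreads, jobs int) (validated int) {
	oldValues := make(chan int, 1024)
	work := make(chan int)

	// Mutation pool (e.g. 10 writers) pushing results into the buffer.
	var mg sync.WaitGroup
	for i := 0; i < mutationThreads; i++ {
		mg.Add(1)
		go func() {
			defer mg.Done()
			for v := range work {
				oldValues <- v // toy version blocks; Gemini drops instead
			}
		}()
	}

	// Validation pool (e.g. 40 readers) draining the buffer.
	var vg sync.WaitGroup
	var mu sync.Mutex
	for i := 0; i < validationThreads; i++ {
		vg.Add(1)
		go func() {
			defer vg.Done()
			for range oldValues {
				mu.Lock()
				validated++
				mu.Unlock()
			}
		}()
	}

	for j := 0; j < jobs; j++ {
		work <- j
	}
	close(work)
	mg.Wait()        // all mutations finished and queued
	close(oldValues) // let validators drain and exit
	vg.Wait()
	return validated
}

func main() {
	fmt.Println(startWorkers(10, 40, 1000)) // 1000
}
```

With more validators than mutators every queued value gets validated; with the ratio inverted, a drop-on-full buffer (as in the previous sketch) would lose most of them.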
My thoughts:
- I don't like setting concurrency; usually people don't know what to set. I think we should do it automatically (if possible), or set some high value and use 'throttle' to set the desired throughput (especially write throughput).
- You just uncovered a big limitation of Gemini: if we need to keep in memory everything we might want to read, we're limited to only the 'most recent' data. For example, we cannot read things that were written at the beginning of a test. We should think about solving this.
- I don't see that it's mandatory to read everything we wrote, especially when we have limitations on what we can query. We should think about keeping some 'old' data to read later.
> - I don't like setting concurrency; usually people don't know what to set. I think we should do it automatically (if possible), or set some high value and use 'throttle' to set the desired throughput (especially write throughput).

I don't like it either, but it's a bit tricky to control this concurrency automatically, as the workload is mostly IO bound, not CPU bound, so cranking the thread count up does yield more throughput. We could have some algorithm that scales it up and down depending on CPU usage or something like that.
> - You just uncovered a big limitation of Gemini: if we need to keep in memory everything we might want to read, we're limited to only the 'most recent' data. For example, we cannot read things that were written at the beginning of a test. We should think about solving this.

We cannot keep everything in memory; that's not possible. Even keeping just the hashes, at the current rate of writes and reads, memory would keep growing until we eventually OOM.
> most recent

Well, that's not the case. It's a queue (a channel): we read the oldest values first and drop the newest. To solve it right now, the quick-and-dirty solution is to implement the concurrency control I suggested; then we can move on to more sophisticated solutions.
> - I don't see that it's mandatory to read everything we wrote, especially when we have limitations on what we can query. We should think about keeping some 'old' data to read later.

Yeah, we don't have to read everything, but when you are dropping 30k values per second, that's a lot of values not being read.
Also, one quick solution (which avoids many pitfalls): during the warmup phase we should not push to the `oldValues` channel and should just do the inserts; only once the mutation and validation loop starts do we push newly inserted/updated/deleted values to the channel. Warmup is just inserts and it's really fast; it fills all the queues in a couple of minutes (if not a couple of seconds). After that, for the most part, we validate the values, and a mismatch we do hit is there for an actual reason.
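The warmup gating above can be sketched with a single flag guarding the push. This is a hypothetical shape, not Gemini's code: `warmupDone` and `recordForValidation` are invented names, and the atomic flag simply stands in for "the main mutation/validation loop has started".

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// warmupDone is false during warmup; pushes to the validation buffer
// are suppressed until it flips to true.
var warmupDone atomic.Bool

// recordForValidation queues a value for later validation, reporting
// whether it was actually queued. During warmup nothing is queued; after
// warmup a non-blocking send drops the value only if the buffer is full.
func recordForValidation(oldValues chan int, v int) bool {
	if !warmupDone.Load() {
		return false // warmup: insert only, skip the buffer entirely
	}
	select {
	case oldValues <- v:
		return true
	default:
		return false // buffer full: value dropped
	}
}

func main() {
	oldValues := make(chan int, 8)

	// Warmup phase: inserts happen, but the buffer stays empty.
	for v := 0; v < 5; v++ {
		recordForValidation(oldValues, v)
	}
	fmt.Println("queued during warmup:", len(oldValues)) // 0

	// Main loop starts: pushes are enabled.
	warmupDone.Store(true)
	for v := 0; v < 5; v++ {
		recordForValidation(oldValues, v)
	}
	fmt.Println("queued after warmup:", len(oldValues)) // 5
}
```

The benefit is that the buffer starts the main loop empty instead of already full, so early validation work isn't spent on (or blocked by) stale warmup values.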
@CodeLieutenant @soyacz is this currently being worked on?
If yes, please update its status and fields accordingly.
It's planned for the future, in 'todo' and in the 'next' milestone (meaning future work, no specific release). What else should I set?
I thought it was currently in progress.
Closing this issue as it was moved to Jira. Please continue the thread in https://scylladb.atlassian.net/browse/QATOOLS-104