gemini experiences a performance degradation over a few hours
Tested gemini 2.2.3 in https://argus.scylladb.com/tests/scylla-cluster-tests/15866a39-06d5-40a9-8170-e942115fd362.
The write/read/delete CQL metrics shows a continues decent over time:
The total throughput decreased from ~ 13k to 8k over ~3 hours:
The are nemesis in the background, but the performance degredation seems monotonic.
This looks like an issue with retries, since there is a sleep between them. It's really strange that a lot of queries are retried multiple times, and that happens after the validation kicks in. Everything is stable until that point.
For now, lowering the retry sleep will help the performance, but it's just a temporary solution until the cause is found
IIUC below graph, it could have something to do with responses getting bigger and bigger over time:
This way validator has more to do (more rows fetched??)
IIUC below graph, it could have something to do with responses getting bigger and bigger over time:
This way validator has more to do (more rows fetched??)
That's actually possible, previously we only had one/two rows per partition, now we can have hundreds of them.
Closing this issue as it was moved to Jira. Please continue the thread in https://scylladb.atlassian.net/browse/QATOOLS-119
This way validator has more to do (more rows fetched??)