gemini gemini experiences a performance degradation over a few hours

Tested gemini 2.2.3 in https://argus.scylladb.com/tests/scylla-cluster-tests/15866a39-06d5-40a9-8170-e942115fd362.

The write/read/delete CQL metrics shows a continues decent over time:

The total throughput decreased from ~ 13k to 8k over ~3 hours:

The are nemesis in the background, but the performance degredation seems monotonic.

Jan 15 '26 15:01 yarongilor

This looks like an issue with retries, since there is a sleep between them. It's really strange that a lot of queries are retried multiple times, and that happens after the validation kicks in. Everything is stable until that point.

For now, lowering the retry sleep will help the performance, but it's just a temporary solution until the cause is found

Jan 15 '26 15:01 CodeLieutenant

IIUC below graph, it could have something to do with responses getting bigger and bigger over time:

This way validator has more to do (more rows fetched??)

Jan 15 '26 16:01 soyacz

IIUC below graph, it could have something to do with responses getting bigger and bigger over time:
This way validator has more to do (more rows fetched??)

That's actually possible, previously we only had one/two rows per partition, now we can have hundreds of them.

Jan 15 '26 16:01 CodeLieutenant

Closing this issue as it was moved to Jira. Please continue the thread in https://scylladb.atlassian.net/browse/QATOOLS-119

Jan 18 '26 05:01 dani-tweig