
gemini experiences a performance degradation over a few hours

Open yarongilor opened this issue 1 month ago • 3 comments

Tested gemini 2.2.3 in https://argus.scylladb.com/tests/scylla-cluster-tests/15866a39-06d5-40a9-8170-e942115fd362.

The write/read/delete CQL metrics show a continuous descent over time:

Image

The total throughput decreased from ~13k to 8k over ~3 hours:

Image

There are nemeses running in the background, but the performance degradation seems monotonic.

Image

yarongilor avatar Jan 15 '26 15:01 yarongilor

This looks like an issue with retries, since there is a sleep between them. It's really strange that a lot of queries are retried multiple times, and that happens after the validation kicks in. Everything is stable until that point.

For now, lowering the retry sleep will help the performance, but it's just a temporary workaround until the cause is found.
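To illustrate why the sleep between retries matters, here is a rough back-of-the-envelope model. It is only a sketch, not gemini's actual retry code: the function name `ops_per_second` and all the numbers (2 ms per attempt, 10 ms sleep, 0.1 retries per query on average) are made-up assumptions, not measurements from this run.

```python
def ops_per_second(attempt_latency_s: float, retry_sleep_s: float, avg_retries: float) -> float:
    """Approximate throughput of a single worker that sleeps before each retry.

    Hypothetical model: every query costs one attempt plus, on average,
    `avg_retries` extra attempts, each preceded by a sleep.
    """
    time_per_op = (1 + avg_retries) * attempt_latency_s + avg_retries * retry_sleep_s
    return 1.0 / time_per_op


# All values below are illustrative assumptions, not numbers from the test run.
baseline = ops_per_second(attempt_latency_s=0.002, retry_sleep_s=0.010, avg_retries=0.0)
with_retries = ops_per_second(attempt_latency_s=0.002, retry_sleep_s=0.010, avg_retries=0.1)
shorter_sleep = ops_per_second(attempt_latency_s=0.002, retry_sleep_s=0.001, avg_retries=0.1)

print(f"no retries:              {baseline:.1f} ops/s per worker")
print(f"0.1 retries, 10ms sleep: {with_retries:.1f} ops/s per worker")
print(f"0.1 retries, 1ms sleep:  {shorter_sleep:.1f} ops/s per worker")
```

In this toy model even a small retry rate costs a noticeable share of throughput when the sleep is several times longer than the query itself, and shortening the sleep recovers most of it without touching whatever is causing the retries.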

CodeLieutenant avatar Jan 15 '26 15:01 CodeLieutenant

IIUC the graph below, it could have something to do with responses getting bigger and bigger over time:

Image

This way the validator has more to do (more rows fetched?)

soyacz avatar Jan 15 '26 16:01 soyacz

> IIUC the graph below, it could have something to do with responses getting bigger and bigger over time:
>
> Image
>
> This way the validator has more to do (more rows fetched?)

That's actually possible. Previously we only had one or two rows per partition; now we can have hundreds of them.

CodeLieutenant avatar Jan 15 '26 16:01 CodeLieutenant

Closing this issue as it was moved to Jira. Please continue the thread in https://scylladb.atlassian.net/browse/QATOOLS-119

dani-tweig avatar Jan 18 '26 05:01 dani-tweig