[Feature] Reduce number of queries sent to Oracle
Problem
Currently, for each statement sent to the test cluster, we send 3 queries to the oracle cluster:
- The exact query that was sent to the test cluster, to save it in the exact table
- The statement that is executed against the test cluster, saved in the statement logger table
- The statement that is executed against the oracle cluster, saved in the statement logger table
This 3-for-1 hurts performance a bit, since we could batch several statements instead. One thing helps: the oracle and test statements live in the same partition of the statement logger table, which will improve performance for gemini in the long run.
Solutions
Simple
Batch the two statements into a single batch for the oracle and test clusters (it cannot be one statement, because gemini might have a bug where we send different queries to the oracle and test clusters, and that would cause a discrepancy).
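To make the simple solution concrete, here is a minimal, language-neutral sketch in Python (names like `send_batch` and the fake session are illustrative assumptions, not gemini's real API): the two statement-logger writes for one event are grouped into a single batch, cutting the round trips to the oracle cluster in half while keeping the two statements separate inside the batch.

```python
# Hypothetical sketch of the "simple" solution: instead of two separate
# writes to the statement logger (test-cluster statement + oracle-cluster
# statement), send both in a single batch per event.

class FakeOracleSession:
    """Counts network round trips so the two approaches can be compared."""
    def __init__(self):
        self.round_trips = 0

    def execute(self, stmt):
        self.round_trips += 1      # one network call per statement

    def send_batch(self, stmts):
        self.round_trips += 1      # one network call for the whole batch


def log_unbatched(session, test_stmt, oracle_stmt):
    # Current behaviour: two separate network calls.
    session.execute(test_stmt)
    session.execute(oracle_stmt)


def log_batched(session, test_stmt, oracle_stmt):
    # The batch must still contain two distinct statements: gemini may
    # (buggily) send different queries to the test and oracle clusters,
    # and merging them into one statement would hide that discrepancy.
    session.send_batch([test_stmt, oracle_stmt])


before, after = FakeOracleSession(), FakeOracleSession()
log_unbatched(before, "INSERT ... test", "INSERT ... oracle")
log_batched(after, "INSERT ... test", "INSERT ... oracle")
print(before.round_trips, after.round_trips)  # 2 round trips vs 1
```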
A bit more complex, but better for performance
Time-based queue cleanup -> buffer a couple of seconds' worth of queries and write everything in one bigger batch -> even half a second will work: at 10k req/s this saves 5k writes to the statement logger, and the corresponding network trips.
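A rough sketch of the time-based queue, assuming a flush-on-interval design (class and parameter names are hypothetical, not taken from gemini). A fake clock simulates one second of traffic at 10k req/s, showing that a 0.5 s flush window collapses 10,000 individual writes into just 2 batches.

```python
import time


class TimedBatchQueue:
    """Buffer statements and flush them as one batch every `interval`
    seconds (illustrative sketch; the clock is injectable for testing)."""

    def __init__(self, flush_fn, interval=0.5, clock=time.monotonic):
        self.flush_fn = flush_fn
        self.interval = interval
        self.clock = clock
        self.buf = []
        self.last_flush = clock()

    def add(self, stmt):
        self.buf.append(stmt)
        if self.clock() - self.last_flush >= self.interval:
            self.flush()

    def flush(self):
        if self.buf:
            self.flush_fn(self.buf)   # one network call for the whole buffer
            self.buf = []
        self.last_flush = self.clock()


# Simulate 10k requests spread evenly over one second using a fake clock:
# with a 0.5 s window the queue flushes twice instead of writing 10k times.
batches = []
now = [0.0]
q = TimedBatchQueue(batches.append, interval=0.5, clock=lambda: now[0])
for i in range(10_000):
    now[0] = i / 10_000
    q.add(f"stmt-{i}")
q.flush()  # drain whatever is left at shutdown
print(len(batches), sum(len(b) for b in batches))
```

The trade-off versus the simple solution is that buffered statements are lost if the process crashes before a flush, which may or may not matter for a test tool's statement log.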
questions:
- how much perf gain do we expect from the simple vs the complex solution?
- how much perf gain would we get if we didn't store text/blobs in their current form, but just the start:stop offsets of our random pool (and decoded them later, when writing to the statement log)?
> - how much perf gain do we expect from the simple vs the complex solution?

In the simple solution I'm expecting double the perf, since we remove one network call (we are left with a single network call, the batched statement). For the complex one I really don't know; it will be faster, but how much is hard to say without a proof of concept and a benchmark.

> - how much perf gain would we get if we didn't store text/blobs in their current form, but just the start:stop offsets of our random pool (and decoded them later, when writing to the statement log)?

Yeah, that could work, but it's more work than this (a lot more rework of the current logger, and we have to be careful because of the partition keys, selecting them, etc.).
> In the simple solution I'm expecting double the perf, since we remove one network call (we are left with a single network call, the batched statement). For the complex one I really don't know; it will be faster, but how much is hard to say without a proof of concept and a benchmark.

OK, I see. What would be the time estimates for the simple and complex solutions (including benchmarks)?

> Yeah, that could work, but it's more work than this (a lot more rework of the current logger, and we have to be careful because of the partition keys, selecting them, etc.).

I see, let's drop that idea for now then.
Closing this issue as it was moved to Jira. Please continue the thread in https://scylladb.atlassian.net/browse/QATOOLS-105