Scale Plan constants by disk cache hit rate where appropriate

Open jbellis opened this issue 1 year ago • 1 comments

Note: commit 3 changes the semantics of maxBruteForceRows. The new semantics are correct; that is, expectedNodesVisited already includes the degree factor. I added a comment to expectedNodesVisited to clarify this.

jbellis avatar Aug 01 '24 22:08 jbellis

Ready for final review.

One hiccup that I'm not sure how to solve: AnnIndexScan overestimates its cost:

            double scanCost = annSearchCost(estimatedNodes, expectedKeysInt);

This should be

            double scanCost = annSearchCost(estimatedNodes, limit);

but I don't know how to get the limit from the plan in this part of the code.

This can cause misplanning because we're multiplying ANN_SCORED_KEY_COST (a high number) by expectedKeysInt instead of by limit. (With LIMIT 1 and selectivity 0.05, the former is 20x higher than the latter.)
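
To make the 20x figure concrete, here is a minimal standalone sketch of the arithmetic. It is not the planner code: the class and method names are hypothetical, the value of ANN_SCORED_KEY_COST is a placeholder, and it assumes the expected key count is roughly limit / selectivity, which is what the 20x ratio above implies.

```java
public final class AnnCostSketch {
    // Placeholder value; the real constant lives in the planner and is not shown here.
    static final double ANN_SCORED_KEY_COST = 10.0;

    // Assumption: the number of keys the scan expects to touch is limit / selectivity.
    static int expectedKeys(int limit, double selectivity) {
        return (int) Math.round(limit / selectivity);
    }

    // Only the per-key scoring term of the cost, i.e. the part that gets multiplied.
    static double scoredKeyCost(int keys) {
        return ANN_SCORED_KEY_COST * keys;
    }

    public static void main(String[] args) {
        int limit = 1;
        double selectivity = 0.05;

        int expectedKeysInt = expectedKeys(limit, selectivity);   // 20
        double current = scoredKeyCost(expectedKeysInt);          // multiplies by expectedKeysInt
        double intended = scoredKeyCost(limit);                   // multiplies by limit

        System.out.println("expectedKeysInt = " + expectedKeysInt);
        System.out.println("overestimate factor = " + (current / intended));
    }
}
```

The factor does not depend on the placeholder constant, since it cancels in the ratio; it is simply expectedKeysInt / limit.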

jbellis avatar Aug 21 '24 19:08 jbellis

I think we have a similar issue in estimateAnnSortCost; we just happen to be lucky there because expectedKeysInt equals limit.

jbellis avatar Aug 21 '24 19:08 jbellis

The performance is much better now. The plan switches over at a selectivity between 0.02 and 0.03 (at over 200 req/s), which looks perfect to me!

pkolaczk avatar Aug 23 '24 12:08 pkolaczk

Okay, the ANN planning is working now with everything I've thrown at it, and I've merged from main. Will let CI run next.

One last issue: PlanTest.intersectionWithEmpty is failing because of the reduced SAI_KEY_COST. It looks like it was passing before more or less by accident. The issue is that a2.iterCost is 0.8, so the total cost ends up being 0.6 higher than expected. I am not sure whether this is a bug in the test or in the intersection cost logic. Can you take a look?

jbellis avatar Aug 23 '24 17:08 jbellis

Test failures are addressed.

jbellis avatar Aug 26 '24 21:08 jbellis