
CNDB-13952: Handle Chronicle Map entry overflow in vector index compaction

Open michaeljmarshall opened this issue 8 months ago • 4 comments

What is the issue

Fixes: https://github.com/riptano/cndb/issues/13952

What does this PR fix and why was it fixed

We were hitting Chronicle Map's entry size limit in some cases (those where a graph had an excessive number of duplicate vectors). This change handles that exception by attempting to shrink the entry: the duplicates are rewritten as varints instead of plain integers.
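
To illustrate why varints help, here is a minimal sketch of an unsigned LEB128-style varint encoding, assuming each duplicate is stored as a non-negative int (for example a row ID). The class and method names are hypothetical and are not taken from the PR; the point is only that small values take 1 to 3 bytes instead of a fixed 4.

```java
import java.io.ByteArrayOutputStream;

public class VarIntSketch {
    // Writes a non-negative int using 7 bits per byte; the high bit of each byte
    // signals that another byte follows (unsigned LEB128-style encoding).
    static void writeUnsignedVarInt(ByteArrayOutputStream out, int value) {
        if (value < 0) throw new IllegalArgumentException("negative value: " + value);
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    // Encodes every value as a varint and returns the serialized bytes.
    static byte[] encodeVarInts(int[] values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int v : values) writeUnsignedVarInt(out, v);
        return out.toByteArray();
    }

    // Decodes `count` varints from the start of `bytes`.
    static int[] decodeVarInts(byte[] bytes, int count) {
        int[] result = new int[count];
        int pos = 0;
        for (int i = 0; i < count; i++) {
            int value = 0;
            int shift = 0;
            byte b;
            do {
                b = bytes[pos++];
                value |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            result[i] = value;
        }
        return result;
    }

    // Size of the same values written as plain 4-byte ints.
    static int plainIntSize(int[] values) {
        return values.length * Integer.BYTES;
    }

    public static void main(String[] args) {
        int[] rowIds = new int[10_000];
        for (int i = 0; i < rowIds.length; i++) rowIds[i] = i; // small, dense IDs
        System.out.println("plain ints: " + plainIntSize(rowIds) + " bytes"); // 40000
        System.out.println("varints:    " + encodeVarInts(rowIds).length + " bytes"); // 19872
    }
}
```

The savings depend on the magnitude of the values: anything below 128 takes one byte, below 16384 two bytes, while values near `Integer.MAX_VALUE` take five bytes, one more than a plain int.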

Note that most graphs have only a handful of duplicated vectors, so we do not optimize for the case with a large number of duplicates. Further, Chronicle Map allocates a minimum chunk for each entry, and our entries are often smaller than that chunk, so there is no benefit to always writing the ints as varints; we only fall back to varints when an entry overflows.
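
The fallback strategy described above can be sketched as "try plain ints, retry with varints on overflow". This is a minimal illustration under stated assumptions, not the PR's code: `BoundedStore` and `EntryTooLargeException` are hypothetical stand-ins for Chronicle Map and its entry size failure, and the leading format byte is an assumption (the PR does not say how readers distinguish the two encodings).

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

public class OverflowFallbackSketch {
    // Hypothetical stand-in for the map's "entry too large" failure.
    static class EntryTooLargeException extends RuntimeException {
        EntryTooLargeException(String msg) { super(msg); }
    }

    // Hypothetical bounded store standing in for Chronicle Map's entry size limit.
    static class BoundedStore {
        final int maxEntryBytes;
        final Map<Integer, byte[]> entries = new HashMap<>();

        BoundedStore(int maxEntryBytes) { this.maxEntryBytes = maxEntryBytes; }

        void put(int key, byte[] value) {
            if (value.length > maxEntryBytes)
                throw new EntryTooLargeException(value.length + " > " + maxEntryBytes);
            entries.put(key, value);
        }
    }

    // Assumed format marker written as the first byte of each entry.
    static final byte PLAIN = 0, VARINT = 1;

    // Plain encoding: marker byte followed by fixed 4-byte ints.
    static byte[] encodePlain(int[] values) {
        ByteBuffer buf = ByteBuffer.allocate(1 + values.length * Integer.BYTES);
        buf.put(PLAIN);
        for (int v : values) buf.putInt(v);
        return buf.array();
    }

    // Compact encoding: marker byte followed by 7-bits-per-byte varints
    // (assumes non-negative values).
    static byte[] encodeVarInt(int[] values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(VARINT);
        for (int v : values) {
            while ((v & ~0x7F) != 0) {
                out.write((v & 0x7F) | 0x80);
                v >>>= 7;
            }
            out.write(v);
        }
        return out.toByteArray();
    }

    // Common case: plain ints. Only when the entry overflows, retry with varints.
    // The retry can still throw if even the varint form exceeds the limit.
    static byte store(BoundedStore store, int key, int[] duplicates) {
        try {
            store.put(key, encodePlain(duplicates));
            return PLAIN;
        } catch (EntryTooLargeException e) {
            store.put(key, encodeVarInt(duplicates));
            return VARINT;
        }
    }

    public static void main(String[] args) {
        BoundedStore s = new BoundedStore(256);
        int[] few = {3, 7, 42};
        int[] many = new int[100];
        for (int i = 0; i < many.length; i++) many[i] = i;
        System.out.println("few:  " + (store(s, 1, few) == PLAIN ? "plain" : "varint"));  // plain
        System.out.println("many: " + (store(s, 2, many) == PLAIN ? "plain" : "varint")); // varint
    }
}
```

Trying the plain encoding first keeps the common small-entry path simple, which matches the reasoning above: when an entry already fits within the minimum chunk, shrinking it with varints saves nothing.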

michaeljmarshall, May 09 '25 21:05

Checklist before you submit for review

  • [ ] Make sure there is a PR in the CNDB project updating the Converged Cassandra version
  • [ ] Use NoSpamLogger for log lines that may appear frequently in the logs
  • [ ] Verify test results on Butler
  • [ ] Test coverage for new/modified code is > 80%
  • [ ] Proper code formatting
  • [ ] Proper title for each commit starting with the project-issue number, like CNDB-1234
  • [ ] Each commit has a meaningful description
  • [ ] Each commit is not very long and contains related changes
  • [ ] Renames, moves and reformatting are in distinct commits
  • [ ] All new files should contain the DataStax copyright header instead of the Apache License one

github-actions[bot], May 09 '25 21:05

@eolivelli - this is ready for another review, please take a look

michaeljmarshall, Jun 03 '25 19:06

✔ Build ds-cassandra-pr-gate/PR-1731 approved by Butler



cassci-bot, Jun 06 '25 21:06