HNSW index update with CDC
User description
What type of PR is this?
- [ ] API-change
- [ ] BUG
- [ ] Improvement
- [ ] Documentation
- [x] Feature
- [ ] Test and CI
- [ ] Code Refactoring
Which issue(s) this PR fixes:
issue #21835
What this PR does / why we need it:
This PR updates the HNSW index by applying CDC changes.
The design doc: https://github.com/cpegeric/mo-docs/blob/hnsw_cdc/design/mo/sql/20250501-cpegeric-hnswsync.md
PR Type
Enhancement, Tests
Description
• Implement comprehensive HNSW index CDC (Change Data Capture) synchronization functionality
• Add new HnswSync struct and hnswCdcUpdate SQL function for processing CDC updates via multi-threaded operations
• Introduce hnswSyncSinker for updating HNSW indexes with CDC changes, supporting both float32 and float64 vector types
• Refactor HNSW architecture by replacing HnswBuildIndex and HnswSearchIndex with unified HnswModel structure
• Add CDC data structures (VectorIndexCdc, VectorIndexCdcEntry) with support for insert, update, and delete operations
• Implement transaction-aware SQL execution with RunTxn function and enhanced error handling
• Add comprehensive test suites covering CDC sinker functionality, synchronization operations, and model operations
• Integrate CDC task creation into HNSW index creation workflow with automatic cleanup placeholders
• Enhance array casting with dimension validation and standardize error message formats across vector operations
• Add distributed test cases for HNSW CDC synchronization scenarios including bulk loads and incremental updates
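To make the CDC data structures above more concrete, here is a minimal Go sketch of a batch of insert, update, and delete entries for float32 or float64 vectors, with a dimension check before an entry is accepted. It is an illustration only: the type names `Batch` and `Entry`, the field names, the JSON keys, and the operation tags are assumptions and are not taken from the PR's `VectorIndexCdc` or `VectorIndexCdcEntry` definitions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical operation tags; the PR's actual constants may differ.
const (
	OpInsert = "I"
	OpUpdate = "U"
	OpDelete = "D"
)

// Entry is an illustrative stand-in for a single CDC change.
type Entry[T float32 | float64] struct {
	Type string `json:"t"`
	PKey int64  `json:"pk"`
	Vec  []T    `json:"v,omitempty"` // omitted for deletes
}

// Batch is an illustrative stand-in for a group of CDC changes
// that share one vector dimension.
type Batch[T float32 | float64] struct {
	Dim  int        `json:"dim"`
	Data []Entry[T] `json:"cdc"`
}

// add validates the vector dimension and appends the entry.
func (b *Batch[T]) add(op string, pk int64, vec []T) error {
	if len(vec) != b.Dim {
		return fmt.Errorf("dimension mismatch: got %d, want %d", len(vec), b.Dim)
	}
	b.Data = append(b.Data, Entry[T]{Type: op, PKey: pk, Vec: vec})
	return nil
}

// Insert records a new vector for the given primary key.
func (b *Batch[T]) Insert(pk int64, vec []T) error { return b.add(OpInsert, pk, vec) }

// Update records a replacement vector for the given primary key.
func (b *Batch[T]) Update(pk int64, vec []T) error { return b.add(OpUpdate, pk, vec) }

// Delete records the removal of a primary key; no vector is needed.
func (b *Batch[T]) Delete(pk int64) {
	b.Data = append(b.Data, Entry[T]{Type: OpDelete, PKey: pk})
}

// ToJSON serializes the batch, for example to pass it as a SQL function argument.
func (b *Batch[T]) ToJSON() (string, error) {
	out, err := json.Marshal(b)
	return string(out), err
}

func main() {
	b := &Batch[float32]{Dim: 3}
	if err := b.Insert(1, []float32{0.1, 0.2, 0.3}); err != nil {
		panic(err)
	}
	b.Delete(2)
	// A vector with the wrong dimension is rejected instead of being queued.
	if err := b.Update(3, []float32{1, 2}); err != nil {
		fmt.Println("rejected:", err)
	}
	js, err := b.ToJSON()
	if err != nil {
		panic(err)
	}
	fmt.Println(js)
}
```

Using a generic type parameter keeps one code path for both vector element types; whether the PR takes this approach is not stated in the description.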
Changes walkthrough 📝
| Category | Relevant files |
|---|---|
| Tests | 12 files |
| Feature | 6 files |
| Enhancement | 9 files |
| Code refactoring | 2 files |
| Configuration changes | 2 files |
| Miscellaneous | 3 files |
PR Reviewer Guide 🔍
Here are some key observations to aid the review process:
| ⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪ |
| 🧪 PR contains tests |
| 🔒 No security concerns identified |
| ⚡ Recommended focus areas for review: Error Handling |
PR Code Suggestions ✨
Explore these optional code suggestions:
| Category | Suggestion | Impact |
|---|---|---|
| Possible issue | Add bounds checking for slice. The code accesses … (pkg/cdc/hnsw_sinker.go [209-255]). Suggestion importance [1-10]: 7. Why: The suggestion correctly identifies a potential panic from an out-of-bounds slice access on … | Medium |
| | Add nil check validation. The function increments … (pkg/vectorindex/hnsw/sync.go [545-569]). Suggestion importance [1-10]: 2. Why: The suggestion adds a defensive … | Low |
| General | Remove duplicate Load call. The … (pkg/vectorindex/hnsw/model.go [452-458]). Suggestion importance [1-10]: 6. Why: The suggestion correctly identifies that … | Low |
| | Fix misleading error message. The error message "missing lock service" is misleading since the code is … (pkg/vectorindex/sqlexec/sqlexec.go [81-83]). Suggestion importance [1-10]: 6. Why: The suggestion correctly points out that the panic message "missing lock service" is misleading, as the code is actually checking for an … | Low |