feat: Add CLI support for updating existing indices
What does this PR do?
Summary
- Adds a new `leann update` CLI command that allows users to incrementally add new documents to existing HNSW indices without rebuilding from scratch
- Includes validation to ensure only non-compact HNSW indices can be updated, with clear error messages guiding users when prerequisites aren't met
Changes
- CLI Command: New `update` subcommand with support for multiple document paths, chunking options, file type filters, and AST chunking
- Validation: Checks that the target index exists, uses the HNSW backend, and is non-compact before allowing updates
- Metadata Preservation: Reads existing index configuration (embedding model, graph parameters, chunking settings) to ensure consistency
- Documentation: Updated README with complete update command reference and examples
- Tests: Added comprehensive test suite (`test_cli_update.py`) covering argument parsing, chunking options, file filters, and default values
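
To illustrate the validation step listed above, here is a minimal sketch of the three checks (index exists, HNSW backend, non-compact). It is not the code from this PR: the metadata file layout, the key names `backend_name`, `backend_kwargs`, and `is_compact`, and the `IndexNotUpdatableError` class are all assumptions made for the example.

```python
import json
from pathlib import Path


class IndexNotUpdatableError(Exception):
    """Hypothetical error raised when an update prerequisite is not met."""


def check_updatable(meta_path: Path) -> dict:
    """Return the index metadata if the index can be updated, else raise.

    Assumes (hypothetically) that each index has one JSON metadata file
    with a "backend_name" field and a "backend_kwargs" mapping.
    """
    # Check 1: the target index must exist.
    if not meta_path.exists():
        raise IndexNotUpdatableError(
            f"Index metadata not found at {meta_path}; build the index first."
        )

    meta = json.loads(meta_path.read_text(encoding="utf-8"))

    # Check 2: only the HNSW backend is supported for updates.
    backend = meta.get("backend_name")
    if backend != "hnsw":
        raise IndexNotUpdatableError(
            f"Backend {backend!r} cannot be updated; only HNSW indices are supported."
        )

    # Check 3: the index must be non-compact. A missing flag is treated
    # as compact here, which is the conservative choice for this sketch.
    if meta.get("backend_kwargs", {}).get("is_compact", True):
        raise IndexNotUpdatableError(
            "Index is compact; rebuild it with --no-compact to enable updates."
        )

    return meta
```

Raising one exception type with a specific message per failed check keeps the CLI layer simple: it can catch the error, print the message, and exit with a non-zero status.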
Usage Example
Add new documents to existing index
leann update my-docs --docs ./new-documents
Add with a file type filter
leann update my-code --docs ./new-src --file-types .py,.js
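
For readers who want to see how the two invocations above could be parsed, here is a rough `argparse` sketch. Only the `update` subcommand name and the `--docs` and `--file-types` flags come from this description; the destination name `index_name`, the `required` setting on `--docs`, and the `parse_file_types` helper are assumptions, and the real parser in this PR may be structured differently.

```python
import argparse


def parse_file_types(value: str) -> list[str]:
    """Split a comma-separated extension list such as ".py,.js" (hypothetical helper)."""
    return [ext.strip() for ext in value.split(",") if ext.strip()]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="leann")
    subparsers = parser.add_subparsers(dest="command", required=True)

    update = subparsers.add_parser(
        "update", help="Add documents to an existing index"
    )
    # Name of the existing index to extend (argument name is an assumption).
    update.add_argument("index_name")
    # One or more document paths; treated as required in this sketch.
    update.add_argument("--docs", nargs="+", required=True)
    # Optional extension filter, parsed into a list of strings.
    update.add_argument("--file-types", type=parse_file_types, default=None)
    return parser


args = build_parser().parse_args(
    ["update", "my-code", "--docs", "./new-src", "--file-types", ".py,.js"]
)
print(args.index_name, args.docs, args.file_types)
# prints: my-code ['./new-src'] ['.py', '.js']
```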
Notes
- Only works with HNSW indices built with the `--no-compact` flag
- Preserves original index configuration for consistency
- Provides helpful error messages when update isn't possible
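
The configuration-preservation note can be pictured with a small sketch: read the settings stored with the index and reuse them for the update, rather than taking fresh defaults. The key names (`embedding_model`, `backend_kwargs`, `M`, `efConstruction`, `is_recompute`) and the function `builder_kwargs_from_meta` are assumptions for illustration, not the metadata schema or API used by this PR.

```python
from typing import Any

# Hypothetical backend settings that must match the original build.
PRESERVED_BACKEND_KEYS = ("M", "efConstruction", "is_recompute")


def builder_kwargs_from_meta(meta: dict[str, Any]) -> dict[str, Any]:
    """Collect the stored settings an update should reuse.

    Values come only from the existing metadata, so new documents are
    embedded and linked with the same parameters as the original index.
    """
    backend_kwargs = meta.get("backend_kwargs", {})
    kwargs: dict[str, Any] = {
        "backend_name": meta["backend_name"],
        "embedding_model": meta["embedding_model"],
    }
    for key in PRESERVED_BACKEND_KEYS:
        # Copy a setting only if it was recorded; never invent a default.
        if key in backend_kwargs:
            kwargs[key] = backend_kwargs[key]
    return kwargs
```

Copying only recorded keys means a setting that was never stored falls through to whatever the builder itself would do, instead of being silently overridden by the update command.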
Related Issues
Fixes #
Checklist
- [x] Tests pass (`uv run pytest`)
- [x] Code formatted (`ruff format` and `ruff check`)
- [x] Pre-commit hooks pass (`pre-commit run --all-files`)
Thanks, this is a great PR; we will review it later. BTW, does it work with recompute or not? I guess it uses our update/add API, which we have not tested at production scale.
I guess there might be some risk in merging this, because this feature is still in meta-version. Have you tried it on a real workload? I am not sure whether it actually works.
Thanks for the review. Yes, the update command respects the original index's is_recompute setting - it reads it from metadata and passes it to the builder.
I haven't tested a real/prod workload.
@mark-hingston Would you be able to test it and let us know how it works? It would be good to see the workflow run.
@mark-hingston Please let us know of any updates on the testing! We would need to resolve some merge conflicts if all works well on this.