feat: Add CLI support for updating existing indices

Open mark-hingston opened this issue 1 month ago • 6 comments

What does this PR do?

Summary

  • Adds a new leann update CLI command that allows users to incrementally add new documents to existing HNSW indices without rebuilding from scratch
  • Includes validation to ensure only non-compact HNSW indices can be updated, with clear error messages guiding users when prerequisites aren't met

Changes

  • CLI Command: New update subcommand with support for multiple document paths, chunking options, file type filters, and AST chunking
  • Validation: Checks that target index exists, uses HNSW backend, and is non-compact before allowing updates
  • Metadata Preservation: Reads existing index configuration (embedding model, graph parameters, chunking settings) to ensure consistency
  • Documentation: Updated README with complete update command reference and examples
  • Tests: Added comprehensive test suite (test_cli_update.py) covering argument parsing, chunking options, file filters, and default values
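
The validation step listed above could be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the function name `check_updatable` and the metadata keys `backend_name` and `is_compact` are assumptions made for this example.

```python
from typing import Optional


def check_updatable(meta: Optional[dict]) -> Optional[str]:
    """Return an error message if the index cannot be updated, else None.

    `meta` is the parsed index metadata, or None when no index was found.
    The key names below are hypothetical and chosen only for illustration.
    """
    if meta is None:
        return "Index not found. Build it before running update."
    if meta.get("backend_name") != "hnsw":
        return "Only HNSW indices can be updated."
    # Treat a missing flag as compact, so the check fails safe.
    if meta.get("is_compact", True):
        return "Index is compact. Rebuild it with --no-compact to enable updates."
    return None
```

Returning a message rather than raising keeps the three checks easy to test in isolation; a real CLI would print the message and exit with a non-zero status.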

Usage Example

Add new documents to existing index

leann update my-docs --docs ./new-documents

Add only specific file types

leann update my-code --docs ./new-src --file-types .py,.js
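
For readers curious how commands like these might be parsed, here is a small standalone `argparse` sketch. It only mirrors the two flags shown in the examples above (`--docs` and `--file-types`); the parser layout, the `index_name` argument name, and the `parse_file_types` helper are assumptions and are not taken from the PR's code.

```python
import argparse


def make_parser() -> argparse.ArgumentParser:
    """Build a toy parser with a single `update` subcommand."""
    parser = argparse.ArgumentParser(prog="leann")
    sub = parser.add_subparsers(dest="command", required=True)
    update = sub.add_parser("update", help="Add documents to an existing index")
    update.add_argument("index_name")
    # One or more document paths may be given after --docs.
    update.add_argument("--docs", nargs="+", required=True)
    # Comma-separated list of extensions, e.g. ".py,.js".
    update.add_argument("--file-types", default=None)
    return parser


def parse_file_types(value):
    """Split a comma-separated extension string into a list, or return None."""
    if not value:
        return None
    return [ext.strip() for ext in value.split(",") if ext.strip()]


args = make_parser().parse_args(
    ["update", "my-code", "--docs", "./new-src", "--file-types", ".py,.js"]
)
file_types = parse_file_types(args.file_types)
```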

Notes

  • Only works with HNSW indices built with the --no-compact flag
  • Preserves original index configuration for consistency
  • Provides helpful error messages when update isn't possible
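
One plausible way to preserve the original configuration is to let stored values take precedence over anything supplied at update time. The sketch below assumes hypothetical metadata keys (`embedding_model`, `is_recompute`) and a hypothetical helper name; the actual merge policy in the PR may differ.

```python
# Hypothetical keys that must match the original build for the index to stay consistent.
PRESERVED_KEYS = ("embedding_model", "is_recompute")


def settings_for_update(meta: dict, cli_options: dict) -> dict:
    """Merge CLI options with stored index settings; stored values win."""
    settings = dict(cli_options)  # copy so the caller's dict is not mutated
    for key in PRESERVED_KEYS:
        if key in meta:
            settings[key] = meta[key]
    return settings
```

With this policy, a user who passes a different embedding model by mistake would still have new documents embedded with the model the index was built with.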

Related Issues

Fixes #

Checklist

  • [x] Tests pass (uv run pytest)
  • [x] Code formatted (ruff format and ruff check)
  • [x] Pre-commit hooks pass (pre-commit run --all-files)

mark-hingston avatar Dec 07 '25 17:12 mark-hingston

Thanks, this is a great PR; we will review it later. BTW, does it work with recompute or not? I guess it uses our update/add API, which we have not tested at production scale.

yichuan-w avatar Dec 10 '25 21:12 yichuan-w

I guess there might be some risk in merging this, because this feature is still in meta-version. Have you tried it on a real workload? I am not sure whether it actually works.

yichuan-w avatar Dec 10 '25 21:12 yichuan-w

Thanks for the review. Yes, the update command respects the original index's is_recompute setting - it reads it from metadata and passes it to the builder.

I haven't tested a real/prod workload.

mark-hingston avatar Dec 11 '25 19:12 mark-hingston

@mark-hingston Would you be able to test it and let us know how it works? It would be good to see the workflow run.

ASuresh0524 avatar Dec 13 '25 02:12 ASuresh0524

@mark-hingston Please let us know of any updates on the testing! If all works well, some merge conflicts will need to be resolved on this.

ASuresh0524 avatar Jan 10 '26 08:01 ASuresh0524