LEANN
Search with `recompute` has second-level latency for code RAG
What happened?
Title: Search with recompute takes ~15s even after warmup
Environment
- macOS (darwin 24.6.0)
- Python 3.10.18
- LEANN branch: `feature/colqwen-integration`
Steps to Reproduce
- Create a tiny code repo:

      mkdir -p /tmp/quick_test_code
      cat <<'PY' > /tmp/quick_test_code/test.py
      def hello():
          return "world"

      class Test:
          pass
      PY

- Build an index:

      leann build quick-test-index \
        --docs /tmp/quick_test_code \
        --use-ast-chunking \
        --embedding-model facebook/contriever \
        --embedding-mode sentence-transformers

- Run searches (measure wall time):

      time leann search quick-test-index "hello" --top-k 3
      time leann search quick-test-index "Test" --top-k 3
      time leann search quick-test-index "function" --top-k 3
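
To make the wall-time measurements repeatable, a small script along these lines could run each search several times and print the per-run timings. This is only a sketch: `time_command` and the run count are illustrative choices, and the only thing taken from the steps above is the `leann search` invocation itself.

```python
import subprocess
import time


def time_command(cmd, runs=3):
    """Run cmd (a list of argv strings) `runs` times; return wall times in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # Output is discarded; only the elapsed wall time matters here.
        subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    # Same queries and flags as in the reproduction steps.
    for query in ["hello", "Test", "function"]:
        cmd = ["leann", "search", "quick-test-index", query, "--top-k", "3"]
        timings = time_command(cmd)
        formatted = ", ".join(f"{t:.1f}s" for t in timings)
        print(f"{query!r}: {formatted}")
```

The first run per query would be the "cold" one; comparing it with later runs shows how much any caching helps.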
Observed
- Build completes (~15 s).
- Each search takes 13–19 s, even after multiple runs.
- Warm and cold runs differ by only ~1.2 s, so most of the time seems to be spent recomputing embeddings per query.
Expected
- Searches with `recompute=True` may be slower, but not 13–19 s for a trivial index.
- Would like to understand if there’s a way to avoid a full model reload or make on-the-fly query embedding faster.
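
One way to tell whether the time goes into loading the model or into encoding the query is to time the two steps separately, outside LEANN. The sketch below assumes the `sentence-transformers` package is installed and that `facebook/contriever` can be loaded via `SentenceTransformer`; `timed` and `profile_query_embedding` are illustrative helpers, not part of LEANN.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def profile_query_embedding(model_name="facebook/contriever", query="hello"):
    # Imported lazily so `timed` stays usable without the package installed.
    from sentence_transformers import SentenceTransformer

    model, load_s = timed(SentenceTransformer, model_name)
    _, first_s = timed(model.encode, [query])   # may include one-time warmup
    _, second_s = timed(model.encode, [query])  # closer to steady-state cost
    return {
        "load_s": load_s,
        "first_encode_s": first_s,
        "second_encode_s": second_s,
    }


if __name__ == "__main__":
    for name, seconds in profile_query_embedding().items():
        print(f"{name}: {seconds:.2f}")
```

If `load_s` accounts for most of the time, a fresh process per search would pay that cost on every invocation. This only measures the embedding model in isolation, not LEANN's recompute path.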
How to reproduce
Follow steps from above.
Error message
LEANN Version
0.1.0
Operating System
macOS