LEANN
add ColQwen multimodal PDF retrieval integration
- Add ColQwenRAG class with easy-to-use CLI for multimodal PDF retrieval
- Support for both ColQwen2 and ColPali models with automatic device selection
- MPS optimization for Apple Silicon with memory-efficient loading
- Complete pipeline: PDF→images→embeddings→HNSW index→search
- Multi-vector indexing for fine-grained document matching
- Comprehensive user guide and reproduction test script
- Resolves #119: ColQwen Doc and Support Management
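For readers new to multi-vector retrieval, here is a minimal sketch of the late-interaction (MaxSim) scoring that ColPali-style models use to compare a query against a page. It is a brute-force illustration in plain Python and is not taken from this PR: the names `maxsim_score` and `rank_pages` are hypothetical, and the pipeline described above searches an HNSW index rather than scanning every page.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs: List[Vector], page_vecs: List[Vector]) -> float:
    """Late-interaction score: each query vector is matched to its
    best-scoring page vector, and those maxima are summed."""
    return sum(max(dot(q, p) for p in page_vecs) for q in query_vecs)


def rank_pages(query_vecs: List[Vector], pages: Dict[str, List[Vector]]) -> List[str]:
    """Return page ids ordered from highest to lowest MaxSim score.
    (Hypothetical helper; scans every page instead of using an index.)"""
    scored = [(maxsim_score(query_vecs, vecs), page_id) for page_id, vecs in pages.items()]
    return [page_id for _, page_id in sorted(scored, reverse=True)]


# Toy data: two query vectors, two pages with a different number of vectors each.
query = [[1.0, 0.0], [0.0, 1.0]]
pages = {
    "page_a": [[1.0, 0.0], [0.0, 1.0]],
    "page_b": [[0.5, 0.5]],
}
ranking = rank_pages(query, pages)
```

Because every page keeps several vectors instead of one pooled embedding, a query token can match a specific region of the page, which is the "fine-grained document matching" mentioned in the list above.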
Features:
- python -m apps.colqwen_rag build --pdfs ./pdfs/ --index my_index
- python -m apps.colqwen_rag search my_index "query text"
- python -m apps.colqwen_rag ask my_index --interactive
- Automatic CPU fallback for memory constraints
- Robust error handling and progress tracking
Checklist
- [ ] Tests pass (uv run pytest)
- [ ] Code formatted (ruff format and ruff check)
- [ ] Pre-commit hooks pass (pre-commit run --all-files)
The faiss submodule still seems to have a problem; we need to remember to run a submodule update.
@ASuresh0524
hmm @yichuan-w okay sounds good will look into it
@ASuresh0524 Thanks for the PR. Please make sure the faiss submodule is correct; I think we introduced an unnecessary faiss submodule update.
Sounds good, will fix this tomorrow