add ColQwen multimodal PDF retrieval integration

Open ASuresh0524 opened this issue 2 months ago • 3 comments

  • Add ColQwenRAG class with easy-to-use CLI for multimodal PDF retrieval
  • Support for both ColQwen2 and ColPali models with automatic device selection
  • MPS optimization for Apple Silicon with memory-efficient loading
  • Complete pipeline: PDF→images→embeddings→HNSW index→search
  • Multi-vector indexing for fine-grained document matching
  • Comprehensive user guide and reproduction test script
  • Resolves #119: ColQwen Doc and Support Management
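
For readers new to multi-vector retrieval, here is a minimal sketch of the late-interaction (MaxSim) scoring that ColPali-style models are generally paired with. It is a brute-force illustration in pure Python, not the code in this PR: the function names (`normalize`, `maxsim_score`, `rank_pages`) and the toy 2-dimensional vectors are hypothetical, and a real pipeline would presumably use the HNSW index to shortlist candidates before any scoring like this.

```python
import math


def normalize(vec):
    # Scale a vector to unit length so dot products behave like cosine similarity.
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs, page_vecs):
    # Late interaction: for each query-token vector take its best-matching
    # page-patch vector, then sum those maxima over all query tokens.
    q = [normalize(v) for v in query_vecs]
    p = [normalize(v) for v in page_vecs]
    if not p:
        return 0.0
    return sum(max(dot(qv, pv) for pv in p) for qv in q)


def rank_pages(query_vecs, pages):
    # pages maps a (hypothetical) page id to its list of patch embeddings.
    scored = [(page_id, maxsim_score(query_vecs, vecs)) for page_id, vecs in pages.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy data: two query-token vectors and two pages with a few patch vectors each.
query = [[1.0, 0.0], [0.0, 1.0]]
pages = {
    "doc.pdf#page1": [[1.0, 0.0], [0.9, 0.1]],
    "doc.pdf#page2": [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],
}
ranking = rank_pages(query, pages)
```

In this toy example the second page ranks first because it contains a close match for both query vectors, which is the fine-grained matching behavior that storing one vector per patch is meant to enable.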

Features:

  • python -m apps.colqwen_rag build --pdfs ./pdfs/ --index my_index
  • python -m apps.colqwen_rag search my_index "query text"
  • python -m apps.colqwen_rag ask my_index --interactive
  • Automatic CPU fallback for memory constraints
  • Robust error handling and progress tracking
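
As a rough illustration of automatic device selection with CPU fallback, the sketch below shows one way this could work. It is not taken from the PR: the helper names (`pick_device`, `detect_device`, `load_with_cpu_fallback`), the `loader` callable, and the CUDA > MPS > CPU preference order are assumptions made for the example. The only PyTorch calls used are `torch.cuda.is_available()` and `torch.backends.mps.is_available()`.

```python
def pick_device(cuda_available, mps_available):
    # Assumed preference order: CUDA first, then Apple Silicon MPS, then CPU.
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device():
    # Probe PyTorch if it is installed; otherwise default to CPU.
    try:
        import torch
    except ImportError:
        return "cpu"
    mps = getattr(torch.backends, "mps", None)
    return pick_device(torch.cuda.is_available(), bool(mps and mps.is_available()))


def load_with_cpu_fallback(loader, device):
    # loader is a hypothetical callable that builds the model on a device.
    # Accelerator out-of-memory errors generally surface as RuntimeError,
    # so retry once on CPU before giving up.
    try:
        return loader(device), device
    except (RuntimeError, MemoryError):
        if device == "cpu":
            raise
        return loader("cpu"), "cpu"
```

A caller might write `model, device = load_with_cpu_fallback(my_loader, detect_device())`, where `my_loader` wraps whatever model-loading call the application uses.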

Checklist

  • [ ] Tests pass (uv run pytest)
  • [ ] Code formatted (ruff format and ruff check)
  • [ ] Pre-commit hooks pass (pre-commit run --all-files)

ASuresh0524 avatar Nov 11 '25 00:11 ASuresh0524

The faiss submodule still seems to have a problem; we need to remember to run a submodule update.

yichuan-w avatar Nov 14 '25 23:11 yichuan-w

@ASuresh0524

yichuan-w avatar Nov 14 '25 23:11 yichuan-w

Hmm @yichuan-w, okay, sounds good. Will look into it.

ASuresh0524 avatar Nov 15 '25 00:11 ASuresh0524

@ASuresh0524 Thanks for the PR. Please make sure the faiss submodule is correct; I think we are making an unnecessary faiss submodule update.

yichuan-w avatar Dec 03 '25 09:12 yichuan-w

Sounds good, will fix this tomorrow

ASuresh0524 avatar Dec 03 '25 09:12 ASuresh0524