
[feat] OCR-based application

Open · yichuan-w opened this issue 2 months ago • 2 comments

What problem does this solve?

LEANN currently uses text embeddings only. For multimodal data, we have two options:

  1. Use DeepSeek OCR or MinerU to convert everything into text space.
  2. Maintain separate image vectors and text vectors.
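If we go with option 2, each query hits two indexes and the two ranked result lists have to be merged. One common way to do that is reciprocal rank fusion (RRF), which only needs ranks, so the text and image similarity scores never have to be put on the same scale. This is a minimal sketch, assuming each index returns document IDs ordered best-first; the page IDs and the `reciprocal_rank_fusion` helper are hypothetical and not part of LEANN.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge several best-first lists of document IDs into one ranking.

    A document's fused score is the sum of 1 / (k + rank) over every list
    it appears in (ranks start at 1). k=60 is the usual RRF default.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; break ties by ID so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results from a text-vector index and an image-vector index.
text_hits = ["page3", "page1", "page7"]
image_hits = ["page7", "page3", "page9"]
print(reciprocal_rank_fusion([text_hits, image_hits]))
# ['page3', 'page7', 'page1', 'page9']
```

Pages found by both indexes (page3, page7) rise to the top, which is usually the behavior we want from a hybrid text/image setup.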

Proposed solution

See the RAGanything repo and MinerU.

Example usage

RAG over vision-rich tasks.

yichuan-w · Nov 10 '25 02:11

I want to take this issue!

jintao-h · Dec 07 '25 17:12

Sure. I think we can write the pipeline using the LEANN API, following the format in the app folder.

The pipeline might be: PDF -> OCR results (using DeepSeek OCR, for example) -> chunking -> LEANN (vector database) -> QA.
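To make the data flow concrete, here is a minimal, self-contained sketch of the PDF -> OCR -> chunk -> index -> QA stages. Every name in it is a hypothetical stand-in: `ocr_pages` returns canned text where a real OCR model such as DeepSeek OCR would run, `embed` is a toy bag-of-words vector instead of a real text encoder, and `ToyIndex` is an in-memory placeholder for LEANN, not the LEANN API. The real version would swap these for LEANN calls following the app folder's format.

```python
import math
import re
from collections import Counter


def ocr_pages(pdf_path):
    """Hypothetical OCR stand-in: a real pipeline would run an OCR model
    on each page of `pdf_path`. Here it ignores the path and returns canned text."""
    return [
        "LEANN is a vector database that stores compact indexes.",
        "The OCR step converts scanned PDF pages into plain text.",
    ]


def chunk(text, size=8, overlap=2):
    """Split text into windows of `size` words, consecutive windows sharing `overlap` words."""
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):  # this window already reaches the end
            break
    return chunks


def embed(text):
    """Toy bag-of-words 'embedding' standing in for a real text encoder."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)  # Counter returns 0 for missing terms
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyIndex:
    """In-memory placeholder for the vector database (NOT the LEANN API)."""

    def __init__(self):
        self.items = []

    def add(self, text):
        self.items.append((text, embed(text)))

    def search(self, query, top_k=1):
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [text for text, _ in ranked[:top_k]]


def build_pipeline(pdf_path):
    """PDF -> OCR text per page -> chunks -> index."""
    index = ToyIndex()
    for page_text in ocr_pages(pdf_path):
        for piece in chunk(page_text):
            index.add(piece)
    return index


index = build_pipeline("example.pdf")  # hypothetical file; the OCR stub ignores it
context = index.search("What does the OCR step do?", top_k=1)
print(context)
```

For the QA step, the retrieved `context` chunks plus the question would then go to an LLM; that call is left out here because it depends on which model the app uses.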

We can use several PDFs as the default example, and then try a dataset such as olmOCR-Bench to verify accuracy.
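Before running a full benchmark, a quick sanity check on the OCR stage alone is the character error rate (CER): the edit distance between OCR output and a hand-checked reference, divided by the reference length. This is a generic metric sketch, assuming we have reference text for a few pages; olmOCR-Bench defines its own scoring, which this does not reproduce, and the sample strings are made up.

```python
def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if characters match)
            ))
        prev = curr
    return prev[-1]


def character_error_rate(ocr_text, reference):
    """Edit distance normalized by reference length; 0.0 means a perfect match."""
    if not reference:
        # Any output against an empty reference counts as fully wrong.
        return 0.0 if not ocr_text else 1.0
    return levenshtein(ocr_text, reference) / len(reference)


# One substituted character ("c" for "o") out of 18 reference characters.
print(character_error_rate("LEANN vectcr index", "LEANN vector index"))
```

CER only measures the OCR text itself; end-to-end QA accuracy on the indexed PDFs would still need to be checked separately.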

Let me know if there are further questions.

yichuan-w · Dec 07 '25 21:12