LEANN
[feat] OCR-based application
What problem does this solve?
Right now, LEANN uses text embeddings only. There are two options for handling multimodal data:
- Use DeepSeek-OCR or MinerU to convert everything into the text space
- Maintain image vectors and text vectors separately
Proposed solution
See the RAG-Anything repo and MinerU.
Example usage
To RAG over vision-rich tasks (e.g., scanned PDFs).
I want to take this issue!
Sure, I guess we can write the pipeline using the LEANN API, following the format in the app folder.
The process might be: PDF -> OCR results (using DeepSeek-OCR, for example) -> chunking -> LEANN (vector database) -> QA.
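A minimal sketch of that flow, assuming LEANN's Python API looks like the README examples (`LeannBuilder` / `LeannChat`); the `run_ocr` and `chunk` helpers here are placeholders, not real APIs:

```python
# Hypothetical pipeline sketch: PDF -> OCR -> chunk -> LEANN index -> QA.
# run_ocr() is a stand-in for whatever OCR backend we pick (DeepSeek-OCR, MinerU, ...).
from pathlib import Path
from leann import LeannBuilder, LeannChat  # assumed import path, as in the existing apps

def run_ocr(pdf_path: Path) -> str:
    """Placeholder: return the OCR'd text/markdown for one PDF."""
    raise NotImplementedError("plug in DeepSeek-OCR / MinerU here")

def chunk(text: str, size: int = 1000, overlap: int = 100) -> list[str]:
    """Naive character chunking; swap in a smarter splitter later."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

INDEX_PATH = "./ocr_demo.leann"

builder = LeannBuilder(backend_name="hnsw")
for pdf in Path("./pdfs").glob("*.pdf"):
    for piece in chunk(run_ocr(pdf)):
        builder.add_text(piece)
builder.build_index(INDEX_PATH)

# QA over the index; the llm_config shape is an assumption, adjust to whatever the apps use.
chat = LeannChat(INDEX_PATH, llm_config={"type": "ollama", "model": "llama3.2:1b"})
print(chat.ask("What does the scanned report conclude?"))
```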
We can make the default example a few PDFs. Then we can also try a dataset like olmOCR-Bench to verify accuracy.
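A rough sketch of that accuracy check, assuming we first distill the benchmark into simple question/answer pairs (the actual olmOCR-Bench format and metrics would need checking):

```python
# Hypothetical accuracy loop over the built index; qa_cases.json is an assumed
# file of {"question": ..., "answer": ...} pairs derived from the benchmark.
import json
from leann import LeannChat

chat = LeannChat("./ocr_demo.leann", llm_config={"type": "ollama", "model": "llama3.2:1b"})

cases = json.load(open("qa_cases.json"))
correct = sum(case["answer"].lower() in chat.ask(case["question"]).lower() for case in cases)
print(f"accuracy: {correct / len(cases):.2%}")
```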
Let me know if there are further questions.