LEANN
[feat] OCR-based application
What problem does this solve?
Right now, LEANN uses text embeddings only. There are two options for handling multimodal data:
- Use DeepSeek-OCR or MinerU to convert everything into the text space
- Maintain image vectors and text vectors separately
Proposed solution
See the RAG-Anything repo and MinerU.
Example usage
To RAG over vision-rich tasks (e.g., scanned PDFs).
I want to take this issue!
Sure, I guess we can write the pipeline using the LEANN API, following the format in the app folder.
The process might be: PDF -> OCR results (using DeepSeek-OCR, for example) -> chunking -> LEANN (vector database) -> QA.
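A minimal sketch of that flow, assuming LEANN's Python API looks like the README examples (`LeannBuilder` / `LeannChat`); the `run_ocr` and `chunk` helpers here are placeholders, not real APIs:

```python
# Hypothetical pipeline sketch: PDF -> OCR -> chunk -> LEANN index -> QA.
# run_ocr() is a stand-in for whatever OCR backend we pick (DeepSeek-OCR, MinerU, ...).
from pathlib import Path
from leann import LeannBuilder, LeannChat  # assumed import path, as in the existing apps

def run_ocr(pdf_path: Path) -> str:
    """Placeholder: return the OCR'd text/markdown for one PDF."""
    raise NotImplementedError("plug in DeepSeek-OCR / MinerU here")

def chunk(text: str, size: int = 1000, overlap: int = 100) -> list[str]:
    """Naive character chunking; swap in a smarter splitter later."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

INDEX_PATH = "./ocr_demo.leann"

builder = LeannBuilder(backend_name="hnsw")
for pdf in Path("./pdfs").glob("*.pdf"):
    for piece in chunk(run_ocr(pdf)):
        builder.add_text(piece)
builder.build_index(INDEX_PATH)

# QA over the index; the llm_config shape is an assumption, adjust to whatever the apps use.
chat = LeannChat(INDEX_PATH, llm_config={"type": "ollama", "model": "llama3.2:1b"})
print(chat.ask("What does the scanned report conclude?"))
```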
We can make the default example a few PDFs. Then we can also try a dataset like olmOCR-Bench to verify accuracy.
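A rough sketch of that accuracy check, assuming we first distill the benchmark into simple question/answer pairs (the actual olmOCR-Bench format and metrics would need checking):

```python
# Hypothetical accuracy loop over the built index; qa_cases.json is an assumed
# file of {"question": ..., "answer": ...} pairs derived from the benchmark.
import json
from leann import LeannChat

chat = LeannChat("./ocr_demo.leann", llm_config={"type": "ollama", "model": "llama3.2:1b"})

cases = json.load(open("qa_cases.json"))
correct = sum(case["answer"].lower() in chat.ask(case["question"]).lower() for case in cases)
print(f"accuracy: {correct / len(cases):.2%}")
```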
Let me know if there are further questions.