pdf-extractor-rag topic

PaddleOCR

62.5k stars, 9.2k forks, 495 watchers

Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.

MinerU

22.4k stars, 1.6k forks, 119 watchers

A high-quality, one-stop open-source data extraction tool that converts PDFs to Markdown and JSON.