document-parsing topic

PaddleOCR

62.6k stars, 9.2k forks, 496 watchers

Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.

unstructured

8.6k stars, 702 forks, 43 watchers

Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.

edenai-apis

374 stars, 53 forks

Eden AI: simplify the use and deployment of AI technologies by providing a unique API that connects to the best possible AI engines

papercast

32 stars, 1 fork

A Python pipeline tool and plugin ecosystem for processing technical documents. Process papers from arXiv, SemanticScholar, PDF, with GROBID, LangChain, listen as podcast. Customize your own pipelines...

community

19 stars, 6 forks

Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.

docling

22.8k stars, 1.3k forks, 94 watchers

Get your documents ready for gen AI