pdf-parser topic

List pdf-parser repositories

PaddleOCR

62.2k
Stars
9.2k
Forks
493
Watchers

Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.

pdfalyzer

222
Stars
17
Forks
Watchers

Analyze PDFs. With colors. And Yara.

nextjs-pdf-parser

46
Stars
5
Forks
Watchers

Next.js template for seamless PDF parsing using pdf2json and FilePond. Ideal for developers seeking a ready-to-use solution for PDF content extraction in Next.js projects.

Scanipy

16
Stars
1
Forks
Watchers

Scanipy stands for "scan it with Python"—it's your smart Python library for scanning and parsing complex PDF files like books, reports, articles, and academic papers. Utilizing cutting-edge Deep Learn...

sciparser

39
Stars
2
Forks
Watchers

PDF parsing toolkit for preparing academic text corpora

MinerU

22.4k
Stars
1.6k
Forks
119
Watchers

A high-quality, one-stop open-source data extraction tool that converts PDF to Markdown and JSON.

extractous

47
Stars
2
Forks
Watchers

Fast and efficient unstructured data extraction. Written in Rust with bindings for many languages.

llmdocparser

206
Stars
5
Forks
Watchers

A package for parsing PDFs and analyzing their content using LLMs.