pdf-parser topic
PaddleOCR
Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.
pdfalyzer
Analyze PDFs. With colors. And Yara.
nextjs-pdf-parser
Next.js template for seamless PDF parsing using pdf2json and FilePond. Ideal for developers seeking a ready-to-use solution for PDF content extraction in Next.js projects.
Scanipy
Scanipy stands for "scan it with Python"—it's your smart Python library for scanning and parsing complex PDF files like books, reports, articles, and academic papers. Utilizing cutting-edge Deep Learn...
sciparser
PDF parsing toolkit for preparing academic text corpora
MinerU
A high-quality, one-stop, open-source data extraction tool for converting PDF to Markdown and JSON.
extractous
Fast and efficient unstructured data extraction. Written in Rust with bindings for many languages.
llmdocparser
A package for parsing PDFs and analyzing their content using LLMs.