document-processing topic

List document-processing repositories

parsee-core

Stars: 24 | Forks: 0

Retrieval of fully structured data made easy. Use LLMs or custom models. Specialized in PDFs and HTML files. Extensive support for tabular data extraction and multimodal queries.

rhubarb

Stars: 43 | Forks: 4

A Python framework for multi-modal document understanding with Amazon Bedrock

sieves

Stars: 116 | Forks: 8 | Watchers: 116

Plug-and-play, zero-shot document AI pipelines.

graph_builder

Stars: 78 | Forks: 12 | Watchers: 78

Open-source toolkit to extract structured knowledge graphs from documents and tables — power analytics, digital twins, and AI-driven assistants.

qdrant-loader

Stars: 20 | Forks: 16 | Watchers: 20

Enterprise-ready vector database toolkit for building searchable knowledge bases from multiple data sources. Supports multi-project management, automatic ingestion from Confluence/JIRA/Git, intelligen...

nutrient-document-engine-mcp-server

Stars: 56 | Forks: 2 | Watchers: 56

A Model Context Protocol (MCP) server implementation that exposes document processing capabilities through natural language, supporting both direct human interaction and AI agent tool calling.

SmartRAG

Stars: 15 | Forks: 4 | Watchers: 15

⚡ Production-ready .NET Standard 2.1 RAG library with 🤖 multi-AI provider support, 🏢 enterprise vector storage, 📄 intelligent document processing, and 🗄️ multi-database query coordination. 🌍 Cros...

pdf-to-markdown

Stars: 105 | Forks: 10 | Watchers: 105

Conversion of PDF documents to structured Markdown, optimized for Retrieval Augmented Generation (RAG) and other NLP tasks. Extract text, tables, and images with preserved formatting for enhanced info...

This repository contains examples for customers to get started using Amazon Bedrock Data Automation. The samples focus mainly on document processing use cases.

pdf-reader-mcp

Stars: 408 | Forks: 53 | Watchers: 408

📄 Production-ready MCP server for PDF processing - 5-10x faster with parallel processing and 94%+ test coverage