document-processing topic
parsee-core
Retrieval of fully structured data made easy. Use LLMs or custom models. Specialized on PDFs and HTML files. Extensive support of tabular data extraction and multimodal queries.
rhubarb
A Python framework for multi-modal document understanding with Amazon Bedrock
sieves
Plug-and-play, zero-shot document AI pipelines.
graph_builder
Open-source toolkit to extract structured knowledge graphs from documents and tables — power analytics, digital twins, and AI-driven assistants.
qdrant-loader
Enterprise-ready vector database toolkit for building searchable knowledge bases from multiple data sources. Supports multi-project management, automatic ingestion from Confluence/JIRA/Git, intelligen...
nutrient-document-engine-mcp-server
A Model Context Protocol (MCP) server implementation exposes document processing capabilities through natural language, supporting both direct human interaction and AI agent tool calling.
SmartRAG
⚡ Production-ready .NET Standard 2.1 RAG library with 🤖 multi-AI provider support, 🏢 enterprise vector storage, 📄 intelligent document processing, and 🗄️ multi-database query coordination. 🌍 Cros...
pdf-to-markdown
Conversion of PDF documents to structured Markdown, optimized for Retrieval Augmented Generation (RAG) and other NLP tasks. Extract text, tables, and images with preserved formatting for enhanced info...
sample-document-processing-with-amazon-bedrock-data-automation
This repository contains examples for customers to get started using Amazon Bedrock Data Automation. The samples focus mainly on document processing use cases
pdf-reader-mcp
📄 Production-ready MCP server for PDF processing - 5-10x faster with parallel processing and 94%+ test coverage