Refactor: pip install --upgrade DocAgent
Why are these changes needed?
Identified Enterprise-readiness issues:
- Runtime Performance: Large document ingestion happens synchronously during user interactions, creating poor UX.
- Resource Waste: New vector storage processes are created for every document, even if already processed.
- Limited Storage Support: Only local file paths are supported, missing cloud storage capabilities.
- Single RAG Backend: Limited to ChromaDB without enterprise alternatives like Weaviate or graph-based approaches.
The base refactor solves the first two problems listed above, runtime performance and resource waste, by decoupling data ingestion from the parent architecture.
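To illustrate the resource-waste point, here is a minimal sketch of how an ingestion step could skip documents it has already processed. `IngestionRegistry` and its methods are hypothetical names used only for illustration; they are not part of this PR or of the DocAgent API, and the sketch assumes that identical bytes mean an identical document.

```python
import hashlib


class IngestionRegistry:
    """Hypothetical helper: remembers content hashes of ingested documents."""

    def __init__(self) -> None:
        self._seen: dict[str, str] = {}  # sha256 digest -> first path ingested

    @staticmethod
    def fingerprint(data: bytes) -> str:
        # Hash the content, not the path, so renamed copies are still detected.
        return hashlib.sha256(data).hexdigest()

    def should_ingest(self, path: str, data: bytes) -> bool:
        """Return True the first time this content is seen, False afterwards."""
        digest = self.fingerprint(data)
        if digest in self._seen:
            return False
        self._seen[digest] = path
        return True


registry = IngestionRegistry()
first = registry.should_ingest("report.pdf", b"same bytes")
second = registry.should_ingest("copy_of_report.pdf", b"same bytes")
third = registry.should_ingest("other.pdf", b"different bytes")
print(first, second, third)
```

A real service would persist the registry alongside the vector collection so the check survives restarts.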
# Setup
llm_config = LLMConfig(model="o3-mini", api_type="openai", api_key=os.getenv("OPENAI_API_KEY"))
# Initialize components
query_engine = VectorChromaQueryEngine(collection_name="new_collection")
ingestion_service = DocumentIngestionService(query_engine=query_engine)
doc_agent = DocAgent(llm_config=llm_config, query_engine=query_engine)
# Test document
doc_path = "test/agentchat/contrib/graph_rag/Toast_financial_report.pdf"
if Path(doc_path).exists():
    # Step 1: Ingest document
    print("Step 1: Ingesting document...")
    result = ingestion_service.ingest_document(doc_path)
    # print(f"Ingestion: {result}")
    # Step 2: Query document
    print("\nStep 2: Querying document...")
    response = doc_agent.run(message="What is the fiscal year 2024 financial summary? ", max_turns=1)
Example output:
DocAgent (to DocAgent):
What is the fiscal year 2024 financial summary?
--------------------------------------------------------------------------------
_User (to chat_manager):
What is the fiscal year 2024 financial summary?
--------------------------------------------------------------------------------
Next speaker: QueryAgent
>>>>>>>> USING AUTO REPLY...
QueryAgent (to chat_manager):
***** Suggested tool call (call_VrLT1PH5lY4fdVLLKzgwoEZ9): execute_rag_query *****
Arguments:
{}
**********************************************************************************
--------------------------------------------------------------------------------
Next speaker: _Group_Tool_Executor
>>>>>>>> EXECUTING FUNCTION execute_rag_query...
Call ID: call_VrLT1PH5lY4fdVLLKzgwoEZ9
Input arguments: {}
>>>>>>>> EXECUTED FUNCTION execute_rag_query...
Call ID: call_VrLT1PH5lY4fdVLLKzgwoEZ9
Input arguments: {}
Output:
{'content': "The financial summary for Toast, Inc. for the fiscal year 2024, as of September 30, includes total assets of $2,227 million and total liabilities of $807 million. The stockholders' equity stands at $1,420 million. Current assets amount to $1,802 million, with cash and cash equivalents at $761 million. The company has an accumulated deficit of $1,636 million and additional paid-in capital of $3,053 million. Total current liabilities are $748 million."}
_Group_Tool_Executor (to chat_manager):
***** Response from calling tool (call_VrLT1PH5lY4fdVLLKzgwoEZ9) *****
{'content': "The financial summary for Toast, Inc. for the fiscal year 2024, as of September 30, includes total assets of $2,227 million and total liabilities of $807 million. The stockholders' equity stands at $1,420 million. Current assets amount to $1,802 million, with cash and cash equivalents at $761 million. The company has an accumulated deficit of $1,636 million and additional paid-in capital of $3,053 million. Total current liabilities are $748 million."}
**********************************************************************
--------------------------------------------------------------------------------
Next speaker: QueryAgent
>>>>>>>> USING AUTO REPLY...
QueryAgent (to chat_manager):
The financial summary for Toast, Inc. for the fiscal year 2024, as of September 30, is as follows:
- Total Assets: $2,227 million
- Total Liabilities: $807 million
- Stockholders' Equity: $1,420 million
- Current Assets: $1,802 million
- Cash and Cash Equivalents: $761 million
- Accumulated Deficit: $1,636 million
- Additional Paid-in Capital: $3,053 million
- Total Current Liabilities: $748 million.
--------------------------------------------------------------------------------
Next speaker: SummaryAgent
>>>>>>>> USING AUTO REPLY...
SummaryAgent (to chat_manager):
Ingestions:
1. The financial summary for Toast, Inc. for the fiscal year 2024, as of September 30, includes total assets of $2,227 million and total liabilities of $807 million. The stockholders' equity stands at $1,420 million. Current assets amount to $1,802 million, with cash and cash equivalents at $761 million. The company has an accumulated deficit of $1,636 million and additional paid-in capital of $3,053 million. Total current liabilities are $748 million.
Queries:
1. What is the fiscal year 2024 financial summary?
Answer: The financial summary for Toast, Inc. for the fiscal year 2024, as of September 30, includes total assets of $2,227 million, total liabilities of $807 million, stockholders' equity of $1,420 million, current assets of $1,802 million, cash and cash equivalents of $761 million, an accumulated deficit of $1,636 million, additional paid-in capital of $3,053 million, and total current liabilities of $748 million.
--------------------------------------------------------------------------------
>>>>>>>> TERMINATING RUN (4f10222b-717c-4c1c-bccf-c83aa3666058): No next speaker selected
DocAgent (to DocAgent):
Ingestions:
1. The financial summary for Toast, Inc. for the fiscal year 2024, as of September 30, includes total assets of $2,227 million and total liabilities of $807 million. The stockholders' equity stands at $1,420 million. Current assets amount to $1,802 million, with cash and cash equivalents at $761 million. The company has an accumulated deficit of $1,636 million and additional paid-in capital of $3,053 million. Total current liabilities are $748 million.
Queries:
1. What is the fiscal year 2024 financial summary?
Answer: The financial summary for Toast, Inc. for the fiscal year 2024, as of September 30, includes total assets of $2,227 million, total liabilities of $807 million, stockholders' equity of $1,420 million, current assets of $1,802 million, cash and cash equivalents of $761 million, an accumulated deficit of $1,636 million, additional paid-in capital of $3,053 million, and total current liabilities of $748 million.
--------------------------------------------------------------------------------
>>>>>>>> TERMINATING RUN (d35f8d2a-e639-4d99-bfd5-0ffb1c3bb7f1): Maximum turns (1) reached
Answer: Ingestions:
1. The financial summary for Toast, Inc. for the fiscal year 2024, as of September 30, includes total assets of $2,227 million and total liabilities of $807 million. The stockholders' equity stands at $1,420 million. Current assets amount to $1,802 million, with cash and cash equivalents at $761 million. The company has an accumulated deficit of $1,636 million and additional paid-in capital of $3,053 million. Total current liabilities are $748 million.
Queries:
1. What is the fiscal year 2024 financial summary?
Answer: The financial summary for Toast, Inc. for the fiscal year 2024, as of September 30, includes total assets of $2,227 million, total liabilities of $807 million, stockholders' equity of $1,420 million, current assets of $1,802 million, cash and cash equivalents of $761 million, an accumulated deficit of $1,636 million, additional paid-in capital of $3,053 million, and total current liabilities of $748 million.
Related issue number
closes #2078
Checks
- [ ] I've included any doc changes needed for https://docs.ag2.ai/. See https://docs.ag2.ai/latest/docs/contributor-guide/documentation/ to build and test documentation locally.
- [x] I've added tests (if relevant) corresponding to the changes introduced in this PR.
- [ ] I've made sure all auto checks have passed.
Documentation Analysis
All docs are up to date!
Latest commit analyzed: 34358b4edf8076521045dc024b5055bd0f899999 | Powered by Joggr
DocAgent Refactor
Current state of DocAgent
The existing DocAgent follows a swarm architecture with multiple specialized agents (Triage, Task Manager, Parser, Data Ingestion, Query, Error, and Summary agents). While this design provides clear separation of concerns, it introduces several production-readiness issues:
- Runtime Performance: Large document ingestion happens synchronously during user interactions, creating poor UX.
- Resource Waste: New vector storage processes are created for every document, even if already processed.
- Limited Storage Support: Only local file paths are supported, missing cloud storage capabilities.
- Single RAG Backend: Limited to ChromaDB without enterprise alternatives like Weaviate or graph-based approaches.
The new design will feature four layers:
- Query Layer: Handles user interactions and RAG queries
- Ingestion Layer: Processes documents asynchronously via events
- Storage Layer: Abstracts storage backends (local, cloud, blob)
- RAG Layer: Supports multiple RAG strategies (vector, structured, graph)
### How do we solve this problem?
- Event-Driven Ingestion
Instead of processing documents during runtime, the new architecture will use an event-driven approach, where documents will be ingested based on triggered events like button clicks, file uploads, etc.
# Before: Synchronous processing during query
user_query = "What's in this PDF?"
# Agent processes PDF → chunks → vectorizes → stores → queries (slow!)
# After: Event-driven ingestion
ingestion_service.ingest_document("large_report.pdf") # Async event
# Later...
user_query = "What's in this PDF?"
# Agent queries pre-processed data (fast!)
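The event-driven flow above can be sketched with the standard library alone. This is only an illustration under stated assumptions: the queue, the `ingest` placeholder, and the worker thread are hypothetical and do not reflect how `DocumentIngestionService` is implemented in this PR.

```python
import queue
import threading

ingest_events: queue.Queue = queue.Queue()  # holds file paths, or None to stop
processed: list[str] = []


def ingest(path: str) -> None:
    # Placeholder for the real pipeline: parse -> chunk -> embed -> store.
    processed.append(path)


def worker() -> None:
    # Consume ingestion events until the None sentinel arrives.
    while True:
        path = ingest_events.get()
        try:
            if path is None:
                return
            ingest(path)
        finally:
            ingest_events.task_done()


thread = threading.Thread(target=worker, daemon=True)
thread.start()

# An upload handler would only enqueue an event and return immediately.
ingest_events.put("large_report.pdf")
ingest_events.put("handbook.pdf")

ingest_events.join()     # demo only: wait until both events are handled
ingest_events.put(None)  # ask the worker to stop
thread.join()
print(processed)
```

The caller never waits on parsing or embedding; by the time a query arrives, the worker has normally finished the backlog.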
- Decoupled Storage
The storage layer will be separated from the query logic, which will allow users to configure cloud storage without changing the core agent logic.
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any

@dataclass
class StorageConfig:
    storage_type: str = "local"  # "local", "s3", "azure", "gcs", "minio"
    base_path: Path = field(default_factory=lambda: Path("./storage"))
    bucket_name: str | None = None
    credentials: dict[str, Any] | None = None
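One possible shape for the storage abstraction is a small protocol plus a factory keyed on `storage_type`. The names `StorageBackend`, `LocalStorage`, and `make_storage` are assumptions made for this sketch, and only the local backend is implemented; cloud backends would plug in behind the same two methods.

```python
import tempfile
from pathlib import Path
from typing import Protocol


class StorageBackend(Protocol):
    """Hypothetical interface every storage backend would satisfy."""

    def save(self, name: str, data: bytes) -> str: ...

    def load(self, name: str) -> bytes: ...


class LocalStorage:
    def __init__(self, base_path: Path) -> None:
        self.base_path = base_path
        self.base_path.mkdir(parents=True, exist_ok=True)

    def save(self, name: str, data: bytes) -> str:
        target = self.base_path / name
        target.write_bytes(data)
        return str(target)

    def load(self, name: str) -> bytes:
        return (self.base_path / name).read_bytes()


def make_storage(storage_type: str, base_path: Path) -> StorageBackend:
    # Agent code depends only on StorageBackend, never on a concrete class.
    if storage_type == "local":
        return LocalStorage(base_path)
    raise NotImplementedError(f"storage_type {storage_type!r} is not implemented in this sketch")


with tempfile.TemporaryDirectory() as tmp:
    storage = make_storage("local", Path(tmp) / "storage")
    storage.save("report.txt", b"hello")
    roundtrip = storage.load("report.txt")
print(roundtrip)
```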
- Multiple RAG Backends
The new architecture supports three RAG strategies through a unified interface and can be configured for any backend.
from dataclasses import dataclass

@dataclass
class RAGConfig:
    rag_type: str = "vector"  # "vector", "structured", "graph"
    backend: str = "chromadb"  # "chromadb", "weaviate", "neo4j", "inmemory"
    collection_name: str | None = None
    embedding_model: str = "all-MiniLM-L6-v2"
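A unified interface could look roughly like the sketch below: one protocol, with a registry that maps `(rag_type, backend)` pairs to engine factories. `RAGEngine`, `KeywordEngine`, `ENGINES`, and `create_engine` are hypothetical names. `KeywordEngine` is a toy word-overlap stand-in so the example runs without dependencies; it is not a real vector backend.

```python
from typing import Callable, Protocol


class RAGEngine(Protocol):
    """Hypothetical common interface for all RAG strategies."""

    def add_docs(self, docs: list[str]) -> None: ...

    def query(self, question: str) -> str: ...


class KeywordEngine:
    """Toy stand-in: returns the document sharing the most words with the question."""

    def __init__(self) -> None:
        self.docs: list[str] = []

    def add_docs(self, docs: list[str]) -> None:
        self.docs.extend(docs)

    def query(self, question: str) -> str:
        words = set(question.lower().split())
        return max(self.docs, key=lambda d: len(words & set(d.lower().split())), default="")


# Registry: new backends are added here without touching agent code.
ENGINES: dict[tuple[str, str], Callable[[], RAGEngine]] = {
    ("vector", "inmemory"): KeywordEngine,
}


def create_engine(rag_type: str, backend: str) -> RAGEngine:
    try:
        return ENGINES[(rag_type, backend)]()
    except KeyError:
        raise ValueError(f"unsupported combination: {rag_type}/{backend}") from None


engine = create_engine("vector", "inmemory")
engine.add_docs(["cash and equivalents were 761 million", "the office is in Boston"])
answer = engine.query("how much cash")
print(answer)
```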
- Configuration & Interfaces
Unified configuration: the DocAgentConfig consolidates all settings in one place:
config = DocAgentConfig(
    rag=RAGConfig(
        rag_type="vector",
        backend="weaviate",
        embedding_model="all-MiniLM-L6-v2",
    ),
    storage=StorageConfig(
        storage_type="s3",
        bucket_name="my-docs-bucket",
    ),
    processing=ProcessingConfig(
        chunk_size=1024,
        max_file_size=500 * 1024 * 1024,  # 500MB
    ),
)
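As a rough illustration of how the processing settings might be applied, the sketch below enforces `max_file_size` and splits text into fixed-size chunks. The `ProcessingConfig` fields mirror the ones used above, but `chunk_text` is hypothetical, and treating `chunk_size` as a character count is an assumption (the real unit could be tokens).

```python
from dataclasses import dataclass


@dataclass
class ProcessingConfig:
    chunk_size: int = 1024                  # assumed here to mean characters per chunk
    max_file_size: int = 500 * 1024 * 1024  # bytes


def chunk_text(text: str, size_bytes: int, cfg: ProcessingConfig) -> list[str]:
    """Reject oversized inputs, then split text into chunk_size pieces."""
    if size_bytes > cfg.max_file_size:
        raise ValueError(f"file is {size_bytes} bytes, limit is {cfg.max_file_size}")
    return [text[i : i + cfg.chunk_size] for i in range(0, len(text), cfg.chunk_size)]


cfg = ProcessingConfig(chunk_size=4, max_file_size=16)
chunks = chunk_text("abcdefghij", size_bytes=10, cfg=cfg)
print(chunks)  # ['abcd', 'efgh', 'ij']
```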
Example usage:
from autogen.agents.experimental.document_agent import DocAgent2, DocumentIngestionService
from autogen.agents.experimental.document_agent.core import DocAgentConfig
# Configure for production use
config = DocAgentConfig(
    rag=RAGConfig(backend="weaviate", rag_type="vector"),
    storage=StorageConfig(storage_type="s3", bucket_name="company-docs"),
)
# Initialize query engine (supports multiple backends)
query_engine = WeaviateQueryEngine(config.rag)
# Create ingestion service (handles document processing)
ingestion_service = DocumentIngestionService(query_engine, config)
# Process documents asynchronously (event-driven)
ingestion_service.ingest_document("large_manual.pdf") # Non-blocking
# Create query agent (fast, no document processing)
doc_agent = DocAgent2(
    query_engine=query_engine,
    config=config,
)
# Query pre-processed documents
response = doc_agent.query("What are the safety procedures?")
todos:
- [x] initial refactoring plan:
  - Extract base interfaces from existing query engines
  - Move document processing to separate ingestion module
  - Simplify DocAgent to be query-only
  - Create separate ingestion service using existing code
Rough file structure:
document_agent/
├── core/
│   ├── __init__.py
│   ├── base_interfaces.py      # Extract interfaces from existing code
│   └── config.py               # Configuration from existing code
├── ingestion/
│   ├── __init__.py
│   ├── document_processor.py   # Move from parser_utils.py + docling_doc_ingest_agent.py
│   └── chunking_strategies.py  # Extract from existing parsing logic
├── storage/
│   ├── __init__.py
│   └── local_storage.py        # Move from document_utils.py
├── rag/
│   ├── __init__.py
│   ├── base_rag.py             # Extract from chroma_query_engine.py + inmemory_query_engine.py
│   └── vector_rag.py           # Move chroma_query_engine.py
└── agents/
    ├── __init__.py
    ├── doc_agent.py            # Simplified version of document_agent.py
    └── ingestion_agent.py      # Move from docling_doc_ingest_agent.py
- [ ] step 2: Add a database storage layer: add blob storage support (S3, Azure, GCS), implement MinIO/DynamoDB bucket support, and create a storage abstraction layer.
- [ ] step 4: Add structured RAG support: add postgresDBqueryengine support, implement structured query capabilities, and create a structured RAG strategy.
- [ ] step 5: Add a graph RAG backend with event-based knowledge graph creation, and support Cypher queries for data retrieval.
- [ ] step 6: Add a unit test module for the new DocAgent.
The refactored DocAgent moves from a research prototype to a production/enterprise-ready AG2 feature with the following benefits:
- Performance: Queries no longer wait on ingestion, since documents are pre-processed.
- Scalability: Cloud storage support handles enterprise document volumes.
- Flexibility: Multiple RAG backends for different use cases.
- Maintainability: Clear separation of concerns and unified configuration.
- Production Ready: Event-driven architecture supports real-world orchestrations.
@marklysze can you help review? Thank you!
Codecov Report
:x: Patch coverage is 71.70418% with 88 lines in your changes missing coverage. Please review.
| Files with missing lines | Coverage Δ | |
|---|---|---|
| ...imental/document_agent/agents/ingestion_service.py | 100.00% <100.00%> (ø) | |
| .../agents/experimental/document_agent/core/config.py | 100.00% <100.00%> (ø) | |
| ...xperimental/document_agent/core/base_interfaces.py | 82.60% <82.60%> (ø) | |
| ...ts/experimental/document_agent/agents/doc_agent.py | 74.48% <74.48%> (ø) | |
| ...tal/document_agent/ingestion/document_processor.py | 28.57% <28.57%> (ø) | |
... and 41 files with indirect coverage changes