
Refactor: pip install --upgrade DocAgent

Open priyansh4320 opened this issue 3 months ago • 4 comments

Why are these changes needed?

Identified Enterprise-readiness issues:

  • Runtime Performance: Large document ingestion happens synchronously during user interactions, creating poor UX.
  • Resource Waste: New vector storage processes are created for every document, even if already processed.
  • Limited Storage Support: Only local file paths are supported, missing cloud storage capabilities.
  • Single RAG Backend: Limited to ChromaDB without enterprise alternatives like Weaviate or graph-based approaches.
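
The resource-waste point can be illustrated with a small sketch. It assumes a hypothetical `IngestionRegistry` (not part of AG2 or of this PR) that remembers a SHA-256 digest for each ingested file, so a document whose bytes were already processed is skipped rather than re-embedded.

```python
import hashlib
from pathlib import Path


class IngestionRegistry:
    """Hypothetical helper: tracks content digests of already-ingested files."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    @staticmethod
    def digest(path: Path) -> str:
        # Hash the file contents so renamed copies are still detected.
        return hashlib.sha256(path.read_bytes()).hexdigest()

    def should_ingest(self, path: Path) -> bool:
        """Return True only the first time a given content digest is seen."""
        d = self.digest(path)
        if d in self._seen:
            return False
        self._seen.add(d)
        return True
```

An ingestion service could call `should_ingest` before chunking and embedding. A real deployment would likely persist the digests next to the vector collection instead of keeping them in memory.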

The base refactor solves the first two problems listed above, runtime performance and resource waste, by decoupling data ingestion from the parent architecture.

    import os
    from pathlib import Path

    # LLMConfig, VectorChromaQueryEngine, DocumentIngestionService, and DocAgent
    # come from autogen (their import lines are omitted in this snippet).

    # Setup
    llm_config = LLMConfig(model="o3-mini", api_type="openai", api_key=os.getenv("OPENAI_API_KEY"))

    # Initialize components
    query_engine = VectorChromaQueryEngine(collection_name="new_collection")
    ingestion_service = DocumentIngestionService(query_engine=query_engine)
    doc_agent = DocAgent(llm_config=llm_config, query_engine=query_engine)

    # Test document
    doc_path = "test/agentchat/contrib/graph_rag/Toast_financial_report.pdf"

    if Path(doc_path).exists():
        # Step 1: Ingest document (decoupled from the query path)
        print("Step 1: Ingesting document...")
        result = ingestion_service.ingest_document(doc_path)
        # print(f"Ingestion: {result}")

        # Step 2: Query the already-ingested document
        print("\nStep 2: Querying document...")
        response = doc_agent.run(message="What is the fiscal year 2024 financial summary? ", max_turns=1)

example output:

DocAgent (to DocAgent):

What is the fiscal year 2024 financial summary?

--------------------------------------------------------------------------------
_User (to chat_manager):

What is the fiscal year 2024 financial summary?

--------------------------------------------------------------------------------

Next speaker: QueryAgent


>>>>>>>> USING AUTO REPLY...
QueryAgent (to chat_manager):

***** Suggested tool call (call_VrLT1PH5lY4fdVLLKzgwoEZ9): execute_rag_query *****
Arguments: 
{}
**********************************************************************************

--------------------------------------------------------------------------------

Next speaker: _Group_Tool_Executor


>>>>>>>> EXECUTING FUNCTION execute_rag_query...
Call ID: call_VrLT1PH5lY4fdVLLKzgwoEZ9
Input arguments: {}

>>>>>>>> EXECUTED FUNCTION execute_rag_query...
Call ID: call_VrLT1PH5lY4fdVLLKzgwoEZ9
Input arguments: {}
Output:
{'content': "The financial summary for Toast, Inc. for the fiscal year 2024, as of September 30, includes total assets of $2,227 million and total liabilities of $807 million. The stockholders' equity stands at $1,420 million. Current assets amount to $1,802 million, with cash and cash equivalents at $761 million. The company has an accumulated deficit of $1,636 million and additional paid-in capital of $3,053 million. Total current liabilities are $748 million."}
_Group_Tool_Executor (to chat_manager):

***** Response from calling tool (call_VrLT1PH5lY4fdVLLKzgwoEZ9) *****
{'content': "The financial summary for Toast, Inc. for the fiscal year 2024, as of September 30, includes total assets of $2,227 million and total liabilities of $807 million. The stockholders' equity stands at $1,420 million. Current assets amount to $1,802 million, with cash and cash equivalents at $761 million. The company has an accumulated deficit of $1,636 million and additional paid-in capital of $3,053 million. Total current liabilities are $748 million."}
**********************************************************************

--------------------------------------------------------------------------------

Next speaker: QueryAgent


>>>>>>>> USING AUTO REPLY...
QueryAgent (to chat_manager):

The financial summary for Toast, Inc. for the fiscal year 2024, as of September 30, is as follows:

- Total Assets: $2,227 million
- Total Liabilities: $807 million
- Stockholders' Equity: $1,420 million
- Current Assets: $1,802 million
- Cash and Cash Equivalents: $761 million
- Accumulated Deficit: $1,636 million
- Additional Paid-in Capital: $3,053 million
- Total Current Liabilities: $748 million.

--------------------------------------------------------------------------------

Next speaker: SummaryAgent


>>>>>>>> USING AUTO REPLY...
SummaryAgent (to chat_manager):

Ingestions:
1. The financial summary for Toast, Inc. for the fiscal year 2024, as of September 30, includes total assets of $2,227 million and total liabilities of $807 million. The stockholders' equity stands at $1,420 million. Current assets amount to $1,802 million, with cash and cash equivalents at $761 million. The company has an accumulated deficit of $1,636 million and additional paid-in capital of $3,053 million. Total current liabilities are $748 million.

Queries:
1. What is the fiscal year 2024 financial summary?
Answer: The financial summary for Toast, Inc. for the fiscal year 2024, as of September 30, includes total assets of $2,227 million, total liabilities of $807 million, stockholders' equity of $1,420 million, current assets of $1,802 million, cash and cash equivalents of $761 million, an accumulated deficit of $1,636 million, additional paid-in capital of $3,053 million, and total current liabilities of $748 million.

--------------------------------------------------------------------------------

>>>>>>>> TERMINATING RUN (4f10222b-717c-4c1c-bccf-c83aa3666058): No next speaker selected
DocAgent (to DocAgent):

Ingestions:
1. The financial summary for Toast, Inc. for the fiscal year 2024, as of September 30, includes total assets of $2,227 million and total liabilities of $807 million. The stockholders' equity stands at $1,420 million. Current assets amount to $1,802 million, with cash and cash equivalents at $761 million. The company has an accumulated deficit of $1,636 million and additional paid-in capital of $3,053 million. Total current liabilities are $748 million.

Queries:
1. What is the fiscal year 2024 financial summary?
Answer: The financial summary for Toast, Inc. for the fiscal year 2024, as of September 30, includes total assets of $2,227 million, total liabilities of $807 million, stockholders' equity of $1,420 million, current assets of $1,802 million, cash and cash equivalents of $761 million, an accumulated deficit of $1,636 million, additional paid-in capital of $3,053 million, and total current liabilities of $748 million.

--------------------------------------------------------------------------------

>>>>>>>> TERMINATING RUN (d35f8d2a-e639-4d99-bfd5-0ffb1c3bb7f1): Maximum turns (1) reached
Answer: Ingestions:
1. The financial summary for Toast, Inc. for the fiscal year 2024, as of September 30, includes total assets of $2,227 million and total liabilities of $807 million. The stockholders' equity stands at $1,420 million. Current assets amount to $1,802 million, with cash and cash equivalents at $761 million. The company has an accumulated deficit of $1,636 million and additional paid-in capital of $3,053 million. Total current liabilities are $748 million.

Queries:
1. What is the fiscal year 2024 financial summary?
Answer: The financial summary for Toast, Inc. for the fiscal year 2024, as of September 30, includes total assets of $2,227 million, total liabilities of $807 million, stockholders' equity of $1,420 million, current assets of $1,802 million, cash and cash equivalents of $761 million, an accumulated deficit of $1,636 million, additional paid-in capital of $3,053 million, and total current liabilities of $748 million.

Related issue number

closes #2078

Checks

  • [ ] I've included any doc changes needed for https://docs.ag2.ai/. See https://docs.ag2.ai/latest/docs/contributor-guide/documentation/ to build and test documentation locally.
  • [x] I've added tests (if relevant) corresponding to the changes introduced in this PR.
  • [ ] I've made sure all auto checks have passed.

priyansh4320 avatar Sep 02 '25 16:09 priyansh4320

πŸ“ Documentation Analysis

All docs are up to date! πŸŽ‰


βœ… Latest commit analyzed: 34358b4edf8076521045dc024b5055bd0f899999 | Powered by Joggr

joggrbot[bot] avatar Sep 02 '25 16:09 joggrbot[bot]

DocAgent Refactor

Current state of DocAgent

The existing DocAgent follows a swarm architecture with multiple specialized agents (Triage, Task Manager, Parser, Data Ingestion, Query, Error, and Summary agents). While this design provides clear separation of concerns, it introduces several production-readiness issues:

  • Runtime Performance: Large document ingestion happens synchronously during user interactions, creating poor UX.
  • Resource Waste: New vector storage processes are created for every document, even if already processed.
  • Limited Storage Support: Only local file paths are supported, missing cloud storage capabilities.
  • Single RAG Backend: Limited to ChromaDB without enterprise alternatives like Weaviate or graph-based approaches.

The new design will feature four layers:

  1. Query Layer: Handles user interactions and RAG queries
  2. Ingestion Layer: Processes documents asynchronously via events
  3. Storage Layer: Abstracts storage backends (local, cloud, blob)
  4. RAG Layer: Supports multiple RAG strategies (vector, structured, graph)
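
As a rough illustration of how the four layers could fit together, here is a minimal sketch. All class names (`StorageLayer`, `RAGLayer`, `InMemoryStorage`, `KeywordRAG`, `IngestionLayer`, `QueryLayer`) are hypothetical and only show the direction of the dependencies; the keyword matcher is a toy stand-in for a real vector store, not the PR's implementation.

```python
from typing import Protocol


class StorageLayer(Protocol):
    def read(self, key: str) -> bytes: ...


class RAGLayer(Protocol):
    def add(self, doc_id: str, text: str) -> None: ...
    def search(self, question: str) -> list[str]: ...


class InMemoryStorage:
    """Toy storage backend holding raw documents in a dict."""

    def __init__(self, blobs: dict[str, bytes]) -> None:
        self._blobs = blobs

    def read(self, key: str) -> bytes:
        return self._blobs[key]


class KeywordRAG:
    """Toy RAG backend: a document matches if it shares a word with the question."""

    def __init__(self) -> None:
        self._docs: dict[str, str] = {}

    def add(self, doc_id: str, text: str) -> None:
        self._docs[doc_id] = text

    def search(self, question: str) -> list[str]:
        words = set(question.lower().split())
        return [d for d, t in self._docs.items() if words & set(t.lower().split())]


class IngestionLayer:
    """Reads from storage and writes to the RAG backend; never touches queries."""

    def __init__(self, storage: StorageLayer, rag: RAGLayer) -> None:
        self._storage, self._rag = storage, rag

    def ingest(self, key: str) -> None:
        self._rag.add(key, self._storage.read(key).decode("utf-8"))


class QueryLayer:
    """Only reads from the RAG backend; never processes documents."""

    def __init__(self, rag: RAGLayer) -> None:
        self._rag = rag

    def ask(self, question: str) -> list[str]:
        return self._rag.search(question)


storage = InMemoryStorage({"report.txt": b"total assets and liabilities", "memo.txt": b"lunch menu"})
rag = KeywordRAG()
ingestion = IngestionLayer(storage, rag)
ingestion.ingest("report.txt")
ingestion.ingest("memo.txt")
hits = QueryLayer(rag).ask("what are the total assets")
```

The point of the sketch is that the query layer depends only on the RAG interface, so ingestion and storage can change without touching it.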

### How do we solve this problem?
  1. Event-Driven Ingestion

Instead of processing documents at query time, the new architecture will use an event-driven approach: documents are ingested in response to triggering events such as button clicks or file uploads.

    # Before: Synchronous processing during query
    user_query = "What's in this PDF?"
    # Agent processes PDF → chunks → vectorizes → stores → queries (slow!)

    # After: Event-driven ingestion
    ingestion_service.ingest_document("large_report.pdf")  # Async event
    # Later...
    user_query = "What's in this PDF?"
    # Agent queries pre-processed data (fast!)
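
One way the "async event" idea could be realized is a queue drained by a background worker. The sketch below is an assumption about the mechanism, not the PR's `DocumentIngestionService`; `BackgroundIngestor` and its methods are hypothetical names, and only the standard library is used.

```python
import queue
import threading


class BackgroundIngestor:
    """Hypothetical sketch: accepts ingestion events and processes them off the caller's thread."""

    def __init__(self, process) -> None:
        self._process = process  # callable(path) that does the slow work
        self._events: queue.Queue = queue.Queue()
        self.errors: list = []   # (path, exception) pairs from failed events
        threading.Thread(target=self._run, daemon=True).start()

    def submit(self, path: str) -> None:
        """Enqueue an ingestion event and return immediately."""
        self._events.put(path)

    def wait(self) -> None:
        """Block until every submitted event has been handled."""
        self._events.join()

    def _run(self) -> None:
        while True:
            path = self._events.get()
            try:
                self._process(path)
            except Exception as exc:  # keep the worker alive on bad documents
                self.errors.append((path, exc))
            finally:
                self._events.task_done()


ingested: list = []
ingestor = BackgroundIngestor(ingested.append)  # stand-in for parse/chunk/embed
ingestor.submit("large_report.pdf")             # returns without waiting
ingestor.wait()                                 # only needed where completion matters
```

A query path would never call `wait()`; it would simply read whatever has been indexed so far.
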
  2. Decoupled Storage

The storage layer will be separated from the query logic, so users can configure cloud storage without changing the core agent logic.

    from dataclasses import dataclass, field
    from pathlib import Path
    from typing import Any

    @dataclass
    class StorageConfig:
        storage_type: str = "local"  # "local", "s3", "azure", "gcs", "minio"
        base_path: Path = field(default_factory=lambda: Path("./storage"))
        bucket_name: str | None = None
        credentials: dict[str, Any] | None = None
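
To show how such a config could select a backend without the agent knowing about it, here is a minimal sketch. `StorageBackend`, `LocalStorage`, and `make_storage` are hypothetical names; only the local case is implemented, and the cloud types deliberately raise rather than guess at an SDK.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Protocol


@dataclass
class StorageConfig:  # mirrors the fields proposed in this comment
    storage_type: str = "local"
    base_path: Path = field(default_factory=lambda: Path("./storage"))
    bucket_name: str | None = None
    credentials: dict[str, Any] | None = None


class StorageBackend(Protocol):
    def save(self, name: str, data: bytes) -> str: ...
    def load(self, name: str) -> bytes: ...


class LocalStorage:
    """Hypothetical local backend that stores files under config.base_path."""

    def __init__(self, config: StorageConfig) -> None:
        self._root = config.base_path
        self._root.mkdir(parents=True, exist_ok=True)

    def save(self, name: str, data: bytes) -> str:
        target = self._root / name
        target.write_bytes(data)
        return str(target)

    def load(self, name: str) -> bytes:
        return (self._root / name).read_bytes()


def make_storage(config: StorageConfig) -> StorageBackend:
    """Pick a backend from the config; cloud backends are out of scope here."""
    if config.storage_type == "local":
        return LocalStorage(config)
    raise NotImplementedError(f"storage_type {config.storage_type!r} is not sketched here")
```

Callers hold only a `StorageBackend`, so adding S3 or Azure later would mean one more branch in the factory rather than changes to the agent.
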
  3. Multiple RAG Backends

The new architecture supports three RAG strategies through a unified interface and can be configured for any backend.

    from dataclasses import dataclass

    @dataclass
    class RAGConfig:
        rag_type: str = "vector"  # "vector", "structured", "graph"
        backend: str = "chromadb"  # "chromadb", "weaviate", "neo4j", "inmemory"
        collection_name: str | None = None
        embedding_model: str = "all-MiniLM-L6-v2"
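
Because `rag_type` and `backend` are plain strings, a typo would only surface later at query time. One possible guard, sketched under the assumption that the values listed in the comments are the complete allowed sets, is to validate in `__post_init__`. The constants `RAG_TYPES` and `BACKENDS` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass

# Allowed values copied from the inline comments in the proposal (assumed complete).
RAG_TYPES = {"vector", "structured", "graph"}
BACKENDS = {"chromadb", "weaviate", "neo4j", "inmemory"}


@dataclass
class RAGConfig:
    rag_type: str = "vector"
    backend: str = "chromadb"
    collection_name: str | None = None
    embedding_model: str = "all-MiniLM-L6-v2"

    def __post_init__(self) -> None:
        # Fail at construction time instead of at the first query.
        if self.rag_type not in RAG_TYPES:
            raise ValueError(f"unknown rag_type {self.rag_type!r}; expected one of {sorted(RAG_TYPES)}")
        if self.backend not in BACKENDS:
            raise ValueError(f"unknown backend {self.backend!r}; expected one of {sorted(BACKENDS)}")
```

Which backends are valid for which `rag_type` is not stated in the proposal, so the sketch checks each field independently.
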
  4. Configuration & Interfaces

Unified Configuration: the DocAgentConfig consolidates all settings in one place:

    config = DocAgentConfig(
        rag=RAGConfig(
            rag_type="vector",
            backend="weaviate",
            embedding_model="all-MiniLM-L6-v2",
        ),
        storage=StorageConfig(
            storage_type="s3",
            bucket_name="my-docs-bucket",
        ),
        processing=ProcessingConfig(
            chunk_size=1024,
            max_file_size=500 * 1024 * 1024,  # 500MB
        ),
    )
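
`ProcessingConfig` is used above but not defined in this comment. The sketch below shows what its two fields might control, assuming `chunk_size` counts characters and `max_file_size` counts bytes. Both interpretations are assumptions, and `check_size` and `chunk_text` are hypothetical helpers.

```python
from dataclasses import dataclass


@dataclass
class ProcessingConfig:
    # Field names come from the example above; their semantics are assumed.
    chunk_size: int = 1024                   # assumed: characters per chunk
    max_file_size: int = 500 * 1024 * 1024   # assumed: bytes (500 MB)


def check_size(num_bytes: int, config: ProcessingConfig) -> None:
    """Reject files larger than the configured limit before any parsing starts."""
    if num_bytes > config.max_file_size:
        raise ValueError(f"file is {num_bytes} bytes; limit is {config.max_file_size}")


def chunk_text(text: str, config: ProcessingConfig) -> list[str]:
    """Split text into consecutive, non-overlapping chunks of at most chunk_size characters."""
    step = config.chunk_size
    return [text[i:i + step] for i in range(0, len(text), step)]
```

With `ProcessingConfig(chunk_size=4)`, `chunk_text("abcdefghij", ...)` yields `["abcd", "efgh", "ij"]`. A production chunker would more likely split on tokens or sentence boundaries.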

Example usage:

    from autogen.agents.experimental.document_agent import DocAgent2, DocumentIngestionService
    from autogen.agents.experimental.document_agent.core import DocAgentConfig

    # RAGConfig, StorageConfig, and WeaviateQueryEngine belong to the proposed design;
    # their import lines are not shown in this snippet.

    # Configure for production use
    config = DocAgentConfig(
        rag=RAGConfig(backend="weaviate", rag_type="vector"),
        storage=StorageConfig(storage_type="s3", bucket_name="company-docs"),
    )

    # Initialize query engine (supports multiple backends)
    query_engine = WeaviateQueryEngine(config.rag)

    # Create ingestion service (handles document processing)
    ingestion_service = DocumentIngestionService(query_engine, config)

    # Process documents asynchronously (event-driven)
    ingestion_service.ingest_document("large_manual.pdf")  # Non-blocking

    # Create query agent (fast, no document processing)
    doc_agent = DocAgent2(
        query_engine=query_engine,
        config=config,
    )

    # Query pre-processed documents
    response = doc_agent.query("What are the safety procedures?")

todos:
  • [x] initial refactoring plan:
  1. Extract base interfaces from existing query engines
  2. Move document processing to separate ingestion module
  3. Simplify DocAgent to be query-only
  4. Create separate ingestion service using existing code

rough FS structure

document_agent/
├── core/
│   ├── __init__.py
│   ├── base_interfaces.py          # Extract interfaces from existing code
│   └── config.py                   # Configuration from existing code
├── ingestion/
│   ├── __init__.py
│   ├── document_processor.py       # Move from parser_utils.py + docling_doc_ingest_agent.py
│   └── chunking_strategies.py      # Extract from existing parsing logic
├── storage/
│   ├── __init__.py
│   └── local_storage.py            # Move from document_utils.py
├── rag/
│   ├── __init__.py
│   ├── base_rag.py                 # Extract from chroma_query_engine.py + inmemory_query_engine.py
│   └── vector_rag.py               # Move chroma_query_engine.py
└── agents/
    ├── __init__.py
    ├── doc_agent.py                # Simplified version of document_agent.py
    └── ingestion_agent.py          # Move from docling_doc_ingest_agent.py
  • [ ] step 2: Add a database storage layer: add blob storage support (S3, Azure, GCS), implement MinIO/DynamoDB bucket support, and create a storage abstraction layer.

  • [ ] step 4: Add structured RAG support: add postgresDBqueryengine support, implement structured query capabilities, and create a structured RAG strategy.

  • [ ] step 5: Add a Graph RAG backend with event-based knowledge graph creation, and support Cypher queries for data retrieval.

  • [ ] step 6: Add a unit test module for the new DocAgent.


The refactored DocAgent moves from a research prototype to a production/enterprise-ready AG2 feature with the following benefits:
  • Performance: Queries no longer wait on document processing, since documents are pre-processed.
  • Scalability: Cloud storage support handles enterprise document volumes.
  • Flexibility: Multiple RAG backends for different use cases.
  • Maintainability: Clear separation of concerns and unified configuration.
  • Production Ready: Event-driven architecture supports real-world orchestrations.

priyansh4320 avatar Sep 02 '25 16:09 priyansh4320

@marklysze can you help review? Thank you!

qingyun-wu avatar Sep 03 '25 04:09 qingyun-wu

Codecov Report

:x: Patch coverage is 71.70418% with 88 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...tal/document_agent/ingestion/document_processor.py | 28.57% | 55 Missing :warning: |
| ...ts/experimental/document_agent/agents/doc_agent.py | 74.48% | 24 Missing and 1 partial :warning: |
| ...xperimental/document_agent/core/base_interfaces.py | 82.60% | 8 Missing :warning: |

| Files with missing lines | Coverage Δ |
|---|---|
| ...imental/document_agent/agents/ingestion_service.py | 100.00% <100.00%> (ø) |
| .../agents/experimental/document_agent/core/config.py | 100.00% <100.00%> (ø) |
| ...xperimental/document_agent/core/base_interfaces.py | 82.60% <82.60%> (ø) |
| ...ts/experimental/document_agent/agents/doc_agent.py | 74.48% <74.48%> (ø) |
| ...tal/document_agent/ingestion/document_processor.py | 28.57% <28.57%> (ø) |

... and 41 files with indirect coverage changes
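
As a quick sanity check on the headline figure, the per-file counts above can be added up. The sketch assumes the one partial line is counted among the 88 uncovered lines, and the dictionary keys are shortened labels rather than full paths.

```python
# Uncovered lines per file, taken from the report (partial line counted as uncovered).
missing_by_file = {
    "document_processor.py": 55,
    "doc_agent.py": 24 + 1,
    "base_interfaces.py": 8,
}
missing = sum(missing_by_file.values())

patch_coverage = 0.7170418  # reported as 71.70418%

# If coverage = covered / changed, then changed = missing / (1 - coverage).
changed_lines = round(missing / (1 - patch_coverage))
covered_lines = changed_lines - missing
recomputed = round(100 * covered_lines / changed_lines, 5)
```

The counts sum to 88, which implies 311 changed lines with 223 covered, and 223 / 311 reproduces the reported 71.70418%.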


codecov[bot] avatar Oct 28 '25 16:10 codecov[bot]