[Feature Request]: Refactor DocAgent to be enterprise ready

Open priyansh4320 opened this issue 3 months ago • 1 comments

Is your feature request related to a problem? Please describe.

The current DocAgent has a few architectural problems that make it unsuitable for use in an enterprise environment.

The current DocAgent is a swarm with these generated agents:

    - Triage Agent: responsible for deciding what type of task to perform from user requests.
    - Task Manager Agent: responsible for managing the tasks.
    - Parser Agent: responsible for parsing the documents.
    - Data Ingestion Agent: responsible for ingesting the documents.
    - Query Agent: responsible for answering the user's questions.
    - Error Agent: responsible for returning errors gracefully.
    - Summary Agent: responsible for generating a summary of the user's questions.

The Workflow of DocAgent is as follows:

Initialization: The DocAgent initializes the swarm agents and sets up the context variables.
Triage User Requests: The Triage Agent categorizes the tasks into ingestions and queries.
Task Management: The Task Manager Agent manages the tasks and ensures they are executed in the correct sequence.
Data Ingestion: The Data Ingestion Agent processes the documents.
Query Execution: The Query Agent answers the user's questions.
Summary Generation: The Summary Agent generates a summary of the completed tasks.
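
The ordering described in the workflow above can be sketched roughly as follows. This is a minimal illustration, not the DocAgent implementation: plain functions stand in for the LLM-backed agents, and `Tasks`, `triage`, and `run` are hypothetical names invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Tasks:
    """Hypothetical container for the output of the triage step."""
    ingestions: list[str] = field(default_factory=list)
    queries: list[str] = field(default_factory=list)


def triage(requests: list[tuple[str, str]]) -> Tasks:
    """Split (kind, payload) requests into ingestion and query tasks."""
    tasks = Tasks()
    for kind, payload in requests:
        if kind == "ingest":
            tasks.ingestions.append(payload)
        elif kind == "query":
            tasks.queries.append(payload)
        else:
            # The real swarm would hand this off to the Error Agent.
            raise ValueError(f"unknown task type: {kind}")
    return tasks


def run(requests: list[tuple[str, str]]) -> tuple[list[str], str]:
    """Run ingestions before queries, then summarize what was done."""
    tasks = triage(requests)
    log = []
    # Ingestions go first so that queries can see every document.
    for path in tasks.ingestions:
        log.append(f"ingested {path}")
    for question in tasks.queries:
        log.append(f"answered {question}")
    summary = f"{len(tasks.ingestions)} ingested, {len(tasks.queries)} answered"
    return log, summary


log, summary = run([("query", "What is X?"), ("ingest", "a.pdf")])
```

Note that every step runs in sequence inside one call, which is the behaviour the problem statement below takes issue with.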

The Problem:

The existing DocAgent follows a swarm architecture with multiple specialized agents (Triage, Task Manager, Parser, Data Ingestion, Query, Error, and Summary agents). While this design provides clear separation of concerns, it introduces several enterprise-readiness issues:

Runtime Performance Issues: Large document ingestion operations execute synchronously during user interactions, blocking the UI and creating significant performance bottlenecks. Users experience lengthy wait times during document processing, leading to poor user experience and potential timeouts in production environments. The lack of asynchronous processing pipelines means that complex document parsing, chunking, and embedding generation all happen on the main thread, severely impacting application responsiveness.
Inefficient Resource Management: The system instantiates new vector storage processes and embedding pipelines for every document ingestion request, regardless of whether identical documents have been previously processed. This approach wastes computational resources, increases memory footprint, and leads to duplicate vector representations in storage. There's no caching mechanism or content-based deduplication to prevent redundant processing of the same documents across different user sessions or workflows.
Constrained Storage Architecture: The current implementation is limited to local file system paths, preventing integration with modern cloud storage solutions like AWS S3, Google Cloud Storage, or Azure Blob Storage. This restriction limits scalability, prevents distributed deployment scenarios, and makes it difficult to handle enterprise-scale document repositories. The absence of remote storage connectors also impacts collaboration capabilities and multi-tenant architectures.
Monolithic RAG Backend Dependencies: The system is tightly coupled to ChromaDB as the sole vector database option, lacking support for enterprise-grade alternatives such as Weaviate, Pinecone, or Qdrant. This limitation prevents organizations from leveraging existing vector database infrastructure and eliminates the possibility of implementing hybrid retrieval strategies. Additionally, there's no support for graph-based knowledge retrieval approaches that could enhance contextual understanding through entity relationships and semantic connections.
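
To make two of these points concrete (content-based deduplication and decoupling from a single vector database), here is a minimal sketch. Everything in it is an assumption for illustration: `VectorStore`, `InMemoryStore`, `content_id`, and `ingest` are hypothetical names, not part of AG2 or ChromaDB, and the fixed-size chunking stands in for real parsing and embedding.

```python
import hashlib
from typing import Protocol


class VectorStore(Protocol):
    """Hypothetical minimal interface a backend adapter would implement."""

    def add(self, doc_id: str, chunks: list[str]) -> None: ...

    def has(self, doc_id: str) -> bool: ...


class InMemoryStore:
    """Toy backend, present only to make the sketch runnable."""

    def __init__(self) -> None:
        self.docs: dict[str, list[str]] = {}

    def add(self, doc_id: str, chunks: list[str]) -> None:
        self.docs[doc_id] = chunks

    def has(self, doc_id: str) -> bool:
        return doc_id in self.docs


def content_id(data: bytes) -> str:
    """Derive a document ID from the bytes, so identical files collide."""
    return hashlib.sha256(data).hexdigest()


def ingest(store: VectorStore, data: bytes, chunk_size: int = 200) -> bool:
    """Return True if the document was processed, False if it was skipped."""
    doc_id = content_id(data)
    if store.has(doc_id):
        return False  # same content already ingested; skip the expensive work
    text = data.decode("utf-8", errors="replace")
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    store.add(doc_id, chunks)
    return True


store = InMemoryStore()
first = ingest(store, b"hello world")
second = ingest(store, b"hello world")  # identical bytes, so it is skipped
```

Because `ingest` depends only on the `VectorStore` protocol, an adapter for another backend could be swapped in without touching the ingestion logic. Hashing raw bytes only catches exact duplicates; near-duplicates would need a different technique.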

Describe the solution you'd like

No response

Additional context

No response

Sep 02 '25 16:09 priyansh4320

General comments on the DocAgent.

Which features should AG2 maintain itself, and which should it integrate from third parties?
The scope of DocAgent is to solve common RAG use cases instead of tackling hard RAG problems.

A few current issues with DocAgent:

It is swarm based and should be updated to group chat.
It is not async, so ingestion tasks block query tasks.
It does not use the latest LLM multi-model capabilities.
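
On the async point, a minimal sketch of the idea using only the standard library `asyncio` module: ingestion runs as a background task while a query is answered against whatever is already indexed. The `ingest`, `query`, and `main` functions and the dict-based index are hypothetical stand-ins, not AG2 APIs.

```python
import asyncio


async def ingest(path: str, index: dict[str, str]) -> None:
    """Stand-in for parsing, chunking, and embedding a document."""
    # Simulated slow I/O. Real CPU-bound parsing would still block the
    # event loop and would need asyncio.to_thread or a worker process.
    await asyncio.sleep(0.05)
    index[path] = f"contents of {path}"


async def query(question: str, index: dict[str, str]) -> str:
    """Answer from whatever has been indexed so far."""
    await asyncio.sleep(0)  # yield once so pending tasks can make progress
    return f"{question} -> {len(index)} document(s) indexed"


async def main() -> list[str]:
    index = {"old.pdf": "contents of old.pdf"}
    # Start ingestion in the background instead of awaiting it up front.
    ingestion = asyncio.create_task(ingest("new.pdf", index))
    early = await query("q1", index)  # answered while ingestion is in flight
    await ingestion                   # wait for the new document
    late = await query("q2", index)
    return [early, late]


results = asyncio.run(main())
```

The first query returns without waiting for `new.pdf`, and the second sees it once ingestion has finished. Whether a query should wait for in-flight ingestions is a design decision this sketch leaves open.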

A few RAG frameworks/examples we could look into:

https://github.com/infiniflow/ragflow
https://github.com/SciPhi-AI/R2R
https://github.com/Joshua-Yu/graph-rag/

@priyansh4320 @marklysze

Sep 05 '25 21:09 randombet