ml-commons icon indicating copy to clipboard operation
ml-commons copied to clipboard

[FEATURE] Deep Research

Open pyek-bot opened this issue 8 months ago • 1 comments

Is your feature request related to a problem? Agent framework in ml-commons is not capable of solving complex tasks that require multiple steps and tools to execute. Moreover, the execution is synchronous and not suitable for long running tasks. Tools execute sequentially and do not offer the capability to search the web preventing the ability to solve ambiguous tasks that require more context.

Currently we have three types of agent (doc):

  1. Flow agent
  2. Conversational flow agent
  3. Conversational agent

These existing agent types have limitations in supporting complex logic. Specifically, they lack the ability to:

  1. Implement conditional workflows
  2. Execute tools in parallel
  3. Handle branching and merging of execution paths
  4. Manage dependencies between tasks
  5. Asynchronous execution

This limitation restricts the creation of more sophisticated and efficient workflows within OpenSearch.

With the introduction of deep research in service providers like OpenAI, Gemini, Perplexity, etc, there is a need for such an agent capable of breaking down a task into simple steps and executing them with the help of the provided tools asynchronously.

It can also help save costs by invoking cheaper & faster LLMs for smaller tasks and reducing the number of inferences required.

Examples:

  1. Root Cause Analysis (RCA) for an Error:
  • Identify the source of recurring 500 Internal Server Error logs.
  • Retrieve related warnings, deployment changes, or traffic anomalies.
  • Generate a summary with potential causes and recommended actions.
  1. Log Anomaly Detection:
  • Analyze the last 24 hours of logs to detect spikes.
  • Identify affected endpoints, services, or patterns.
  • Correlate findings with recent system changes.

What solution would you like? A new Deep Research Agent in OpenSearch ML Commons that can:

  1. Break down complex tasks into simpler steps
  2. Use appropriate tools dynamically for each step
  3. Execute tasks serially or in parallel, depending on dependencies
  4. Support function calling to leverage external services efficiently Ref: https://docs.anthropic.com/en/docs/build-with-claude/tool-use/overview
  5. Re-evaluate progress and refine execution based on intermediate results
  6. Handle failures intelligently, including retries, fallbacks, and logging
  7. Execute the task asynchronously and update the status accordingly
  8. Incorporate web search capabilities where external context is required

There is a feature request for a graph agent already that tries to address the problem: https://github.com/opensearch-project/ml-commons/issues/3309

However, this feature focusses more on the automatic breakdown of complex tasks and tool execution rather than only a DAG style execution of tools.

pyek-bot avatar Mar 13 '25 23:03 pyek-bot

Suggestion:

  1. Add architecture and workflow diagram
  2. Split into phases for faster delivery

ylwu-amzn avatar Mar 14 '25 04:03 ylwu-amzn

RFC: https://github.com/opensearch-project/ml-commons/issues/3745 Framework to internally support deep-research

pyek-bot avatar Apr 20 '25 23:04 pyek-bot

We can close this issue as it has been merged and is now called the PlanExecuteReflect Agent which with tools behaves as a deep research agent.

pyek-bot avatar Aug 14 '25 00:08 pyek-bot