
Proposal: Simplify microagents + support MCP natively

Open neubig opened this issue 9 months ago • 17 comments

Summary

There are two issues:

  1. Microagents are hard to understand, let's consider how to fix this.
  2. It would be good to consider first-class support for MCP within microagents.

This issue proposes a way to address both.

Technical Design

Here is a proposal of documentation, and if this looks good we can modify the implementation accordingly.


Micro-agents

A micro-agent is a way to customize the main OpenHands agent. A micro-agent can be triggered on the fly, giving the main agent additional abilities or instructions that it does not normally have.

This doc describes:

  1. How micro-agents can be defined
  2. The syntax used to define micro-agents
  3. Where you put micro-agents so they can be discovered by OpenHands

Micro-agent Definition

A micro-agent is defined by three aspects:

  1. Its trigger
  2. Its instruction
  3. Its additional tools

Micro-agent Trigger Types

Micro-agents are "triggered" by different events, and when the trigger occurs, the micro-agent will be activated. Currently OpenHands supports three varieties of trigger_type.

  • always: a micro-agent with the "always" trigger will always be activated. These sorts of agents can be useful when you want to specify:
    • Repository-wide coding conventions. In this case, the micro-agent can be placed in the repo (see below for details) and will always be triggered every time OpenHands works on that repo. Here is the example from OpenHands.
    • Organization-wide coding conventions. In the future, we are planning on implementing support for micro-agents that specify coding conventions across an entire organization. If you are interested in this functionality, please thumbs-up or comment on this issue and we will work on implementing it.
  • keyword: a micro-agent will be triggered when a particular keyword appears in the conversation. For instance, here is an example of a micro-agent that is triggered when the agent says "github", telling OpenHands how to interact with GitHub through the API.
  • manual: These microagents can only be triggered through manual intervention by the user in the OpenHands interface (or other programmatic means). This feature is particularly useful for microagents that describe how to solve a task, such as todo.
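The three trigger types above can be sketched as a small matching function. This is a minimal sketch, not the OpenHands implementation: the `MicroAgent` class, the `is_triggered` function, the case-insensitive substring match, and the `manually_selected` set are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class MicroAgent:
    """Hypothetical in-memory form of a micro-agent (not an OpenHands class)."""

    name: str
    trigger_type: str  # "always", "keyword", or "manual"
    keywords: list[str] = field(default_factory=list)


def is_triggered(agent: MicroAgent, message: str, manually_selected: set[str]) -> bool:
    """Return True if the micro-agent should be activated for this message."""
    if agent.trigger_type == "always":
        return True
    if agent.trigger_type == "keyword":
        # Assumption: a case-insensitive substring match on any keyword.
        text = message.lower()
        return any(keyword.lower() in text for keyword in agent.keywords)
    if agent.trigger_type == "manual":
        # Assumption: the interface passes the names the user explicitly picked.
        return agent.name in manually_selected
    raise ValueError(f"unknown trigger_type: {agent.trigger_type!r}")
```

Under these assumptions, a "manual" micro-agent named todo is not activated by the word "todo" appearing in a message; it is activated only when its name is in `manually_selected`.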

Micro-agent Instructions

Micro-agent instructions are an additional prompt that is provided to the agent when the micro-agent is triggered. They provide additional information that adjusts the agent's behavior for the situation at hand. They are written in English (or whatever other language you work in). You can see examples in the OpenHands micro-agent directory.

Micro-agent Tools

Triggering micro-agents can optionally provide OpenHands with additional tools.

In the case that additional tools are provided, they are specified through MCP. This is done by providing a location of an MCP server that is used to read in and access the API. The API information will automatically be provided to the agent, so you do not need to specifically enumerate all of the functions in the API.

For instance, here is an example of a tool that provides access to TODO.
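Separately from the linked example, the relationship between triggers and tools described in this section can be sketched as follows: only micro-agents that are currently triggered contribute their MCP server, and a micro-agent without an `mcp_location` contributes nothing. This is a rough sketch under assumptions. The function name `active_mcp_locations` and the dictionary representation of a micro-agent are hypothetical, and connecting to the MCP server itself is out of scope here.

```python
def active_mcp_locations(agents: list[dict], triggered_names: set[str]) -> list[str]:
    """Collect MCP server locations from the micro-agents that are triggered.

    Each agent is a dict of frontmatter fields; "mcp_location" is optional.
    Duplicates are dropped while keeping first-seen order.
    """
    locations: list[str] = []
    for agent in agents:
        location = agent.get("mcp_location")
        if agent["name"] not in triggered_names or not location:
            continue
        if location not in locations:
            locations.append(location)
    return locations
```

With this shape, attaching `mcp_location` to an "always" micro-agent makes its tools available in every conversation, while attaching it to a "keyword" or "manual" micro-agent makes them available only after that trigger fires.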

Micro-agent Syntax

All micro-agents use markdown files with YAML frontmatter.

---
name: <Name of the microagent>
trigger_type: <always, keyword, or manual>
keywords:
- <Optional keywords only active when `trigger_type` is "keyword">
mcp_location: <Optional location of an MCP server that provides additional tools>
---

<Markdown with any special guidelines, instructions, and prompts that OpenHands should follow.
Check out the specific documentation for each microagent on best practices for more information.>
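To make the format concrete, here is a minimal sketch of reading such a file with only the Python standard library. It is not the OpenHands loader: `parse_microagent`, `VALID_TRIGGER_TYPES`, and the `SAMPLE` file contents (including the MCP URL) are hypothetical, and the parser handles only the flat `key: value` and `- item` forms shown above rather than full YAML.

```python
VALID_TRIGGER_TYPES = {"always", "keyword", "manual"}

# Hypothetical micro-agent file, following the proposed syntax.
SAMPLE = """\
---
name: github
trigger_type: keyword
keywords:
- github
- pull request
mcp_location: http://localhost:8000/sse
---

Prefer small, focused pull requests.
"""


def parse_microagent(text: str) -> tuple[dict, str]:
    """Split a micro-agent file into (frontmatter fields, markdown body)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("file must start with a '---' frontmatter line")
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        raise ValueError("frontmatter is missing its closing '---'")

    meta: dict = {}
    list_key = None  # key currently collecting "- item" lines, if any
    for line in lines[1:end]:
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("- "):
            if list_key is None:
                raise ValueError(f"list item without a key: {stripped!r}")
            meta[list_key].append(stripped[2:].strip())
            continue
        key, sep, value = stripped.partition(":")
        if not sep:
            raise ValueError(f"expected 'key: value', got {stripped!r}")
        key, value = key.strip(), value.strip()
        if value:
            meta[key] = value
            list_key = None
        else:
            meta[key] = []  # an empty value starts a list
            list_key = key

    if meta.get("trigger_type") not in VALID_TRIGGER_TYPES:
        raise ValueError(f"invalid trigger_type: {meta.get('trigger_type')!r}")
    body = "\n".join(lines[end + 1:]).strip()
    return meta, body
```

Calling `parse_microagent(SAMPLE)` yields the frontmatter fields as a dictionary and the instruction text as a string. A real loader would more likely use a full YAML parser; the hand-written one here only keeps the sketch dependency-free.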

Micro-agent Location

Micro-agents are located in several places:

  • Public micro-agents: These are micro-agents that are included in the OpenHands main repo here. They are meant to be general and widely usable by many different people or organizations, and document best practices on how to use OpenHands in general.
  • Repository micro-agents: These can be included directly in the repo that OpenHands is working on. These micro-agents should be added in the .openhands/microagents/ directory.
  • Organization micro-agents: We are working on implementing micro-agents that are easily accessible at a cross-repository organizational level, so please comment on this issue if this would be useful to you.
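Discovery of repository micro-agents can be sketched as a directory scan. This is only an illustration: the function name `discover_repo_microagents` is hypothetical, and the choices to look only at `*.md` files and not to recurse into subdirectories are assumptions, not documented OpenHands behavior.

```python
from pathlib import Path


def discover_repo_microagents(repo_root: str) -> list[Path]:
    """List micro-agent markdown files under <repo_root>/.openhands/microagents/."""
    directory = Path(repo_root) / ".openhands" / "microagents"
    if not directory.is_dir():
        # A repo without the directory simply has no repository micro-agents.
        return []
    # Assumption: only top-level *.md files count; sorted for a stable order.
    return sorted(path for path in directory.glob("*.md") if path.is_file())
```

Each returned path would then be read and parsed according to the frontmatter syntax described above.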

If you find this feature request or enhancement useful, make sure to add a 👍 to the issue.

neubig avatar Mar 27 '25 15:03 neubig

I think it would be substantially better to use Model Context Protocol as a standard rather than managing your own.

kjenney avatar Mar 27 '25 16:03 kjenney

Organization-wide microagents are an essential feature for enterprise adoption. +1 to that.

caique avatar Mar 27 '25 16:03 caique

I'm not entirely convinced we should mix microagents and MCPs.

I like the idea of supporting both but I'd delegate the decision to use a MCP to the LLM rather than attach to the microagents syntax.

In addition, can't we just refer to the MCP in the microagent content? How do you envision the frontmatter processing for this particular field?

caique avatar Mar 27 '25 16:03 caique

In addition, can't we just refer to the MCP in the microagent content? How do you envision the frontmatter processing for this particular field?

I think mcp_location is the path of the config file for the MCP server? We can also provide more info about it in the micro-agent content.

I'd delegate the decision to use a MCP to the LLM rather than attach to the microagents syntax.

Can you elaborate a bit about this?

ryanhoangt avatar Mar 27 '25 16:03 ryanhoangt

Can you elaborate a bit about this?

I prefer sending MCPs as "available tools" to the LLM and not tied to any specific microagents.

We continue handling the activation of microagents through the existing triggers (always, keyword, or manual).

Then, we delegate the decision to use a MCP or not to the LLM instead of attaching it to a specific microagent trigger.

Makes sense?

caique avatar Mar 27 '25 17:03 caique

mcp_location could be attached to a microagent triggered with "always", in which case the tools are always available. Would that work?

neubig avatar Mar 27 '25 17:03 neubig

mcp_location could be attached to a microagent triggered with "always", in which case the tools are always available. Would that work?

That would be the way to register the MCPs? Instead of a mcp.json (or another config file), we would use an "always" microagent? 😊

caique avatar Mar 27 '25 17:03 caique

Yes, it would be the microagent for the whole repo, like repo.md currently. The idea would be that all repo-level customization would go in that file, and the file format would be the same as the other microagent files.

The advantage of this method is that microagents all come in the same formats, and mcps can either be registered always or only when certain triggers happen.

neubig avatar Mar 27 '25 17:03 neubig

The advantage of this method is that microagents all come in the same formats, and mcps can either be registered always or only when certain triggers happen.

Makes sense to me now!

This would allow us to use MCPs without microagents (and vice versa) but also create combinations of both features for advanced use cases!

caique avatar Mar 27 '25 17:03 caique

Micro-agent Syntax

All micro-agents use markdown files with YAML frontmatter.

---
name: <Name of the microagent>
trigger_type: <always, keyword, or manual>
keywords:
- <Optional keywords only active when `trigger_type` is "keyword">
mcp_location: <Optional location of an MCP server that provides additional tools>
---

<Markdown with any special guidelines, instructions, and prompts that OpenHands should follow.
Check out the specific documentation for each microagent on best practices for more information.>

Hi @neubig, does the above apply to knowledge/ microagents too? The documentation examples in the codebase still use different YAML frontmatter keys: name, type, agent, and triggers.

oconnorjoseph avatar Mar 27 '25 19:03 oconnorjoseph

I'm thinking about what we can do to support tasks better. One way would be to prod the LLM to use a task.md to keep track of where it's at, e.g. break down the task, make itself a checklist with the steps, and check the boxes as it fulfills them.

In some cases, this has worked very well to keep Claude Sonnet (and other models) focused, as it went through all of them.

enyst avatar Mar 27 '25 19:03 enyst

@neubig: Created a ticket for the org-wide feedback that you mentioned. https://github.com/All-Hands-AI/OpenHands/issues/7557

Thinking aloud:

Naming and Framing

Micro Agents should be OpenHands Agent Customization. The terminology for Repo and Keyword Micro-Agents really should be something like Agent Directives or Agent Context, e.g. Repo Agent Directives, Keyword Directives. They allow users to add more general knowledge to the agent and reduce the need to write longer prompts.

MCP and Task

Tasks (Single Step & Multi Step): They map to planning and user actions. They can be set up for something simple, like the steps around a git commit, or something more advanced, like running end-to-end tests for an application, building out any missing results, and sending an email when complete. Also, Tasks are good wrappers for MCP or external tools.

Todo

It would be good to map out a few user scenarios and look for gaps. "Organization micro-agents" add another layer of complexity. For a smaller group, a centralized repo that stores best-in-class configurations and agent scripts would work. A larger org would be standardizing on authorized workflows and templates.

jasonburt avatar Mar 27 '25 19:03 jasonburt

+1 to @jasonburt thoughts!

Naming and Framing

Micro Agents should be OpenHands Agent Customization. The terminology for Repo and Keyword Micro-Agents really should be something like Agent Directives or Agent Context, e.g. Repo Agent Directives, Keyword Directives. They allow users to add more general knowledge to the agent and reduce the need to write longer prompts.

That is a great opportunity for us to find better names for these concepts.

IMHO "Microagents" is a very misleading name. The first time I read it, I thought OpenHands would start new agent instances to delegate tasks to. 😅

On top of that, "Repository-specific microagents" are easy to confuse with the "repo.md microagent", which has the repo type.

MCP and Task

Tasks (Single Step & Multi Step): They map to planning and user actions. They can be set up for something simple, like the steps around a git commit, or something more advanced, like running end-to-end tests for an application, building out any missing results, and sending an email when complete. Also, Tasks are good wrappers for MCP or external tools.

I like the idea of step-by-step tasks and workflows for common tasks. Although the whole LLM UX is based on natural language, it is super annoying to see the agent make 10 attempts to commit and push.

I have not noticed anything like this in other tools, and I know that I would appreciate having the ability to define a checklist that I can trigger in a conversation.

caique avatar Mar 27 '25 20:03 caique

This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot] avatar May 15 '25 02:05 github-actions[bot]

Is this just Task Master, but for OpenHands? Consider how they can break down tasks into smaller chunks... but they are still missing something with memory or deep/online research https://github.com/eyaltoledano/claude-task-master https://github.com/eyaltoledano/claude-task-master/discussions/487 https://github.com/eyaltoledano/claude-task-master/discussions/499

BradKML avatar May 15 '25 02:05 BradKML

I think a fair amount of this has been done already. Microagents have been simplified I believe and MCP was implemented. Is there anything missing?

mamoodi avatar May 15 '25 13:05 mamoodi

@mamoodi mostly about test-running some of the popular MCPs to see if there are any missing features OR limitations on certain MCP designs

BradKML avatar May 29 '25 08:05 BradKML

This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot] avatar Jun 29 '25 02:06 github-actions[bot]

  • manual: These microagents can only be triggered through manual intervention by the user in the OpenHands interface (or other programmatic means). This feature is particularly useful for microagents that describe how to solve a task, such as todo.

I think we have implemented this as 'knowledge' microagents currently, so the trigger is a keyword anywhere in the user message. The actual triggers we use start with '/'. Maybe we have one more thing here: when the user starts with '/', we could autocomplete.

In the case that additional tools are provided, they are specified through MCP. This is done by providing a location of an MCP server that is used to read in and access the API. The API information will automatically be provided to the agent, so you do not need to specifically enumerate all of the functions in the API.

I think we haven't yet implemented this, unless I missed it among MCP PRs. 😊

enyst avatar Jun 29 '25 06:06 enyst

Oh yeah I forgot to say that MCP management is another level that is worth looking into (sorry for the reference, but they have public bots ready for use) https://github.com/RooCodeInc/Roo-Code/discussions/6289#discussioncomment-13928740

BradKML avatar Jul 30 '25 02:07 BradKML

This issue is stale because it has been open for 40 days with no activity. Remove the stale label or leave a comment, otherwise it will be closed in 10 days.

github-actions[bot] avatar Sep 09 '25 02:09 github-actions[bot]

Should this be taken down when we have MCP already? What are the last parts separating MCP from current form to "native"?

BradKML avatar Sep 18 '25 07:09 BradKML

This issue is stale because it has been open for 40 days with no activity. Remove the stale label or leave a comment, otherwise it will be closed in 10 days.

github-actions[bot] avatar Oct 29 '25 02:10 github-actions[bot]

This issue was automatically closed due to 50 days of inactivity. We do this to help keep the issues somewhat manageable and focus on active issues.

github-actions[bot] avatar Nov 09 '25 02:11 github-actions[bot]