
Allow us to use MCP servers to extend OpenHands' functionality

Open orangejon opened this issue 1 year ago • 19 comments

What problem or use case are you trying to solve?

OpenHands' functionality is currently fairly limited, but Anthropic's MCP standard provides a way for LLMs to interact with many additional services and use them as "tools". This could allow for much more complex workflows, e.g. to use Puppeteer or Playwright to test the code in the browser, then if it fails use OpenAI o1 (via MCP) to debug/rewrite it, etc.

Describe the UX of the solution you'd like

I guess the ideal would be to be able to install MCP servers in one click or a prompt. The implementation in Cline is neat:

[Image: Screenshot 2024-12-24 at 17 56 56]

... but the main thing is to be able to access them. Perhaps a list of the installed servers could be good to verify they have been recognised, like in Claude Desktop:

[Image: Screenshot 2024-12-24 at 13 53 01]

Do you have thoughts on the technical implementation?

I don't know OpenHands' architecture, but please be sure to add clear documentation with step-by-step instructions so I know any setup that's required to use this functionality.

Describe alternatives you've considered

Using Claude Desktop instead of OpenHands, because I can probably replicate a lot of the same functionality by just combining MCP servers. But the UI probably wouldn't be as good and I'm not sure if it would work as effectively.

Additional context

orangejon avatar Dec 24 '24 12:12 orangejon

If it helps anyone, I could offer a small "bounty" payment for implementing this?

orangejon avatar Dec 24 '24 12:12 orangejon

I must first deal with my GUI & CLI issue, but this is the next thing I'm planning to work on if no one else is interested.

UltraInstinct0x avatar Dec 24 '24 17:12 UltraInstinct0x

Agree that MCP servers in OpenHands seem like necessary table stakes in the near future :) @orangejon Since you mention Cline, have you considered using it as an alternative to OpenHands and, if so, where, if anywhere, does it fall short?

motin avatar Dec 25 '24 11:12 motin

I figured out the GUI & CLI thing and am working on this right now. @orangejon do you think users should add / remove tools themselves, or should OpenHands figure out which tools it might use and install them? I think the latter option is better; however, we might need to add steps to approve / reject tool installation and usage, just like the Claude app. Any thoughts? Also, @motin asked a great question; can you elaborate on that please?

UltraInstinct0x avatar Dec 26 '24 14:12 UltraInstinct0x

I think it could be either, or even both. The way Cline does it, with describing a tool by its capability, seems ideal, because then I don't have to search online for a suitable tool first. I agree that in this case a confirmation step is probably worthwhile, especially if there are multiple tools that match. I suppose if I have a particular tool in mind then it would be good to be able to just give the name or URL, though I guess that could also be via a prompt. Removing might be easier by just clicking a button on a list of installed servers, though? But any way is fine for me really; so long as there's a reasonably easy and clearly documented way to use MCP servers, I'll figure it out :)

I've not used Cline much yet, but I've got it installed and will be experimenting with it over the next few days. I'll report back!

orangejon avatar Dec 26 '24 16:12 orangejon

@motin I've had a chance to use Cline for a few days now, so I can report back on my initial experience. So far I have used it to create a simple Ruby on Rails web app. Because macOS really creates headaches with Ruby versions (which Cline tried to find solutions to for over an hour, but only succeeded temporarily), I decided to use a GitHub codespace (basically a VPS running Ubuntu) and connect VS Code to that, which works really well: effortless setup, fast, reliable, and it automatically configures port forwarding so you can see your web app as if it were running on your local machine. This had the nice side effects that it runs faster, as the VPS has more resources than my laptop, and that I don't have to worry about the terminal commands that Cline runs, as the worst-case scenario would be wasting a few minutes rebuilding the virtual machine if it really screwed it up (which, so far, it didn't).

The code generated is pretty decent when using Claude Sonnet 3.5 (via OpenRouter), but my attempt to use Gemini was pretty unsuccessful, hitting various errors regardless of which model I selected. Claude can use MCP tools but it doesn't seem to do so unless you directly tell it to in the user prompt; e.g. I added to the "system" prompt that when encountering an error it should use the search1api MCP server to read the documentation, but it never did. At least it (usually) listened to my instruction to run all unit tests and a Playwright browser-based integration test, so it does usually catch its own errors and fix them before asking for user input. I just swear it would be faster and burn fewer tokens if it Googled for a solution or documentation rather than just randomly changing the code, sometimes even calling functions that don't exist.

Also, although the documentation makes it sound like you can just prompt Cline with "Add a tool that..." and it will install the correct tool, that's not what it does. Typing that prompt seems to create a new MCP server from scratch which, seeing as it doesn't read the API documentation, is very unlikely to actually work! Instead you have to search for the "configure MCP servers" dialog, which then makes you manually edit the JSON configuration file to insert the MCP server definition. Then it works fine (and displays a nice "status" thing on the dialog with the various functions you can call, like in Claude Desktop) but it's a bit of a faff. I'd rather just paste in the URL of an MCP server definition and have it added for me.
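For readers who have not seen the manual step described above, here is a minimal Python sketch of what "inserting a definition into the JSON configuration" amounts to. It assumes the `mcpServers` layout (a mapping from server name to `command`, `args` and optional `env`) used by Claude Desktop-style configuration files; the helper `add_mcp_server`, the server name, the package name and the environment variable are all hypothetical and only illustrate the shape of an entry.

```python
import copy
import json


def add_mcp_server(config, name, command, args=None, env=None):
    """Return a copy of `config` with one more entry under "mcpServers".

    Hypothetical helper: it only manipulates the dictionary, it does not
    start or validate the server.
    """
    updated = copy.deepcopy(config)
    servers = updated.setdefault("mcpServers", {})
    if name in servers:
        raise ValueError(f"server {name!r} is already configured")
    entry = {"command": command, "args": list(args or [])}
    if env:
        entry["env"] = dict(env)
    servers[name] = entry
    return updated


base = {"mcpServers": {}}
config = add_mcp_server(
    base,
    name="example-search",              # hypothetical server name
    command="npx",
    args=["-y", "example-mcp-server"],  # hypothetical package name
    env={"EXAMPLE_API_KEY": "<your key>"},
)
print(json.dumps(config, indent=2))
```

A "paste a URL and it gets added" flow would presumably end in the same kind of dictionary update; only the source of the entry would differ.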

Still, I have to say, my initial impression of Cline is generally pretty positive. The areas where it falls short currently are:

  • speed, because it often takes multiple attempts to fix a bug and it seems to be entirely single-threaded; I realise this helps avoid code conflicts but surely there could be background agents (e.g. crawling the latest documentation from the web) and multiple Cline instances that I could task to work on different areas of the codebase, just as I do with human developers. (Though maybe I can just open duplicate VS Code workspaces? I'll try that...)
  • asks too frequently for human confirmation before running commands; it has never proposed a command I rejected so now I just instinctively hit "accept" without reading it... and as this is a VPS anyway, I don't really care
  • the UIs it creates are ugly. I tried to get it to use the Material Design library and search online for templates but without much success so far. I'll keep trying, but I suspect that this will be one area where it's currently easier for a human to just edit the output until it looks okay, and/or set up a CSS template that Cline can use in future. Or maybe I can find another AI tool that's better at this.
  • the image upload feature seems to be broken. I'll submit a bug report. I guess if this worked I might be able to upload a screenshot of a site and ask it to copy the styling.

If you've got any questions, let me know. I'm still a fan of the OpenHands approach, and FOSS in general, so I'm happy to help if I can. It's just that Cline is working well for me so I will stick to using it for the time being.

orangejon avatar Dec 29 '24 12:12 orangejon

It would be great to have MCP/RAG built-in options for popular knowledge bases like wikis, Confluence, PDFs, and web links.

Key benefits of this feature:

  1. Enhanced knowledge retrieval capabilities
  2. Improved integration with common information sources
  3. Increased efficiency in accessing relevant data

Potential implementation ideas:

  1. Develop connectors for popular wiki platforms and Confluence
  2. Implement PDF parsing and indexing functionality
  3. Create a system for crawling and updating web link content

This enhancement would significantly expand OpenHands' ability to leverage existing knowledge repositories, making it more versatile and powerful for users working with various information sources.

Would love to hear your thoughts on this suggestion!

AlexanderDorofeev avatar Jan 09 '25 21:01 AlexanderDorofeev

I think the point of MCP is not that these types of functionality are "built in"; the idea is that you can add MCP servers for whatever you need, then OpenHands will access whatever it needs.

(At least in theory; I've been using Cline + Claude Sonnet 3.5 recently, which supports MCP, and it rarely ever uses any MCP tools, no matter how much I prompt it to!)

orangejon avatar Jan 10 '25 11:01 orangejon

Agreed, integrating MCP is essential. However, how can we design MCP usage to be model-agnostic? It doesn’t seem like a good approach to develop a feature that only works with Claude 3.5/Sonnet.

RPirruccio avatar Jan 12 '25 18:01 RPirruccio

MCP is (at least theoretically) an open standard that other LLMs can implement. As far as I know, it's only Anthropic's models that have implemented it so far, though.

orangejon avatar Jan 13 '25 07:01 orangejon

I think there is a way to implement this with middleware like MCP-Bridge (https://github.com/SecretiveShell/MCP-Bridge), whose main idea is to provide an OpenAI-compatible endpoint that can call MCP tools, @orangejon. However, whether it is appropriate still requires in-depth evaluation.
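To make the bridging idea concrete: an MCP tool listing describes each tool with a `name`, a `description` and a JSON Schema under `inputSchema`, while OpenAI-style function calling expects a `{"type": "function", "function": {...}}` object with the schema under `parameters`. The sketch below shows only that translation step. The function name `mcp_tool_to_openai` and the example tool are hypothetical, and this is not taken from MCP-Bridge or OpenHands.

```python
def mcp_tool_to_openai(tool):
    """Translate one MCP tool description into an OpenAI-style tool entry."""
    # Fall back to an empty object schema when the tool declares no inputs.
    empty_schema = {"type": "object", "properties": {}}
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool.get("description", ""),
            "parameters": tool.get("inputSchema", empty_schema),
        },
    }


# Hypothetical tool, shaped like an entry from an MCP tool listing.
example_tool = {
    "name": "web_search",
    "description": "Search the web and return the top results.",
    "inputSchema": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}

openai_tool = mcp_tool_to_openai(example_tool)
print(openai_tool["function"]["name"])
```

A list of such entries is what a function-calling request carries in its tools field, which is why the model on the other end does not need to know anything about MCP itself.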

Sucran avatar Jan 14 '25 09:01 Sucran

I've been looking into this and I think we can implement this pretty easily. We already use LiteLLM, which lets us do tool/function calling with any model, just like how LibreChat handles tools globally (they do it with LangChain) regardless of which provider or model you're using. @RPirruccio - good point about model-agnostic design. That's exactly why we don't need MCP-Bridge here: LiteLLM already handles the compatibility layer for us. Before I start implementing this, I'd like to hear from everyone:

How do you prefer to add tools - should users manually configure them, or should OpenHands try to discover and suggest relevant tools? Should we add an approval step for tool installation like Claude Desktop does?

I'm leaning towards automated discovery with an approval prompt, since it would make things easier for users while keeping them in control. But let me know what you think would work best for your use cases. We may also need to let the headless version configure its own tools, but IMO the tools available to it should be limited to prevent any "Skynet becomes self-aware" moment.
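A rough Python sketch of the approval step being discussed: discovery proposes candidate tools, and nothing is installed unless a decision callback says yes. In an interactive session the callback could prompt the user; in headless mode it could check a fixed allowlist. Every name here (`install_with_approval`, `HEADLESS_ALLOWLIST`, the tool names) is hypothetical and not part of OpenHands.

```python
def install_with_approval(candidates, installed, ask):
    """Add each candidate to `installed` only if `ask(name)` returns True.

    Returns the list of names approved in this call.
    """
    approved = []
    for name in candidates:
        if name in installed:
            continue  # already present, nothing to approve
        if ask(name):
            installed.add(name)
            approved.append(name)
    return approved


# Headless mode: approval is a lookup in a fixed allowlist (hypothetical names).
HEADLESS_ALLOWLIST = {"web-search"}

installed = set()
approved = install_with_approval(
    ["web-search", "shell-admin"],
    installed,
    ask=lambda name: name in HEADLESS_ALLOWLIST,
)
print(approved)
```

Keeping the decision in a callback means the same install path serves both modes, and the headless restriction is a matter of which callback is passed in.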

UltraInstinct0x avatar Jan 16 '25 07:01 UltraInstinct0x

Automated discovery sounds great if it works well, because then if I'm coding something and realise I need a tool (or the LLM realises?) then I don't have to go off to search the web for a solution. However, if there are multiple MCP tools then it might be preferable to select one manually, not necessarily for fear of Skynet situations but more because some of the MCP tools are pretty flaky!

Also there's the case that's been more common for me so far: I'm browsing the web looking for tools that can improve my workflow, and want to add one that I've found. So it's not necessarily something that is essential in that moment or that OpenHands (or Cline) can't operate without, but it's something that seems generally useful (e.g. web search and scraping). Also most tools need me to create an account, add payment details and get an API key, so unless OpenHands will do that automatically, there's not a significant benefit in the discovery step happening automatically.

In short, being able to install (and uninstall) tools manually would certainly be useful, and presumably it's easier to implement, so it might make sense to add that first.

orangejon avatar Jan 16 '25 08:01 orangejon

Ok I'm working on it.

UltraInstinct0x avatar Jan 16 '25 21:01 UltraInstinct0x

Interesting conversation here! I just want to add a few ideas based on my experience crafting coding agents using MCP with Claude Desktop.

There’s a belief that adding more tools makes LLMs smarter, but more often, it just creates confusion and fills the context with noise. In my opinion, additional tools should be part of a message and include extra in-context learning materials. Since I can’t do this with Claude Desktop, I started looking for alternatives.

A few ideas worth exploring:

  • Augmenting microagents with tools (MCP), so a tool is only exposed when a relevant microagent is triggered.
  • Using a separate reasoning thread where the LLM first determines if any tools can assist with the user’s request, and if so, dynamically adds them to the chat (like RAG for tools).
  • Leveraging CLI-based tools instead of complex integrations. Claude is very effective at exploring codebases with rg (ripgrep) and doing API testing with httpie. I initially used [argc](https://github.com/sigoden/argc) to wrap APIs into CLIs and expose them when needed as MCP servers with [llm-functions](https://github.com/sigoden/llm-functions/tree/main/mcp/server). However, I now realize that All-Hands already provides built-in shell access and the ability to create bespoke sandboxes with custom CLIs. Microagents seem like a perfect fit for adding extra in-context learning. After writing this, I’m keen to try it myself—everything needed is already there!
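The "RAG for tools" idea in the second bullet can be sketched as a selection pass that runs before the main request and exposes only the tools that look relevant. The Python sketch below uses plain keyword overlap as a stand-in for whatever retrieval or reasoning step a real implementation would use. The function names and the tool list are hypothetical, and there is no stop-word handling, so common words can cause spurious matches.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def select_tools(request, tools, top_k=2):
    """Return up to top_k tool names whose name/description overlap the request."""
    words = tokenize(request)
    scored = []
    for tool in tools:
        overlap = len(words & tokenize(tool["name"] + " " + tool["description"]))
        if overlap > 0:
            scored.append((overlap, tool["name"]))
    # Highest overlap first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:top_k]]


# Hypothetical tool catalogue.
TOOLS = [
    {"name": "web_search", "description": "search the web for documentation"},
    {"name": "browser_test", "description": "run a playwright browser test"},
    {"name": "pdf_reader", "description": "extract text from pdf files"},
]

selected = select_tools("search the web for the latest API documentation", TOOLS)
print(selected)
```

Only the selected tools would then be added to the chat, which keeps unrelated tool descriptions out of the context.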

This also makes me think: the shell is an underutilized platform for LLM tools. It’s straightforward to provide RAG, web search, and many other functions via CLIs. If an LLM can call native tools, why wouldn't it use shell-based tools with the same efficiency?
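As a small illustration of the shell-as-tool-platform point, the Python sketch below wraps an arbitrary command line in a function that returns a structured result an agent could read. The wrapper name `run_cli_tool` is hypothetical; the demo invokes the current Python interpreter only so that the example does not depend on tools such as rg or httpie being installed.

```python
import subprocess
import sys


def run_cli_tool(argv, timeout=30):
    """Run a command without a shell and return its exit code and output."""
    completed = subprocess.run(
        argv,
        capture_output=True,  # collect stdout and stderr instead of printing
        text=True,            # decode output as str
        timeout=timeout,      # raises subprocess.TimeoutExpired if exceeded
    )
    return {
        "exit_code": completed.returncode,
        "stdout": completed.stdout,
        "stderr": completed.stderr,
    }


result = run_cli_tool([sys.executable, "-c", "print('hello from a CLI tool')"])
print(result["exit_code"], result["stdout"].strip())
```

Passing the command as an argument list rather than a shell string avoids quoting problems when the arguments come from model output.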

I’d love to hear critical opinions on this approach. What am I missing? Are there hidden downsides?

anzax avatar Feb 04 '25 19:02 anzax

I think MCP is cool and we can benefit from adding it to OpenHands.

Just to note quickly, @anzax I do agree.

  • Re: Augmenting microagents with tools (MCP) - we need MCP first IMO; once MCP is integrated, I think we can already use this
  • Re: Using a separate reasoning thread - underlying support for reasoning LLMs and workflows is coming
  • Re: Leveraging CLI-based tools - we had what we call agent skills, implemented in Python and run via a Jupyter server in the runtime. We consider some of them deprecated right now. I don't know what new tools would look like, but if you want to try it, please feel free to!

enyst avatar Feb 04 '25 20:02 enyst

I switched to using Cline (with Claude Sonnet 3.5) mostly because it supports MCP tools, but I have been disappointed by how infrequently it uses them. I suspect it's because the LLM was mainly trained on content like StackOverflow, where people offer solutions to the problem the developer is currently facing. These solutions don't often say "Now Google for the latest API documentation and check your API calls are correct" because the solutions offered were correct at the time of writing. I guess having a separate reasoning thread might help because, if prompted appropriately, it could encourage the LLM to first plan out how to approach a problem methodically, as a good developer would, instead of just randomly making changes that are just as likely to cause new problems as to fix the bug!

orangejon avatar Feb 05 '25 08:02 orangejon

This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot] avatar Mar 08 '25 01:03 github-actions[bot]

Is this being implemented? I just POC'd MCP tool use at my company for internal workflows. Would love to leverage OpenHands here

dor-tzur-lmnd avatar Mar 12 '25 08:03 dor-tzur-lmnd

@ryanhoangt is taking a look at this!

neubig avatar Mar 24 '25 19:03 neubig

Can be closed, right?

MischaPanch avatar Apr 19 '25 16:04 MischaPanch

Yes, thank you for the reminder, this was solved by #7637.

enyst avatar Apr 19 '25 17:04 enyst