OpenHands Remove LangChain and LlamaIndex dependency

What problem or use case are you trying to solve?

LangChain and LlamaIndex are large dependencies that pull in additional dependencies of their own. To attract more developers and simplify development, OpenDevin should reduce its external dependencies.

Do you have thoughts on the technical implementation?

Reimplement in OpenDevin only the functionality it actually uses from both packages. That functionality can likely be replaced by a few hundred lines of code.

Mar 30 '24 11:03 casper-hansen

LangChain is really close to being removed. We just need to redo the templates here: https://github.com/OpenDevin/OpenDevin/blob/main/agenthub/langchains_agent/utils/prompts.py

Would happily accept a PR there.
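
As a rough illustration of what redoing the templates without LangChain could look like, here is a minimal sketch that uses only the Python standard library. The prompt text, the `render_prompt` helper, and the field names are hypothetical and are not taken from prompts.py.

```python
from string import Template

# Hypothetical prompt text; the real templates live in prompts.py.
ACTION_PROMPT = Template(
    "You are working on this task:\n$task\n\n"
    "Your recent thoughts:\n$monologue\n\n"
    "What is your next action?"
)


def render_prompt(task: str, thoughts: list[str]) -> str:
    """Fill the template; substitute() raises KeyError if a field is missing."""
    monologue = "\n".join(f"- {t}" for t in thoughts)
    return ACTION_PROMPT.substitute(task=task, monologue=monologue)


if __name__ == "__main__":
    print(render_prompt("fix the failing test", ["read the traceback", "open the test file"]))
```

Plain `str.format` or f-strings would work just as well; `string.Template` is used here only because prompt text that contains literal braces (JSON examples, for instance) does not need escaping.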

LlamaIndex is a little harder. I think we need an in-memory vector database for long-term memory retrieval (though the current agent doesn't really utilize it). Not sure we want to implement embeddings, indexing, searching, etc. ourselves, but maybe there's something more lightweight than LlamaIndex. Maybe we can get away with raw chromadb?
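
To make the trade-off concrete, here is a toy sketch of what a hand-rolled in-memory store for long-term memory retrieval might involve. Everything in it is an assumption for illustration: the `InMemoryMemory` class and its methods are hypothetical, and the bag-of-words `embed` function is only a stand-in for a real embedding model, which is the part that would be hard to implement ourselves.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for a real embedding model: bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryMemory:
    """Hypothetical long-term memory: stores texts and searches them by similarity."""

    def __init__(self) -> None:
        self._items: list[tuple[str, Counter]] = []

    def add(self, text: str) -> None:
        self._items.append((text, embed(text)))

    def search(self, query: str, k: int = 3) -> list[str]:
        """Return the k stored texts most similar to the query (brute-force scan)."""
        q = embed(query)
        ranked = sorted(self._items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]
```

The storage and brute-force search are only a few dozen lines; the open question is where real embeddings would come from and whether a linear scan stays fast enough as memory grows.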

Mar 30 '24 14:03 rbren

Removed LangChain from the above file. Made a PR here. Can work on removing LlamaIndex next.

On another note, maybe we should rename "langchain Agent" to "ReAct Agent"; pretty sure that's the framework they use.

Mar 31 '24 23:03 SuibhneOFoighil

Excellent work @SuibhneOFoighil

Apr 01 '24 09:04 casper-hansen

I also have https://github.com/OpenDevin/OpenDevin/pull/402 which renames it to MonologueAgent

Apr 01 '24 14:04 rbren

@rbren JFYI: dev_config/python/mypy.ini mentions the langchain_agent path: exclude = agenthub/langchains_agent/regression

Apr 01 '24 16:04 omutist

Thanks! Fix: https://github.com/OpenDevin/OpenDevin/pull/513

Apr 01 '24 17:04 rbren

To update this thread: LangChain is now removed; LlamaIndex is still a dependency.

Apr 01 '24 17:04 rbren

R2R is a very streamlined framework that focuses solely on deploying a RAG backend.

In one of my previous projects I spent some time thinking about symbol-based semantic code search over a graph.

If we were to re-implement this inside of R2R, would Devin consider using R2R as a RAG backend?

Apr 01 '24 18:04 emrgnt-cmplxty

I think the best options are:

llama-index (since it's the best-known RAG tool, does RAG very well out of the box, and is full of features)
raw vector database (which gives us more control)

Eventually we're going to want a really flexible RAG solution that agents can take advantage of without too much fuss. So I'm really leaning towards keeping LlamaIndex.

Apr 02 '24 02:04 rbren

Going to close this one for now. Worth having a long-term discussion about this, maybe in Slack.

Apr 05 '24 18:04 rbren
