OpenHands
Remove LangChain and LlamaIndex dependency
What problem or use case are you trying to solve?
LangChain and LlamaIndex are large dependencies that pull in many transitive dependencies of their own. To attract more developers and to simplify development, OpenDevin should reduce its external dependencies.
Do you have thoughts on the technical implementation?
Reimplement the specific functionality used from both packages directly in OpenDevin. The parts actually used can likely be replaced by a few hundred lines of code.
LangChain is really close to being removed. We just need to redo the templates here: https://github.com/OpenDevin/OpenDevin/blob/main/agenthub/langchains_agent/utils/prompts.py
Would happily accept a PR there.
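For context, the LangChain usage there is mostly prompt templating, which plain Python can cover. A rough sketch of swapping a LangChain `PromptTemplate` for `str.format` (the template text and variable names below are illustrative, not the actual contents of prompts.py):

```python
# Hypothetical before/after; the real templates live in
# agenthub/langchains_agent/utils/prompts.py and may differ.

# Before (LangChain):
# from langchain.prompts import PromptTemplate
# template = PromptTemplate(
#     input_variables=["task", "history"],
#     template="You are working on: {task}\nRecent events:\n{history}",
# )
# prompt = template.format(task=task, history=history)

# After (plain Python, no extra dependency):
MONOLOGUE_TEMPLATE = (
    "You are working on: {task}\n"
    "Recent events:\n{history}"
)

def build_prompt(task: str, history: str) -> str:
    """Render the prompt with str.format instead of LangChain."""
    return MONOLOGUE_TEMPLATE.format(task=task, history=history)
```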
LlamaIndex is a little harder. I think we need an in-memory Vector database for long-term memory retrieval (though the current agent doesn't really utilize it). Not sure we want to implement embeddings/indexing/searching etc ourselves, but maybe there's something lighter weight than llamaindex. Maybe we can get away with raw chromadb?
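For reference, a minimal sketch of what "raw chromadb" could look like as an in-memory store for long-term memory (the collection name, example documents, and reliance on chromadb's default embedding function are assumptions, not current OpenDevin code):

```python
import chromadb  # assumes chromadb is installed; uses its default embedding function

# Ephemeral, in-memory client -- nothing is persisted to disk.
client = chromadb.Client()
memories = client.create_collection(name="agent_memory")

# Store a few past events/thoughts as documents.
memories.add(
    ids=["0", "1"],
    documents=[
        "User asked to add a /health endpoint to the Flask app.",
        "Tests failed because requirements.txt was missing Flask.",
    ],
)

# Retrieve the most relevant memories for the current step.
results = memories.query(query_texts=["why did the tests fail?"], n_results=1)
print(results["documents"][0])
```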
Removed LangChain from the above file. Made a PR here. Can work on removing LlamaIndex next.
On another note, maybe we change "langchain Agent" to "ReAct Agent" --- pretty sure that's the framework they use.
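For anyone unfamiliar, ReAct just interleaves reasoning and tool use in a loop. A rough sketch of the pattern (the function and parser names here are illustrative, not OpenDevin's API):

```python
def react_loop(llm, tools, task, max_steps=10):
    """Illustrative ReAct loop: the model alternates Thought -> Action -> Observation."""
    history = f"Task: {task}\n"
    for _ in range(max_steps):
        # Ask the model for its next thought and action.
        reply = llm(history + "Thought:")
        thought, action, action_input = parse_react_reply(reply)  # hypothetical parser
        if action == "finish":
            return action_input
        # Execute the chosen tool and feed the result back as an observation.
        observation = tools[action](action_input)
        history += (
            f"Thought: {thought}\n"
            f"Action: {action}({action_input})\n"
            f"Observation: {observation}\n"
        )
    return None
```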
Excellent work @SuibhneOFoighil
I also have https://github.com/OpenDevin/OpenDevin/pull/402, which renames it to MonologueAgent.
@rbren
JFYI: dev_config/python/mypy.ini still mentions the langchains_agent path:
exclude = agenthub/langchains_agent/regression
Thanks! Fix: https://github.com/OpenDevin/OpenDevin/pull/513
To update this thread: LangChain is now removed; LlamaIndex is still a dependency.
R2R is a very streamlined framework focused on just deploying a RAG backend.
In one of my previous projects I spent some time thinking about symbol-based semantic code search over a graph.
If we were to re-implement this inside of R2R, would OpenDevin consider using R2R as its RAG backend?
I think the best options are:
- llama-index (since it's the best-known RAG tool, does RAG very well out of the box, and is full of features)
- raw vector database (which gives us more control)
Eventually we're going to want a really flexible RAG solution that agents can take advantage of without too much fuss. So I'm really leaning towards keeping LlamaIndex.
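For comparison with the chromadb sketch above, a minimal in-memory LlamaIndex setup might look like this (assumes a recent llama-index release with the `llama_index.core` layout and an embedding model available through its defaults; the example documents are made up):

```python
from llama_index.core import Document, VectorStoreIndex

# Build an in-memory vector index over a few "memories".
docs = [
    Document(text="User asked to add a /health endpoint to the Flask app."),
    Document(text="Tests failed because requirements.txt was missing Flask."),
]
index = VectorStoreIndex.from_documents(docs)

# Retrieve the most relevant memory for the current step.
retriever = index.as_retriever(similarity_top_k=1)
nodes = retriever.retrieve("why did the tests fail?")
print(nodes[0].node.get_content())
```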
Going to close this one for now. Worth having a long-term discussion here, maybe in Slack.