
Integrate Neo4j as a Graph Index, Vector Index, and as tools in the ecosystem

Open quillan86 opened this issue 1 year ago • 4 comments

Feature request

Graph databases need to be integrated into LangChain. NetworkX isn't suitable for large graphs that have to be queried at scale, particularly graphs with tens of thousands or more nodes and edges. This integration is necessary for graph databases to compete with vector databases for information extraction within LangChain.

There is already a Medium article and GitHub repo describing one way to implement this, but it would be ideal to have something like it integrated into LangChain itself. That implementation also offers Neo4j as an option for storing embeddings, which should be implemented as well.

Motivation

The Graph Index Creator and other small graph components within LangChain use NetworkX, which doesn't scale to production knowledge graphs comparable in size to vector databases. I have a specific need to use a graph database in production with LangChain for a work project.

Your contribution

Yes, I am willing to contribute. I haven't contributed to LangChain directly before, but I am familiar with the source code from investigating it. I would love to collaborate on what kind of framework/interface we would need to expand graph indexes to a scope similar to vector database indexes.

quillan86 • May 13 '23

I would also be willing to contribute; I would just need a bit of help figuring out where to put the code. The closest section seems to be vector stores, but Neo4j is not a vector store. Should it be a retriever or a tool, or do we just pretend Neo4j is a vector store?

tomasonjo • May 17 '23

There are already these relevant modules:

  • langchain.graphs (containing a barebones NetworkX graph)
  • langchain.indexes (containing graphs.py for the GraphIndexCreator, much like the VectorStore creator)
  • langchain.chains.graph_qa (containing a chain for graph QA)

So I think it's a matter of restructuring langchain.graphs to have a base.py and related files, similar to langchain.vectorstores. That's why I said interface: it would mean creating a new general object like VectorStore.

We could possibly put the vector embedding portion of Neo4j under langchain.vectorstores, though I'd need to look at the code from the Medium article.

I've already forked the repo and created a branch on my end for this although I haven't pushed changes yet.

quillan86 • May 17 '23
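The base.py idea proposed above, a general graph object analogous to VectorStore, could look roughly like the following. This is a minimal sketch, not LangChain code: `BaseGraph`, its `schema` property, its `query` method, and the toy `InMemoryTripleGraph` backend are all hypothetical names chosen for illustration.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional, Tuple


class BaseGraph(ABC):
    """Hypothetical common interface for langchain.graphs, analogous to VectorStore.

    Backends (NetworkX, Neo4j, ...) would subclass this, so chains such as
    graph QA depend only on the interface, not on a specific database.
    """

    @property
    @abstractmethod
    def schema(self) -> str:
        """Text description of node labels, relationship types, and properties."""

    @abstractmethod
    def query(
        self, query: str, params: Optional[Dict[str, Any]] = None
    ) -> List[Dict[str, Any]]:
        """Run a backend-specific query (e.g. Cypher) and return rows as dicts."""


class InMemoryTripleGraph(BaseGraph):
    """Toy backend that stores (subject, relation, object) triples in a list."""

    def __init__(self) -> None:
        self.triples: List[Tuple[str, str, str]] = []

    def add_triple(self, subject: str, relation: str, obj: str) -> None:
        self.triples.append((subject, relation, obj))

    @property
    def schema(self) -> str:
        # Sorted, de-duplicated relationship types seen so far.
        relations = sorted({rel for _, rel, _ in self.triples})
        return "Relationships: " + ", ".join(relations)

    def query(
        self, query: str, params: Optional[Dict[str, Any]] = None
    ) -> List[Dict[str, Any]]:
        # The toy "query language" is just a relation name to filter on;
        # params is accepted for interface compatibility but unused here.
        return [
            {"subject": s, "object": o}
            for s, rel, o in self.triples
            if rel == query
        ]
```

A Neo4j backend would implement `query` by sending Cypher to the database and build `schema` from its labels and relationship types; that part is omitted because it needs a running database.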

Yeah, I wouldn't add vector search in Neo4j to start with. I would add Cypher search first, something like schema-based Cypher generation that can be used on any graph:

https://medium.com/neo4j/generating-cypher-queries-with-chatgpt-4-on-any-graph-schema-a57d7082a7e7

tomasonjo • May 17 '23
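The general idea of schema-based Cypher generation is to describe the graph's schema in the prompt so the LLM can write Cypher for whatever graph it is given. A minimal sketch of building such a prompt follows, assuming the schema has already been read from the database as a mapping of node label to property names plus a list of relationship patterns. `format_schema`, `build_cypher_prompt`, and the prompt wording are hypothetical, not the linked article's or LangChain's code.

```python
from typing import Dict, List, Tuple


def format_schema(
    node_props: Dict[str, List[str]],
    relationships: List[Tuple[str, str, str]],
) -> str:
    """Render node labels/properties and relationship patterns as prompt text."""
    lines = ["Node properties:"]
    for label, props in sorted(node_props.items()):
        lines.append(f"  {label} {{{', '.join(props)}}}")
    lines.append("Relationships:")
    for start, rel, end in relationships:
        lines.append(f"  (:{start})-[:{rel}]->(:{end})")
    return "\n".join(lines)


def build_cypher_prompt(schema: str, question: str) -> str:
    """Ask the LLM for a Cypher statement restricted to the given schema."""
    return (
        "Generate a Cypher statement to query a graph database.\n"
        "Use only the node labels, relationship types, and properties "
        "in the schema below.\n"
        "Return only the Cypher statement, with no explanation.\n\n"
        f"Schema:\n{schema}\n\n"
        f"Question: {question}"
    )
```

Because the schema is passed in rather than hard-coded, the same prompt template works for any graph.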

Yeah, that wouldn't be a priority at the moment (other than that it was a feature of the agent tools I mentioned earlier); Cypher search would be the priority.

quillan86 • May 17 '23

I've started the PR; you can take a look.

tomasonjo • May 17 '23

This was added, so you could probably close this issue:

https://github.com/hwchase17/langchain/blob/master/docs/modules/chains/examples/graph_cypher_qa.ipynb

tomasonjo • May 25 '23

@tomasonjo Quick question: does GraphCypherQAChain work well for you? If so, with which version?

I tried the example in the docs with the latest version (0.0.197), but it throws an error.

v-almonacid • Jun 12 '23

What's the error you are getting?

tomasonjo • Jun 12 '23

I see the issue now. The LLM doesn't respond with a plain Cypher statement, so Neo4jGraph.query() naturally fails. Maybe that's because I'm using an Azure LLM instance and it behaves differently?

v-almonacid • Jun 12 '23

I don't have access to Azure LLMs, so I can't test it. You can ask the LLM to wrap the statement in triple backticks; the code can then extract the statement.

tomasonjo • Jun 12 '23
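The extraction step described above can be sketched as a regular expression that pulls the first triple-backtick block out of the LLM's response before it reaches Neo4jGraph.query(). This is a minimal illustration, not necessarily identical to the helper LangChain uses. Skipping an optional language tag such as `cypher` after the opening fence is an added assumption about how models often format code.

```python
import re

# First ``` ... ``` block in the text. An optional language tag on its own
# line right after the opening fence (e.g. "cypher") is skipped
# (assumption: models often label code blocks this way).
_FENCE = re.compile(r"```(?:[A-Za-z]+\r?\n)?\s*(.*?)```", re.DOTALL)


def extract_cypher(text: str) -> str:
    """Return the fenced Cypher statement, or the raw text if none is fenced."""
    match = _FENCE.search(text)
    # Fall back to the whole response when the model returned plain Cypher.
    return (match.group(1) if match else text).strip()


response = (
    "Here is the query:\n"
    "```cypher\nMATCH (m:Movie) RETURN m.title\n```\n"
    "Hope this helps."
)
print(extract_cypher(response))
```

Falling back to the raw text keeps the chain working when the model already returns a bare statement.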