private-gpt PoC: Added initial Knowledge Graph support

Knowledge Graph

This PR introduces knowledge graph capabilities.

What is a knowledge graph?

A knowledge graph is a collection of nodes and edges that represent entities or concepts, and their relationships, such as facts, properties, or categories. It can be used to query or infer factual information about different entities or concepts, based on their node and edge attributes.
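One common way to picture such a graph is as a set of subject-predicate-object triples that can be queried by pattern. The snippet below is only a toy in-memory sketch of that idea; `Triple`, `TinyGraph`, and the sample facts are hypothetical names made up for illustration and are not part of this PR or of the Neo4j integration.

```python
from typing import Iterator, NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


class TinyGraph:
    """Toy in-memory graph (illustration only, not the store used by this PR)."""

    def __init__(self) -> None:
        self._triples = set()  # holds Triple instances

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self._triples.add(Triple(subject, predicate, obj))

    def match(
        self,
        subject: Optional[str] = None,
        predicate: Optional[str] = None,
        obj: Optional[str] = None,
    ) -> Iterator[Triple]:
        """Yield triples matching the pattern; None acts as a wildcard."""
        for triple in sorted(self._triples):
            if subject is not None and triple.subject != subject:
                continue
            if predicate is not None and triple.predicate != predicate:
                continue
            if obj is not None and triple.obj != obj:
                continue
            yield triple


graph = TinyGraph()
graph.add("Neo4j", "is_a", "graph database")
graph.add("Neo4j", "query_language", "Cypher")
graph.add("Turtle", "is_a", "RDF syntax")

# Every entity that has an "is_a" edge, in sorted order.
things = [t.subject for t in graph.match(predicate="is_a")]
```

A real graph store adds indexing, persistence, and a query language on top of this, but the node-edge-node shape of the data is the same.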

Changes Made

Knowledge Graph Support:
- Added support for integrating a knowledge graph into the project. The knowledge graph can be combined with the vector store so that both act as sources of context.
Neo4j Graph Store Provider:
- Integrated a Neo4j Graph Store provider. A graph database like Neo4j is instrumental in managing complex relationships between data entities. By representing data as nodes and relationships, it enables efficient querying and traversal of interconnected data, making it an ideal choice for implementing a knowledge graph. Additionally, it offers powerful querying capabilities such as pattern matching, making it easier to extract insights from interconnected data.
- During development, I encountered issues related to lowercase handling and string formatting; these have been addressed in this PR.
RDF File Support (Turtle Syntax):
- Implemented support for ingesting RDF files in Turtle syntax into the graph. RDF files represent data in a graph-like structure using subject-predicate-object triples. This allows us to incorporate structured data into the knowledge graph, facilitating richer data representation and enabling advanced querying and analysis.
- The main reason for implementing RDF in the project is to allow processing any kind of linked data on the web locally, following the principles of the project.
- To generate a Wikidata RDF file, a sample Jupyter notebook has been provided: here.
Router Retriever Support (Ensemble retrievers):
- Added support for router retrievers, allowing the simultaneous use of multiple sources with a score ranking mechanism. This improves the project's ability to retrieve information from diverse sources and prioritize the most relevant results.
- In this version, the feature is limited to a single source. It would be nice to parametrize this in the configuration or define a better selection strategy :).
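The score ranking idea behind combining several retrievers can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `ScoredNode`, `retrieve_ranked`, and the two stub retrievers are hypothetical, and it assumes the scores returned by different sources are comparable, which may not hold in practice.

```python
from typing import Callable, NamedTuple


class ScoredNode(NamedTuple):
    text: str
    score: float
    source: str


# Hypothetical retriever signature: query in, scored results out.
Retriever = Callable[[str], list[ScoredNode]]


def retrieve_ranked(
    query: str, retrievers: list[Retriever], top_k: int = 3
) -> list[ScoredNode]:
    """Query every retriever, keep the best score per text, rank by score."""
    best: dict[str, ScoredNode] = {}
    for retrieve in retrievers:
        for node in retrieve(query):
            current = best.get(node.text)
            if current is None or node.score > current.score:
                best[node.text] = node
    return sorted(best.values(), key=lambda n: n.score, reverse=True)[:top_k]


def vector_retriever(query: str) -> list[ScoredNode]:
    # Stub standing in for a vector store lookup.
    return [
        ScoredNode("Neo4j stores nodes and relationships", 0.82, "vector"),
        ScoredNode("Turtle is an RDF syntax", 0.40, "vector"),
    ]


def graph_retriever(query: str) -> list[ScoredNode]:
    # Stub standing in for a knowledge graph lookup.
    return [
        ScoredNode("Neo4j stores nodes and relationships", 0.91, "graph"),
        ScoredNode("Cypher supports pattern matching", 0.67, "graph"),
    ]


results = retrieve_ranked(
    "What is Neo4j?", [vector_retriever, graph_retriever], top_k=2
)
```

Deduplicating on the text and keeping the highest score is just one possible selection strategy; a configurable strategy could normalize scores per source or weight sources differently.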

TODO

[ ] Ingest files into the Knowledge Graph using ParallelizedIngestComponent, BatchIngestComponent, PipelineIngestComponent
[ ] Refactor code to support VectorIndex and KnowledgeGraphIndex
[ ] More Graph providers like Nebula.
[ ] Allow specific extensions when a provider is enabled e.g. RDF can be used when any GraphStore provider is enabled.
[ ] Refactor methods to better distinguish between vector and graph components.

How to activate it?

To enable the Neo4j graph store, set the graphstore.database property in the settings.yaml file to neo4j. You will also need to install the graph-stores-neo4j extra.

graphstore:
  database: neo4j

To configure the Neo4j connection, set the neo4j object in settings.yaml.

neo4j:
  url: neo4j://localhost:7687
  username: neo4j
  password: password
  database: neo4j

Run a local Neo4j using Docker

To run Neo4j using Docker, you can use the following command:

docker run \
    --restart always \
    --publish=7474:7474 --publish=7687:7687 \
    --env NEO4J_AUTH=neo4j/password \
    -e NEO4J_apoc_export_file_enabled=true \
    -e NEO4J_apoc_import_file_enabled=true \
    -e NEO4J_apoc_import_file_use__neo4j__config=true \
    -e NEO4JLABS_PLUGINS='["apoc"]' \
    -v $PWD/data:/data -v $PWD/plugins:/plugins \
    neo4j:5.18.0
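The container can take a little while to start accepting connections. As a rough sketch, the following standard-library helper polls the Bolt port published above (7687) until it accepts a TCP connection. The function name `wait_for_port` and its default timeouts are assumptions chosen for illustration; the check only confirms the port is open, not that authentication or the database itself is ready.

```python
import socket
import time


def wait_for_port(
    host: str, port: int, timeout: float = 30.0, interval: float = 1.0
) -> bool:
    """Return True once host:port accepts a TCP connection, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Example usage (assumes the Docker command above is running locally):
#   wait_for_port("localhost", 7687)
```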

Mar 27 '24 20:03 jaluma

Is the Knowledge Graph functionality working? Has anyone tried it?

Jun 05 '24 22:06 spsach

Is the PR still alive? Are you going to make it more generic such that it will be able to support more Graph Databases?

Aug 27 '24 11:08 gkorland