Multi-category indexing and graph generation for diverse data types

Open · aimanyounises1 opened this issue 1 year ago · 1 comment

Do you need to file an issue?

  • [X] I have searched the existing issues and this feature is not already filed.
  • [ ] My model is hosted on OpenAI or Azure. If not, please look at the "model providers" issue and don't file a new one here.
  • [X] I believe this is a legitimate feature request, not just a question. If this is a question, please use the Discussions area.

Is your feature request related to a problem? Please describe.

We are trying to use GraphRAG to index and generate knowledge graphs for a diverse set of data types. Specifically, we need to handle business explanations, UI code, and backend code as separate but interconnected categories. Currently, it is not clear how to structure and index these distinct data types efficiently while preserving the relationships between them.

For example, the query itself could determine which graph is relevant for retrieving context. Routing each query to the output directory of the matching category would improve accuracy. The categories would be declared in a YAML file like the one under "Additional context".
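
To make the routing idea concrete, here is a minimal sketch in Python. It is not part of GraphRAG: the `output/...` directories, the keyword sets, and the `route_query` function are all hypothetical, and a real router would more likely use an LLM or embedding similarity than keyword overlap.

```python
import re
from pathlib import Path

# Hypothetical mapping from category name to the output directory of its index.
CATEGORY_OUTPUTS = {
    "business_explanations": Path("output/business_explanations"),
    "ui_code": Path("output/ui_code"),
    "be_code": Path("output/be_code"),
}

# Hypothetical keyword hints per category (illustration only).
CATEGORY_KEYWORDS = {
    "business_explanations": {"business", "policy", "requirement", "workflow"},
    "ui_code": {"ui", "component", "button", "page", "frontend"},
    "be_code": {"backend", "service", "endpoint", "database", "api"},
}


def route_query(query: str, default: str = "business_explanations") -> Path:
    """Return the output directory whose category best matches the query."""
    words = set(re.findall(r"[a-z0-9_]+", query.lower()))
    # Score each category by how many of its keywords appear in the query.
    scores = {name: len(words & kws) for name, kws in CATEGORY_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    # Fall back to the default category when no keyword matches at all.
    if scores[best] == 0:
        return CATEGORY_OUTPUTS[default]
    return CATEGORY_OUTPUTS[best]
```

The returned path would then be handed to whatever query step reads the index for that category.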

Describe the solution you'd like

1.	Accept multiple data source directories, each corresponding to a different category (e.g., business_explanations, ui_code, be_code).
2.	Allow custom configuration for each category, specifying different parsing and indexing strategies based on the data type.
3.	Generate separate but interconnected knowledge graphs for each category.
4.	Provide a unified querying mechanism that can search across all generated graphs while maintaining context awareness of the different categories.
5.	Enable the definition of relationships between entities across different categories (e.g., linking a UI component to a business concept it implements).
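
As a rough illustration of points 3 to 5, the sketch below keeps one small graph per category, stores cross-category relationships separately, and searches all graphs at once. Every name here (`CategoryGraph`, `CrossLink`, `unified_search`) is made up for this example, and the substring match stands in for whatever retrieval GraphRAG would actually perform.

```python
from dataclasses import dataclass, field


@dataclass
class CategoryGraph:
    """One graph per category; maps entity name to a short description."""
    name: str
    entities: dict = field(default_factory=dict)


@dataclass(frozen=True)
class CrossLink:
    """A relationship between entities that live in different graphs."""
    relation: str
    source: tuple  # (graph name, entity name)
    target: tuple  # (graph name, entity name)


def unified_search(term, graphs, links):
    """Search every graph for the term and collect cross-links touching the hits."""
    term = term.lower()
    hits = []
    for graph in graphs:
        for entity, description in graph.entities.items():
            if term in entity.lower() or term in description.lower():
                # Keep the graph name so the caller knows the hit's category.
                hits.append((graph.name, entity))
    hit_set = set(hits)
    related = [l for l in links if l.source in hit_set or l.target in hit_set]
    return hits, related


# Example data (hypothetical).
business = CategoryGraph("business_concepts", {"Checkout": "Customer completes a purchase"})
ui = CategoryGraph("ui_components", {"CheckoutForm": "Form for payment details"})
links = [CrossLink("implements", ("ui_components", "CheckoutForm"),
                   ("business_concepts", "Checkout"))]
hits, related = unified_search("checkout", [business, ui], links)
```

Each hit carries its graph name, so the caller stays aware of the category while still seeing the `implements` link that connects the UI component to the business concept.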

Additional context

data_sources:
  - name: business_explanations
    type: text
    path: input/business_explanations
    file_types: [.txt, .md]
  - name: ui_code
    type: code
    path: input/ui_code
    file_types: [.js, .jsx, .ts, .tsx]
  - name: be_code
    type: code
    path: input/be_code
    file_types: [.java, .py, .cs]

graph_structure:
  - name: business_concepts
    source: business_explanations
    node_types:
      - name: Concept
        properties: [name, description]
  - name: ui_components
    source: ui_code
    node_types:
      - name: Component
        properties: [name, file_path]
  - name: backend_services
    source: be_code
    node_types:
      - name: Service
        properties: [name, file_path]

relationships:
  - name: implements
    source: ui_components
    target: business_concepts
  - name: serves
    source: backend_services
    target: business_concepts
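
A config like this would need its cross-references checked before indexing. The following is only a sketch under assumptions: `validate_config` is a hypothetical helper, not a GraphRAG function, and the `config` dict is an abbreviated hand-written mirror of the YAML above (parsing the file with PyYAML's `yaml.safe_load` should yield the same shape).

```python
def validate_config(config: dict) -> list:
    """Return a list of error messages for dangling references in the config."""
    errors = []
    source_names = {s["name"] for s in config.get("data_sources", [])}
    graph_names = {g["name"] for g in config.get("graph_structure", [])}

    # Every graph must be built from a declared data source.
    for g in config.get("graph_structure", []):
        if g["source"] not in source_names:
            errors.append(f"graph '{g['name']}' uses unknown data source '{g['source']}'")

    # Both ends of every relationship must name a declared graph.
    for r in config.get("relationships", []):
        for end in ("source", "target"):
            if r[end] not in graph_names:
                errors.append(f"relationship '{r['name']}' has unknown {end} graph '{r[end]}'")
    return errors


# Abbreviated mirror of the proposed YAML (illustration only).
config = {
    "data_sources": [
        {"name": "business_explanations", "type": "text", "path": "input/business_explanations"},
        {"name": "ui_code", "type": "code", "path": "input/ui_code"},
        {"name": "be_code", "type": "code", "path": "input/be_code"},
    ],
    "graph_structure": [
        {"name": "business_concepts", "source": "business_explanations"},
        {"name": "ui_components", "source": "ui_code"},
        {"name": "backend_services", "source": "be_code"},
    ],
    "relationships": [
        {"name": "implements", "source": "ui_components", "target": "business_concepts"},
        {"name": "serves", "source": "backend_services", "target": "business_concepts"},
    ],
}
errors = validate_config(config)
```

An empty list means every graph points at a declared data source and every relationship connects two declared graphs.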

aimanyounises1 · Aug 09 '24 12:08

Hi, did you find a solution?

lawyinking · Oct 24 '24 16:10