Merge entities

Open majidsh97 opened this issue 8 months ago • 2 comments

Description

This pull request introduces an optional workflow called merge_entities, which can be run after the extract_graph workflow. It aims to merge duplicate or near-duplicate entities (e.g., car and cars, or PCA and principal component analysis) in the entity and relationship tables.
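As a rough illustration of what such a merge does to the two tables, here is a minimal sketch in plain Python. It is not the code from this PR. The function name `merge_entities`, the column names (`title`, `description`, `source`, `target`, `weight`), and the `alias_to_canonical` mapping are assumptions made for the example; in the proposed workflow the mapping would come from the LLM prompt.

```python
from collections import defaultdict


def merge_entities(entities, relationships, alias_to_canonical):
    """Rewrite both tables so every alias points at its canonical title.

    Hypothetical helper: row layout and names are assumptions, not the PR's API.
    """

    def canon(title):
        # Titles without an entry in the mapping are left unchanged.
        return alias_to_canonical.get(title, title)

    # Collapse entity rows that now share a title, keeping all descriptions.
    merged = {}
    for row in entities:
        title = canon(row["title"])
        slot = merged.setdefault(title, {"title": title, "description": []})
        slot["description"].append(row["description"])

    # Re-point relationship endpoints and sum weights of edges that collide.
    weights = defaultdict(float)
    for rel in relationships:
        source, target = canon(rel["source"]), canon(rel["target"])
        if source != target:  # drop self-loops created by the merge
            weights[(source, target)] += rel["weight"]

    merged_relationships = [
        {"source": s, "target": t, "weight": w} for (s, t), w in weights.items()
    ]
    return list(merged.values()), merged_relationships


# Toy data, invented for illustration only.
entities = [
    {"title": "CAR", "description": "A road vehicle."},
    {"title": "CARS", "description": "Road vehicles."},
    {"title": "ENGINE", "description": "Powers a car."},
]
relationships = [
    {"source": "CAR", "target": "ENGINE", "weight": 1.0},
    {"source": "CARS", "target": "ENGINE", "weight": 2.0},
    {"source": "CAR", "target": "CARS", "weight": 1.0},
]
new_entities, new_relationships = merge_entities(
    entities, relationships, {"CARS": "CAR"}
)
```

Summing the weights of colliding edges and dropping self-loops are design choices made for this sketch; a real implementation might also merge relationship descriptions or keep the edges separate.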

Motivation

Currently, GraphRAG may extract entities that are semantically similar but not identical. These duplicates increase the number of sparse or fragmented nodes in the knowledge graph and may negatively affect community detection and other downstream tasks.

By merging these entities, the graph becomes more semantically compact and meaningful, with improved structure and potentially better community coherence.

I created a graph about the soldering process. Without merging entities, "Increased board complexity" was a separate fragment and no community report was created for it. After merging entities, it is connected to the main node "soldering" and a community is created.

[Three images attached]
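The fragmentation effect can be reproduced on a toy graph by counting connected components before and after merging. The sketch below uses a small union-find; the node names and edges are invented for illustration and are not taken from the soldering graph.

```python
def count_components(nodes, edges):
    """Count connected components of an undirected graph with union-find."""
    parent = {n: n for n in nodes}

    def find(n):
        # Walk to the root, halving the path along the way.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b in edges:
        parent[find(a)] = find(b)
    return len({find(n) for n in nodes})


# Before merging: "CAR" and "CARS" are separate nodes, so the graph is split.
before = count_components(
    ["CAR", "CARS", "ENGINE", "TRAFFIC"],
    [("CAR", "ENGINE"), ("CARS", "TRAFFIC")],
)

# After merging "CARS" into "CAR", the two fragments join through "CAR".
after = count_components(
    ["CAR", "ENGINE", "TRAFFIC"],
    [("CAR", "ENGINE"), ("CAR", "TRAFFIC")],
)
```

Here `before` is 2 and `after` is 1: the fragment that hung off the duplicate node becomes part of the main component, which is the kind of change described above.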

Proposed Changes

  • Add a new optional merge_entities workflow
  • Add config for the merge_entities workflow (e.g. enable: true/false, ....)
  • Add the workflow to the default workflows
  • Add a merge_entities prompt
  • Add a JSON log file of the LLM output to the output folder

Checklist

  • ✅ I have tested these changes locally.
  • ✅ I have reviewed the code changes.
  • ❌ I have updated the documentation (if necessary).
  • ❌ I have added appropriate unit tests (if applicable).

I would really appreciate any feedback. If you think this is a good feature, I will work on the documentation and unit tests.

Here are some examples of merged entities:

SOLDER Merged from: SOLDER, MOLTEN SOLDER, SOLDER JOINTS, SOLDER JOINT, SOLDERED JOINT

CLEANING Merged from: CLEANING, CLEANING PROCESSES, CLEANING PROCESS

WAVE SOLDERING Merged from: WAVE SOLDERING, CS (WAVE SOLDERING) PROCESS

MACHINE SOLDERING Merged from: MACHINE SOLDERING, SOLDERING MACHINE
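For reference, the groups above can be turned into an alias lookup and serialized as JSON. This is only a sketch: the `merge_groups` layout and the helper `build_alias_lookup` are assumptions for the example, and the actual format of the JSON log written by the workflow is not shown in this PR description.

```python
import json

# The merged groups listed above, keyed by the surviving entity name.
merge_groups = {
    "SOLDER": [
        "SOLDER",
        "MOLTEN SOLDER",
        "SOLDER JOINTS",
        "SOLDER JOINT",
        "SOLDERED JOINT",
    ],
    "CLEANING": ["CLEANING", "CLEANING PROCESSES", "CLEANING PROCESS"],
    "WAVE SOLDERING": ["WAVE SOLDERING", "CS (WAVE SOLDERING) PROCESS"],
    "MACHINE SOLDERING": ["MACHINE SOLDERING", "SOLDERING MACHINE"],
}


def build_alias_lookup(groups):
    """Invert {canonical: [members]} into {member: canonical}.

    Hypothetical helper; raises if one name is assigned to two groups.
    """
    lookup = {}
    for canonical, members in groups.items():
        for member in members:
            if member in lookup and lookup[member] != canonical:
                raise ValueError(f"{member!r} assigned to two groups")
            lookup[member] = canonical
    return lookup


alias_to_canonical = build_alias_lookup(merge_groups)

# One possible (assumed) shape for a JSON record of the merge decisions.
log_text = json.dumps(merge_groups, indent=2)
```

Rejecting a name that appears in two groups keeps the lookup unambiguous; whether the workflow handles such conflicts this way is not stated here.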

majidsh97, Apr 10 '25 20:04

@microsoft-github-policy-service agree

majidsh97, Apr 10 '25 21:04