LargeEA
LargeEA copied to clipboard
Source code for LargeEA: Aligning Entities for Large-scale Knowledge Graphs, VLDB 2022
LargeEA
Source code for LargeEA: Aligning Entities for Large-scale Knowledge Graphs
If there is any problem with reproduction, please create an issue in The GitHub repo page
Requirements
pytorch>=1.7.0
tensorflow>=2.4.1 (required for RREA)
faiss
transformers
datasketch[redis]
...
A full list of required packages is located in src/requirements.txt
Datasets
The IDS benchmark is provided by OpenEA
Our newly proposed benchmark DBP1M is available at Google Drive
First download and unzip dataset files, place them to the project root folder:
unzip OpenEA_dataset_v1.1.zip
unzip mkdata.zip
The dataset (small for IDS15K, medium for IDS100K, large for DBP1M) and lang (fr or de) parameter controls which benchmark to use.
For example, in the src
folder, setting dataset to small and lang to fr will run on OpenEA EN_FR_15K_V1 dataset.
Run
Take DBP1M(EN-FR) as an example:
Make sure the folder for results is created:
cd src/
mkdir tmp4
Name Channel
First get the BERT embeddings of all entities
python main.py --phase 1 --dataset large --lang fr
Then calculate TopK sims based on BERT:
python main.py --phase 2 --dataset large --lang fr
Finally the string-based similarity(this requires a redis server listening localhost:6379):
python main.py --phase 3 --dataset large --lang fr
Structure Channel
The structure channel uses result of name channel to get name-based seeds. Make sure run name channel first.
To run RREA model:
python main.py --phase 0 --dataset large --lang fr --model rrea --epoch 100
Channel Fusion and Eval
python main.py --phase 4 --dataset large --lang fr
Citation
Reference to cite when you use LargeEA in a research paper:
@article{largeEA,
author = {Congcong Ge and
Xiaoze Liu and
Lu Chen and
Baihua Zheng and
Yunjun Gao},
title = {LargeEA: Aligning Entities for Large-scale Knowledge Graphs},
journal = {{PVLDB}},
volume = {15},
number = {2},
pages = {237--245},
year = {2022}
}
Acknowledgements
We use the codes of MRAEA, RREA, GCN-Align, DGMC, AttrGNN, OpenEA, EAKit, SimAlign.
We also provide the modified version of OpenEA in order to run experiments on RTX3090 GPU:OpenEA-TF2