StarE
EMNLP 2020: Message Passing for Hyper-Relational Knowledge Graphs
Overview of StarE
StarE encodes a hyper-relational fact by first passing its qualifier pairs through a composition function φ_q; the results are summed and transformed by a weight matrix W_q. The resulting qualifier vector is then merged with the relation vector via the function γ, and composed with the object vector via φ_r. Finally, node Q937 aggregates messages from this and other hyper-relational edges. Please refer to the paper for details.
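A rough, paper-level sketch of this mechanism (the notation is approximate and the exact operators, weighting, and normalization are defined in the paper):

```latex
% h_{q_r}, h_{q_v}: embeddings of a qualifier relation/value pair;
% h_r, h_o: relation and object embeddings; phi_q, phi_r: composition
% functions; gamma: the qualifier/relation merging function.
\mathbf{h}_q = \mathbf{W}_q \sum_{(q_r,\, q_v)} \phi_q(\mathbf{h}_{q_r}, \mathbf{h}_{q_v}),
\qquad
\mathbf{h}_r' = \gamma(\mathbf{h}_r, \mathbf{h}_q),
\qquad
\mathrm{msg} = \phi_r(\mathbf{h}_o, \mathbf{h}_r')
```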
Requirements
- Python>=3.9
- PyTorch 2.1.1
- torch-geometric 2.4.0
- torch-scatter 2.1.2
- tqdm
- wandb
Create a new conda environment and execute setup.sh.
Alternatively:
pip install -r requirements.txt
WD50K Dataset
The dataset can be found in data/clean/wd50k. Its derivatives can be found there as well:
- wd50k_33 - approx. 33% of statements have qualifiers
- wd50k_66 - approx. 66% of statements have qualifiers
- wd50k_100 - 100% of statements have qualifiers
More information is available in the dataset README.
Running Experiments
Available models
Specified as MODEL_NAME in the running script:
- stare_transformer - main model StarE (H) + Transformer (H) [default]
- stare_stats_baseline - baseline model Transformer (H)
- stare_trans_baseline - baseline model Transformer (T)
Datasets
Specified as DATASET in the running script:
- jf17k
- wikipeople
- wd50k [default]
- wd50k_33
- wd50k_66
- wd50k_100
Starting training and evaluation
It is advised to run experiments on a GPU; otherwise training might take a long time.
Use DEVICE cuda to turn on GPU support; the default is cpu.
Don't forget to specify CUDA_VISIBLE_DEVICES before python if you use cuda.
Currently tested on cuda==12.1.
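Before launching a long run with DEVICE cuda, you can quickly verify that your PyTorch build actually sees the GPU (a standalone check, not part of run.py):

```python
# Standalone sanity check: does this PyTorch build see a CUDA device?
import torch

print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
```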
Three parameters control the triple/hyper-relational nature and the max fact length:
- STATEMENT_LEN: -1 for hyper-relational [default], 3 for triples
- MAX_QPAIRS: max fact length (3 + 2*quals), e.g., 15 denotes a fact with 5 qualifiers (3 + 2*5 = 15). 15 is the default for the wd50k datasets and jf17k, set 7 for wikipeople, set 3 for triples (in combination with STATEMENT_LEN 3); see the padding sketch after this list
- SAMPLER_W_QUALIFIERS: True for hyper-relational models [default], False for triple-based models only
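A minimal sketch of the 3 + 2*quals arithmetic behind MAX_QPAIRS. The flattened layout [s, r, o, qual_rel_1, qual_val_1, ...] and the padding index 0 are illustrative assumptions, not taken from the repository's sampler code:

```python
# Illustrative only: flatten a hyper-relational statement and pad it to MAX_QPAIRS.
PAD = 0  # assumed padding index

def pad_statement(triple, qualifiers, max_qpairs=15):
    """triple: (s, r, o) ids; qualifiers: list of (qual_relation, qual_value) ids."""
    flat = list(triple)
    for q_rel, q_val in qualifiers:
        flat.extend([q_rel, q_val])
    if len(flat) > max_qpairs:
        raise ValueError("statement longer than MAX_QPAIRS")
    return flat + [PAD] * (max_qpairs - len(flat))

# A fact with 2 qualifier pairs has length 3 + 2*2 = 7 and is padded to 15.
print(pad_statement((1, 2, 3), [(4, 5), (6, 7)]))
```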
The following scripts will train StarE (H) + Transformer (H) for 400 epochs and evaluate on the test set:
- StarE (H) + Transformer (H)
python run.py DATASET wd50k
- StarE (H) + Transformer (H) with a GPU.
CUDA_VISIBLE_DEVICES=0 python run.py DEVICE cuda DATASET wd50k
- You can switch to a dataset with a higher ratio of qualifiers by changing DATASET to one of the names listed above
python run.py DATASET wd50k_33
- On JF17K
python run.py DATASET jf17k CLEANED_DATASET False
- On WikiPeople
python run.py DATASET wikipeople CLEANED_DATASET False MAX_QPAIRS 7 EPOCHS 500
Triple-based models can be started with this basic set of params:
python run.py DATASET wd50k STATEMENT_LEN 3 MAX_QPAIRS 3 SAMPLER_W_QUALIFIERS False
More hyperparameters are available in the CONFIG dictionary in run.py.
If you want to adjust StarE encoder params, prepend GCN_ to the params in the STAREARGS dict, e.g.,
python run.py DATASET wd50k GCN_GCN_DIM 80 GCN_QUAL_AGGREGATE concat
will construct StarE with a hidden dim of 80 and concat as the gamma function from the paper.
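As an illustration of how such GCN_-prefixed overrides can be routed into the nested dict (a simplified sketch; the real parsing lives in run.py, and the default values below are placeholders):

```python
# Simplified sketch: keys prefixed with GCN_ go into STAREARGS (prefix stripped),
# everything else overwrites a top-level CONFIG entry. Defaults are placeholders.
CONFIG = {
    "DATASET": "wd50k",
    "EPOCHS": 400,
    "STAREARGS": {"GCN_DIM": 200, "QUAL_AGGREGATE": "sum"},
}

def apply_overrides(config, overrides):
    for key, value in overrides.items():
        if key.startswith("GCN_"):
            config["STAREARGS"][key[len("GCN_"):]] = value
        else:
            config[key] = value
    return config

apply_overrides(CONFIG, {"GCN_GCN_DIM": 80, "GCN_QUAL_AGGREGATE": "concat"})
print(CONFIG["STAREARGS"])  # {'GCN_DIM': 80, 'QUAL_AGGREGATE': 'concat'}
```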
Integration with Weights & Biases (WANDB)
It's there out of the box! Create an account on WANDB. Then make sure you install the latest version of the package:
pip install wandb
Locate your API_KEY in the user settings and activate it:
wandb login <api_key>
Then just use the CLI argument WANDB True, and it will:
- Create a wikidata-embeddings project in your active team
- Create a run with a random name and log results there (a rough sketch of the equivalent wandb calls is shown below)
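For reference, WANDB True roughly corresponds to the following standard wandb workflow (a sketch, not the exact code in run.py):

```python
# Sketch of the wandb workflow that WANDB True enables: one run in the
# wikidata-embeddings project, with metrics logged per epoch.
import wandb

run = wandb.init(project="wikidata-embeddings", config={"DATASET": "wd50k"})
for epoch in range(3):
    wandb.log({"epoch": epoch, "train_loss": 0.0})  # placeholder metrics
run.finish()
```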
When using this codebase or dataset, please cite:
@inproceedings{StarE,
title={Message Passing for Hyper-Relational Knowledge Graphs},
author={Galkin, Mikhail and Trivedi, Priyansh and Maheshwari, Gaurav and Usbeck, Ricardo and Lehmann, Jens},
booktitle={EMNLP},
year={2020}
}
For any further questions, please contact: [email protected]