Emgraph
A Python library for knowledge graph representation learning (graph embedding).
Emgraph (Embedding graphs) is a Python library for graph representation learning.
It provides a simple API for designing, training, and evaluating graph embedding models. You can build on the base models to develop your own.
Installation
Install the latest version of Emgraph:
$ pip install emgraph
Quick start
Embedding the WordNet11 graph using the TransE model:
from sklearn.metrics import brier_score_loss, log_loss
from scipy.special import expit
from emgraph.datasets import BaseDataset, DatasetType
from emgraph.models import TransE


def train_transe(data):
    model = TransE(batches_count=64, seed=0, epochs=20, k=100, eta=20,
                   optimizer='adam', optimizer_params={'lr': 0.0001},
                   loss='pairwise', verbose=True, large_graphs=False)
    # Train on the training triples, then score the test triples.
    model.fit(data['train'])
    scores = model.predict(data['test'])
    return scores


if __name__ == '__main__':
    wn11_dataset = BaseDataset.load_dataset(DatasetType.WN11)
    scores = train_transe(data=wn11_dataset)
    print("Scores: ", scores)
    # expit maps the raw scores into (0, 1) before comparing them with the labels.
    print("Brier score loss:", brier_score_loss(wn11_dataset['test_labels'], expit(scores)))
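The last line of the example squashes the raw model scores into the (0, 1) range with the logistic sigmoid (`expit`) and compares them with the binary test labels using the Brier score, which is the mean squared difference between predicted probability and label. The following is a dependency-free sketch of that calculation. The function names `sigmoid` and `brier_score` and the sample numbers are illustrative assumptions, not part of Emgraph.

```python
import math


def sigmoid(x):
    """Logistic sigmoid, the same function scipy.special.expit computes."""
    # Branch on the sign so math.exp never receives a large positive argument.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def brier_score(labels, probabilities):
    """Mean squared difference between predicted probabilities and 0/1 labels."""
    if len(labels) != len(probabilities):
        raise ValueError("labels and probabilities must have the same length")
    return sum((p - y) ** 2 for y, p in zip(labels, probabilities)) / len(labels)


# Illustrative values only: raw triple scores and their true/false labels.
raw_scores = [2.0, -1.5, 0.0, -3.0]
labels = [1, 0, 1, 0]

probabilities = [sigmoid(s) for s in raw_scores]
print("Brier score loss:", brier_score(labels, probabilities))
```

A lower Brier score is better: 0 means every probability matched its label exactly, while a model that always predicts 0.5 scores 0.25.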
Evaluating the ComplEx model after training:
import numpy as np
from emgraph.datasets import BaseDataset, DatasetType
from emgraph.models import ComplEx
from emgraph.evaluation import evaluate_performance


def complex_performance(data):
    model = ComplEx(batches_count=10, seed=0, epochs=20, k=150, eta=1,
                    loss='nll', optimizer='adam')
    # Train on the training and validation triples together.
    model.fit(np.concatenate((data['train'], data['valid'])))
    # Known triples from every split are passed as the filter set.
    filter_triples = np.concatenate((data['train'], data['valid'], data['test']))
    # Evaluate only the first five test triples to keep the example quick.
    ranks = evaluate_performance(data['test'][:5], model=model,
                                 filter_triples=filter_triples,
                                 corrupt_side='s+o',
                                 use_default_protocol=False)
    return ranks


if __name__ == '__main__':
    wn18_dataset = BaseDataset.load_dataset(DatasetType.WN18)
    ranks = complex_performance(data=wn18_dataset)
    print("ranks {}".format(ranks))
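Ranks are usually summarized with aggregate metrics such as mean reciprocal rank (MRR) and Hits@N. The sketch below computes them in plain Python. It assumes the ranks form a flat sequence of positive integers, one per evaluated triple, with 1 as the best rank; if the library returns an array of another shape, flatten it first. The helper names are hypothetical and are not claimed to be part of the Emgraph API.

```python
def mean_reciprocal_rank(ranks):
    """Average of 1/rank; 1.0 means every true triple was ranked first."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def hits_at_n(ranks, n):
    """Fraction of true triples ranked within the top n."""
    return sum(1 for r in ranks if r <= n) / len(ranks)


def mean_rank(ranks):
    """Arithmetic mean of the ranks; lower is better."""
    return sum(ranks) / len(ranks)


# Illustrative ranks only, not output from the model above.
example_ranks = [1, 3, 2, 10, 50]
print("MRR:", mean_reciprocal_rank(example_ranks))
print("Hits@3:", hits_at_n(example_ranks, 3))
print("Mean rank:", mean_rank(example_ranks))
```

MRR is less sensitive to a few very poor ranks than the mean rank, which is why the two are often reported together.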
More examples
Embedding the WordNet11 graph using the DistMult model:
from sklearn.metrics import brier_score_loss, log_loss
from scipy.special import expit
from emgraph.datasets import BaseDataset, DatasetType
from emgraph.models import DistMult


def train_dist_mult(data):
    model = DistMult(batches_count=1, seed=555, epochs=20, k=10, loss='pairwise',
                     loss_params={'margin': 5})
    # Train on the training triples, then score the test triples.
    model.fit(data['train'])
    scores = model.predict(data['test'])
    return scores


if __name__ == '__main__':
    wn11_dataset = BaseDataset.load_dataset(DatasetType.WN11)
    scores = train_dist_mult(data=wn11_dataset)
    print("Scores: ", scores)
    # expit maps the raw scores into (0, 1) before comparing them with the labels.
    print("Brier score loss:", brier_score_loss(wn11_dataset['test_labels'], expit(scores)))
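The example above selects a pairwise loss with a margin of 5. In its common formulation, a pairwise margin loss penalizes a positive triple whenever its score does not exceed the score of a paired negative (corrupted) triple by at least the margin. The sketch below shows that general technique in plain Python; it is an assumption about the formulation, not Emgraph's actual implementation, and the function name is hypothetical.

```python
def pairwise_margin_loss(positive_scores, negative_scores, margin=5.0):
    """Sum of max(0, margin + negative - positive) over paired scores.

    Higher scores are assumed to mean "more plausible triple".
    """
    if len(positive_scores) != len(negative_scores):
        raise ValueError("each positive score needs exactly one negative score")
    return sum(
        max(0.0, margin + neg - pos)
        for pos, neg in zip(positive_scores, negative_scores)
    )


# Illustrative scores only.
positives = [4.0, 1.0]
negatives = [-2.0, 0.5]
print("Pairwise loss:", pairwise_margin_loss(positives, negatives))
```

The first pair contributes nothing because the positive already beats the negative by more than the margin; only the second pair is penalized.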
| # | Model | Reference |
|---|-------|-----------|
| 1 | TransE | Translating Embeddings for Modeling Multi-relational Data |
| 2 | ComplEx | Complex Embeddings for Simple Link Prediction |
| 3 | HolE | Holographic Embeddings of Knowledge Graphs |
| 4 | DistMult | Embedding Entities and Relations for Learning and Inference in Knowledge Bases |
| 5 | ConvE | Convolutional 2D Knowledge Graph Embeddings |
| 6 | ConvKB | A Novel Embedding Model for Knowledge Base Completion Based on Convolutional Neural Network |
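The models listed above differ mainly in how they score a (subject, predicate, object) triple from its embeddings. As a rough illustration, the scoring functions of three of them can be written in a few lines of plain Python. This is a sketch of the formulas from the referenced papers (using the L1 norm for TransE, one of the two norms commonly used), not Emgraph's code, and the function names are hypothetical.

```python
def transe_score(s, p, o):
    """TransE: negative L1 distance between (subject + predicate) and object."""
    return -sum(abs(si + pi - oi) for si, pi, oi in zip(s, p, o))


def distmult_score(s, p, o):
    """DistMult: trilinear dot product of three real-valued vectors."""
    return sum(si * pi * oi for si, pi, oi in zip(s, p, o))


def complex_score(s, p, o):
    """ComplEx: real part of the trilinear product with the object conjugated."""
    return sum(si * pi * oi.conjugate() for si, pi, oi in zip(s, p, o)).real


# Illustrative 2-dimensional real embeddings.
s, p, o = [1.0, 2.0], [0.5, -1.0], [1.5, 1.0]
print("TransE:", transe_score(s, p, o))
print("DistMult:", distmult_score(s, p, o))

# Illustrative 1-dimensional complex embeddings.
print("ComplEx:", complex_score([1 + 1j], [1j], [2 - 1j]))
```

DistMult gives the same score when subject and object are swapped, so it cannot distinguish the direction of a relation; conjugating the object embedding is what lets ComplEx model asymmetric relations.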
Call for Contributions
The Emgraph project welcomes your expertise and enthusiasm!
Ways to contribute to Emgraph:
- Write code
- Review pull requests
- Develop tutorials, presentations, and other educational materials
- Translate documentation and readme contents
Issues
If you encounter an issue in the code, please report it here. Better still, fork the repository on GitHub and open a pull request with a fix.
Features
- [x] Support CPU/GPU
- [x] Vectorized operations
- [x] Preprocessors
- [x] Dataset loader
- [x] Standard API
- [x] Documentation
- [x] Test driven development
If you find this project helpful, please consider giving it a :star:.
License
Released under the BSD license.
Credit
This repository is a transformation of the AmpliGraph library for TensorFlow 2, with a modular architecture implementation. It also draws inspiration from PyKEEN and Spectral. Credit is extended to these exceptional projects.