ensmallen
Graphs to be added as automatic retrieval
We would like to add some more graphs to the automatic retrieval mechanism.
Currently, we support only StringPPI (human version), CompleteStringPPI (cross-species) and KG-COVID-19.
Which graphs should we add to the list? The requirements for the graph are:
- Must be publicly available at a URL that can be fetched with `wget`.
- Must be a TSV/CSV/text file with separators.
- The server where it is hosted must be reasonably fast.
- Can be a zip, gzip, tar.gz or plain file.
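As a rough illustration of the requirements above, a candidate file could be pre-checked along these lines. This is only a sketch using the Python standard library: the helper names `detect_archive_kind` and `detect_separator` and the list of candidate separators are assumptions made for this example, not part of the retrieval mechanism itself.

```python
from urllib.parse import urlparse


def detect_archive_kind(url: str) -> str:
    """Classify a URL as 'tar.gz', 'zip', 'gzip' or 'plain' from its path suffix."""
    path = urlparse(url).path.lower()
    # Check the compound suffix first so 'x.tar.gz' is not reported as 'gzip'.
    if path.endswith(".tar.gz"):
        return "tar.gz"
    if path.endswith(".zip"):
        return "zip"
    if path.endswith(".gz"):
        return "gzip"
    return "plain"


def detect_separator(sample: str, candidates=("\t", ",", ";", " ")) -> str:
    """Return the first candidate that occurs equally often on every non-empty line."""
    lines = [line for line in sample.splitlines() if line]
    for separator in candidates:
        counts = {line.count(separator) for line in lines}
        # A usable separator appears at least once, the same number of times per line.
        if len(counts) == 1 and counts.pop() > 0:
            return separator
    raise ValueError("no consistent separator found in the sample")
```

The separator check is deliberately naive (it ignores quoting), so it is best treated as a quick sanity test on the first few lines of a file rather than as a parser.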
Karate
Finish adding graphs from Network Repository.
Most of these are now available; we still need to add support for timestamp graphs and graphs with multi-labelled nodes.
Added graphs from kghub with this pull request.
@LucaCappelletti94 - what about including any of the resources in this table (also copied below)? If you want, we could select a few to start with. I'd be happy to write some simple code that converts them from PyKEEN format into the spec you provide above. Let me know what you think!
I put a ➡️ next to the ones I think are worth starting with and a ⭐ next to others worth considering for future incorporation.
Name | Reference | Description |
---|---|---|
⭐ ckg | pykeen.datasets.CKG | The Clinical Knowledge Graph (CKG) dataset from [santos2020]_. |
➡️ codexlarge | pykeen.datasets.CoDExLarge | The CoDEx large dataset. |
codexmedium | pykeen.datasets.CoDExMedium | The CoDEx medium dataset. |
codexsmall | pykeen.datasets.CoDExSmall | The CoDEx small dataset. |
➡️ conceptnet | pykeen.datasets.ConceptNet | The ConceptNet dataset from [speer2017]_. |
⭐ cskg | pykeen.datasets.CSKG | The CSKG dataset. |
⭐ drkg | pykeen.datasets.DRKG | The DRKG dataset. |
fb15k | pykeen.datasets.FB15k | The FB15k dataset. |
fb15k237 | pykeen.datasets.FB15k237 | The FB15k-237 dataset. |
⭐ hetionet | pykeen.datasets.Hetionet | The Hetionet dataset is a large biological network. |
➡️ kinships | pykeen.datasets.Kinships | The Kinships dataset. |
nations | pykeen.datasets.Nations | The Nations dataset. |
ogbbiokg | pykeen.datasets.OGBBioKG | The OGB BioKG dataset. |
ogbwikikg | pykeen.datasets.OGBWikiKG | The OGB WikiKG dataset. |
➡️ openbiolink | pykeen.datasets.OpenBioLink | The OpenBioLink dataset. |
openbiolinkf1 | pykeen.datasets.OpenBioLinkF1 | The PyKEEN First Filtered OpenBioLink 2020 Dataset. |
openbiolinkf2 | pykeen.datasets.OpenBioLinkF2 | The PyKEEN Second Filtered OpenBioLink 2020 Dataset. |
openbiolinklq | pykeen.datasets.OpenBioLinkLQ | The low-quality variant of the OpenBioLink dataset. |
umls | pykeen.datasets.UMLS | The UMLS dataset. |
wn18 | pykeen.datasets.WN18 | The WN18 dataset. |
wn18rr | pykeen.datasets.WN18RR | The WN18-RR dataset. |
yago310 | pykeen.datasets.YAGO310 | The YAGO3-10 dataset is a subset of YAGO3 that only contains entities with at least 10 relations. |
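For the conversion offered above, a minimal sketch might look like the following. It assumes the triples have already been extracted from a dataset as plain `(head, relation, tail)` string tuples, so it does not call any PyKEEN API; the function name `write_triples_as_tsv` and the column names `source`, `edge_type` and `destination` are hypothetical and would need to match whatever the retrieval spec expects.

```python
import csv
import io
from typing import Iterable, TextIO, Tuple

# Assumed input shape: one (head, relation, tail) tuple of strings per edge.
Triple = Tuple[str, str, str]


def write_triples_as_tsv(triples: Iterable[Triple], handle: TextIO) -> int:
    """Write triples as a tab-separated edge list with a header; return the edge count."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    # Hypothetical column names, chosen only for this illustration.
    writer.writerow(["source", "edge_type", "destination"])
    count = 0
    for head, relation, tail in triples:
        writer.writerow([head, relation, tail])
        count += 1
    return count


# Usage with an in-memory buffer; a real file opened with newline="" works the same way.
buffer = io.StringIO()
written = write_triples_as_tsv([("cat", "is_a", "mammal")], buffer)
```

Writing through `csv.writer` rather than joining strings by hand means that labels containing tabs or quotes are escaped instead of silently corrupting the file.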
Also, see the datasets listed in this KG Embedding Review at the bottom of page 22. These are the datasets most frequently used by people developing new KG embedding methods.
Thank you, @callahantiff! We still need to add support for multi-class nodes (that is, nodes with multiple classes, such as a node that is both of class mammal and of class cat). Although we plan to add support for these and other node and edge features, we will work on them after finishing Grape. Do you know whether these graphs have multiple classes per node, or just nodes of different classes, with each node belonging to a single class?
[UPDATE 2021/04/19] We have support for multi-labeled nodes in graphs now!
If it's the second option, then we can surely support all of the considered graphs now. How hard is it to convert them into a CSV-like format? And more importantly, where could we host these? Maybe on kg-hub? Would that be an option, @justaddcoffee?
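One way to answer the multi-class question for a given graph is to scan its node list and collect the nodes that appear with more than one class. The sketch below assumes the node list is available as `(node, class)` pairs; the function name `nodes_with_multiple_classes` and the sample node names are made up for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def nodes_with_multiple_classes(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each node that has more than one distinct class to its sorted class list."""
    classes_by_node = defaultdict(set)
    for node, node_class in pairs:
        classes_by_node[node].add(node_class)
    return {
        node: sorted(classes)
        for node, classes in classes_by_node.items()
        if len(classes) > 1
    }


# Hypothetical node list: 'felix' is both a mammal and a cat, 'rex' has a single class.
sample_pairs = [("felix", "mammal"), ("felix", "cat"), ("rex", "mammal")]
multi_class_nodes = nodes_with_multiple_classes(sample_pairs)
```

An empty result means every node has a single class, which is the second case described above.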
My general feeling is that we can and should allow easy ingest of remote graphs as we are discussing here.
But, I think we should avoid hosting other people's graphs on KG-hub unless they are transformed versions that we are incorporating into our own knowledge graphs (like our ChEMBL transform that we include in KG-COVID-19).
Glad to discuss, though.
Import graphs, after adding support for time intervals, from http://www.sociopatterns.org/
Hey @LucaCappelletti94, as discussed earlier, here's the link for the kg-microbe graphs: https://kg-hub.berkeleybop.io/kg-microbe/20210422/kg-microbe.tar.gz
Thank you @hrshdhgd!
Hi @hrshdhgd, sorry for the long wait. All versions of KG-Microbe and KG-COVID are now integrated into the automatic retrieval.
No problem @LucaCappelletti94, thank you very much!
I am now iterating once more on the graphs from the automatic graph retrieval (we are now at over 80K downloadable graphs). Do you have more suggestions?