gerbil
owl:sameAs via HDT
After talking to Wouter Beek (@wouterbeek) at ISWC (https://rdfhdt.github.io/ISWC2017/), I came to the conclusion that we could use Wouter's owl:sameAs list and query these links faster via HDT instead of using Lucene.
Where is the owl:sameAs list? (Or do I have to create it myself?)
Wouter can probably provide it.
@RicardoUsbeck I have a list of 558,943,116 owl:sameAs triples / explicit identity pairs that were extracted from a LOD Laundromat crawl in 2015. (The number of unique terms involved in at least one explicit identity pair is 179,739,567.) I can send you the list in private ([email protected]). The list is not yet published online because I cannot yet 100% guarantee its correctness. I intend to publish this list in a proper way to the wider community before the end of this year.
Sounds very good! :+1:
We need to make sure that we don't get wrong sameAs connections like the ones we got with the DBpedia <-> data.nytimes links that connected Japan to Armenia :smile: But I assume that all of us would be interested in figuring out how good these links are :wink:
@MichaelRoeder I will have to disappoint you then: the largest cluster of owl:sameAs IRIs has size 177,794. This includes not only Japan and Armenia, but also all other countries in the world, Albert Einstein, and the empty string :-P
That is not really disappointing. I think we simply have to find a way to
1. identify these faulty clusters,
2. figure out which of the links in the cluster are wrong (i.e., which links connect two correct clusters creating one large faulty cluster) and remove these links to fix the clusters.
I know that this is not easy to do in an automatic way. The easiest way is to do step 1 and remove all wrong clusters. However, step 2 sounds interesting from a research point of view :wink:
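To make step 1 a bit more concrete, here is a minimal Python sketch, not anything GERBIL or sameas.cc actually implements. It groups explicit identity pairs into clusters with union-find and then sets aside every cluster above a size threshold. The function names, the threshold `max_size`, and the prefixed example terms are all hypothetical, and using cluster size as the signal for "faulty" is only an assumption; it would not catch small wrong clusters.

```python
from collections import defaultdict


def build_clusters(pairs):
    """Group terms into identity clusters using union-find.

    `pairs` is an iterable of (term_a, term_b) owl:sameAs links.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        root = x
        while parent[root] != root:
            root = parent[root]
        # Path compression: point every node on the path at the root.
        while parent[x] != root:
            parent[x], x = root, parent[x]
        return root

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    clusters = defaultdict(set)
    for term in list(parent):
        clusters[find(term)].add(term)
    return list(clusters.values())


def split_by_size(clusters, max_size):
    """Separate clusters into (kept, suspicious) by a size threshold.

    The threshold is a heuristic assumption, not a correctness proof.
    """
    kept = [c for c in clusters if len(c) <= max_size]
    suspicious = [c for c in clusters if len(c) > max_size]
    return kept, suspicious


# Hypothetical example data; the third link wrongly merges two countries.
example_pairs = [
    ("dbr:Japan", "nyt:Japan"),
    ("dbr:Armenia", "nyt:Armenia"),
    ("nyt:Japan", "nyt:Armenia"),
    ("dbr:Berlin", "wd:Berlin"),
]
example_clusters = build_clusters(example_pairs)
kept, suspicious = split_by_size(example_clusters, max_size=3)
print(len(kept), "clusters kept,", len(suspicious), "set aside as suspicious")
```

Step 2 would then only have to look inside the clusters that were set aside, for example by searching for single links whose removal splits a cluster into two plausible ones.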
@MichaelRoeder The owl:sameAs resources are here: https://sameas.cc/
Let me know in case you encounter issues (the resource is still quite new, so I'm expecting there are some). Also, we hope to update this resource once we have a new LOD Cloud crawl.
@wouterbeek Thanks a lot. Very interesting service :+1:
Hey, I'm starting to clean up old issues and finally wanted to include this :) @wouterbeek However, the sameas.cc site seems to be down. Has it moved, or has the service been closed?
Hi @TortugaAttack, the site is still there: https://www.sameas.cc It is maintained by @raadjoe and myself. Feel free to contact us if there are any issues.
PS: In the meantime we have extended our work on owl:sameAs in MetaLink, published at ESWC 2020. Take a look at https://krr.triply.cc/krr/metalink; it may be useful for Gerbil as well.
Ah perfect, thanks a lot! (I tried it without the www.) I will read through that, thank you!