extraction-framework

Sisterproject extractor

Open datalogism opened this issue 3 years ago • 8 comments

Following @jlareck's template (https://github.com/dbpedia/extraction-framework/issues/719), I have finished developing a multilingual sister project extractor. It needs to be configured for each new language.

Summary by CodeRabbit

  • New Features
    • Introduced a new “Sister Projects Links” dataset, published as Linked Data and available alongside existing page link datasets.
    • Automatically extracts sister project links from infoboxes (with language-aware filtering and normalization) and publishes owl:sameAs connections to sister project resources.
    • Expands cross-project linkage coverage for each language-locale DBpedia instance, improving navigability and data integration across related Wikimedia projects.
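To make the summary above concrete, here is a minimal Python sketch of the described behaviour: language-aware filtering of infobox keys, normalization of the target title, and emission of owl:sameAs triples. It is not the code from this pull request. The configuration table `SISTER_PROJECT_KEYS`, the helper names, and the choice of infobox keys are all assumptions made for illustration.

```python
from urllib.parse import quote

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Hypothetical per-language configuration: infobox key -> base URL of the
# sister project. A language that is not listed here yields no triples.
SISTER_PROJECT_KEYS = {
    "fr": {
        "commons": "https://commons.wikimedia.org/wiki/",
        "wiktionary": "https://fr.wiktionary.org/wiki/",
        "wikiquote": "https://fr.wikiquote.org/wiki/",
    },
}


def normalize_title(raw):
    """Collapse whitespace, use underscores, and percent-encode the title."""
    title = "_".join(raw.split())
    return quote(title, safe="_:()")


def extract_sister_links(lang, subject_uri, infobox):
    """Return (subject, predicate, object) triples for configured infobox keys."""
    config = SISTER_PROJECT_KEYS.get(lang, {})
    triples = []
    for key, value in infobox.items():
        base = config.get(key.strip().lower())
        title = normalize_title(value)
        if base is None or not title:
            continue  # key not configured for this language, or empty value
        triples.append((subject_uri, OWL_SAME_AS, base + title))
    return triples
```

For example, calling `extract_sister_links("fr", "http://fr.dbpedia.org/resource/Paris", {"commons": "Paris", "population": "2 millions"})` would return a single triple pointing at the Commons page, because `population` is not a configured key.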

datalogism avatar Nov 23 '22 10:11 datalogism

A question for my reviewers: this extractor creates triples with the owl:sameAs relation. I may have made a mistake in using this relation, because a Wiktionary page is not really equivalent to a Wikipedia article. What do you think? Should I create a new relation in the DBpedia ontology for each sister project?

datalogism avatar Nov 23 '22 11:11 datalogism

owl:sameAs is definitely the wrong predicate, as it describes co-reference — i.e., owl:sameAs says that the subject and object URIs identify the same entity — far beyond the relation you appear to have been trying to describe.
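To illustrate the consequence: because owl:sameAs is symmetric and transitive, a reasoner would merge every URI connected by it into one identity. The small Python sketch below computes those merged groups with a union-find structure. The prefixed names in the example are illustrative placeholders, not output from the extractor.

```python
def same_as_closure(pairs):
    """Group URIs into identity sets implied by owl:sameAs statements."""
    parent = {}

    def find(node):
        # Register unseen nodes as their own root, then walk up to the root.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for left, right in pairs:
        root_left, root_right = find(left), find(right)
        if root_left != root_right:
            parent[root_left] = root_right  # merge the two identity sets

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())
```

Given the pairs `("dbr:Paris", "commons:Paris")` and `("dbr:Paris", "wikt:Paris")`, the function returns one group containing all three names: the city, its Commons page, and its dictionary entry would all be asserted to be the same entity.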

That said, I'm not sure what the relation you were trying to describe actually is. Perhaps you can describe it in English? That will help your readers guide you to an existing predicate that describes that relation, or if necessary, suggest how to handle the lack of such.

TallTed avatar Nov 23 '22 16:11 TallTed

Thank you @TallTed for your feedback. You are totally right: a sameAs relationship is not the best way to describe it. The aim of this extractor is to retrieve the sister project links related to a given Wikipedia article, for example Commons, Wiktionary, Wikiquote, and so on. A good practice could be to follow the example of:

  • http://mappings.dbpedia.org/index.php/OntologyProperty:WikiPageInterLanguageLink
  • http://mappings.dbpedia.org/index.php/OntologyProperty:WikiPageWikiLink

and create a property for each possible sister project. What do you think about this solution?
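A minimal Python sketch of the one-property-per-project idea, to make the proposal easier to discuss. The property names below are hypothetical: they are modelled on the two existing properties linked above and do not exist in the DBpedia ontology as far as this thread shows.

```python
DBO = "http://dbpedia.org/ontology/"

# Hypothetical property names, one per sister project (assumption: none of
# these have been created in the DBpedia ontology yet).
SISTER_PROJECT_PROPERTIES = {
    "commons": DBO + "wikiPageCommonsLink",
    "wiktionary": DBO + "wikiPageWiktionaryLink",
    "wikiquote": DBO + "wikiPageWikiquoteLink",
}


def sister_link_triple(subject_uri, project, target_uri):
    """Serialize one sister project link as an N-Triples line.

    Raises KeyError for a project that has no property configured.
    """
    predicate = SISTER_PROJECT_PROPERTIES[project]
    return f"<{subject_uri}> <{predicate}> <{target_uri}> ."
```

With this shape, the extractor would only need a lookup by project name in place of the single owl:sameAs predicate, and an unknown project would fail loudly instead of producing a misleading triple.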

datalogism avatar Nov 25 '22 07:11 datalogism

Another solution could be to use skos:related. @jlareck @Vehnem @kurzum, what is your opinion on that?

datalogism avatar Dec 09 '22 10:12 datalogism

Kudos, SonarCloud Quality Gate passed!

  • Bugs: 0 (rating A)
  • Vulnerabilities: 0 (rating A)
  • Security Hotspots: 0 (rating A)
  • Code Smells: 6 (rating A)
  • Coverage: no coverage information
  • Duplication: 0.0%

sonarqubecloud[bot] avatar Dec 20 '22 15:12 sonarqubecloud[bot]