Marco Fossati
> It expects a claim GUID, though. We can only retrieve it from our bot's debug logs when a new claim is created (this doesn't apply to references). We...
Waiting for https://phabricator.wikimedia.org/T289710
I introduced a shell script in 23afc3ef5399eae5a4c5f25081424731d04e931c that also runs `mypy`. We can tell Travis to execute it and fail the build as appropriate. Note that `mypy` only checks functions...
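Wiring the script into the build could look like the following `.travis.yml` fragment. This is a sketch only: the script path and Python version are hypothetical, not taken from the repository.

```yaml
# Hypothetical .travis.yml fragment: run the type-check script as part of
# the build; a non-zero exit status fails the job.
language: python
python:
  - "3.7"          # assumed version, adjust to the project's
install:
  - pip install mypy
script:
  - ./scripts/run_mypy.sh   # hypothetical path to the shell script
```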
Workflow example: MusicBrainz musicians.

INPUT: NT (triples) dump, i.e., `wikidatawiki/entities/latest-truthy.nt.bz2`

- [ ] get all sub-classes of _musician_ `Q639669`;
- [ ] for each `sub-class` + _musician_, filter all subjects...
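The first step above could be sketched as a streaming pass over the truthy NT dump. This is a minimal illustration, not the bot's actual code; it assumes one triple per line and looks for direct `P279` (sub-class of) links to `Q639669`:

```python
import bz2

# Wikidata RDF URIs as they appear in the truthy NT dump
SUBCLASS_OF = "<http://www.wikidata.org/prop/direct/P279>"
MUSICIAN = "<http://www.wikidata.org/entity/Q639669>"


def subclasses_of_musician(nt_lines):
    """Collect subjects that are direct sub-classes of musician (Q639669)."""
    subclasses = set()
    for line in nt_lines:
        parts = line.split(" ", 2)
        if len(parts) < 3:
            continue
        subject, predicate, obj = parts[0], parts[1], parts[2].rstrip(" .\n")
        if predicate == SUBCLASS_OF and obj == MUSICIAN:
            subclasses.add(subject)
    return subclasses


# Stream the bz2-compressed dump without decompressing it to disk:
# with bz2.open("latest-truthy.nt.bz2", "rt") as dump:
#     print(subclasses_of_musician(dump))
```

A single pass like this only finds *direct* sub-classes; the full hierarchy needs recursion or the SPARQL property path discussed below.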
> We are waiting for direct access to the Wikidata dumps in the VPS machine:
> https://phabricator.wikimedia.org/T209818

Task resolved: `ls /public/dumps/public/wikidatawiki/entities`
Alternative SPARQL method discussed during WikiCite 2018: **unwind `subclass of` recursion**. See https://etherpad.wikimedia.org/p/WikiCite18Day3sparql
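In SPARQL, the `subclass of` recursion unwinds into a property path (`wdt:P279*`), so the Wikidata Query Service walks the whole tree server-side. A minimal sketch (the query builder is illustrative, not the bot's API):

```python
def build_subclass_query(root_qid):
    """SPARQL property path: wdt:P279* walks the entire sub-class tree."""
    return (
        "SELECT DISTINCT ?class WHERE { "
        f"?class wdt:P279* wd:{root_qid} . "
        "}"
    )


# To run it against the live Wikidata Query Service (needs `requests`):
# import requests
# response = requests.get(
#     "https://query.wikidata.org/sparql",
#     params={"query": build_subclass_query("Q639669"), "format": "json"},
#     headers={"User-Agent": "soweego-sketch/0.1"},  # hypothetical UA
# )
# bindings = response.json()["results"]["bindings"]
```

Note that unbounded `P279*` paths on large hierarchies can hit the endpoint's 60-second timeout, which is what motivates paging.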
One-shot BASH version: done.
We finally opted for paged SPARQL, leaving this open as an extra feature.
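Paged SPARQL boils down to issuing the same query with a sliding `LIMIT`/`OFFSET` window. A sketch of the idea (names are illustrative; `OFFSET` paging needs a stable `ORDER BY` to be deterministic):

```python
def paged_queries(base_query, page_size=1000):
    """Yield successive pages of a SPARQL query via LIMIT/OFFSET.

    The caller stops iterating once a page returns no results.
    OFFSET paging is only deterministic with a stable ORDER BY clause.
    """
    offset = 0
    while True:
        yield f"{base_query} LIMIT {page_size} OFFSET {offset}"
        offset += page_size
```

Each yielded string would be sent to the endpoint in turn, accumulating results until an empty page comes back.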
On the target catalog side, we could leverage simple relations between entities to build comparable graph embeddings, e.g., the `knownForTitles` and `primaryProfession` in IMDb.
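One way to feed those IMDb relations into an embedding pipeline is to flatten them into an edge list first. A minimal sketch with mock records (field names follow IMDb's `name.basics` dataset; the sample IDs are made up):

```python
def imdb_edges(people):
    """Build (person, neighbor) edges from IMDb-style records.

    `people` maps a name ID to its `knownForTitles` and `primaryProfession`
    values; the resulting edge list could feed a graph-embedding library
    such as node2vec.
    """
    edges = []
    for nconst, record in people.items():
        for title in record.get("knownForTitles", []):
            edges.append((nconst, title))
        for profession in record.get("primaryProfession", []):
            edges.append((nconst, profession))
    return edges
```

Wikidata-side edges (e.g., `P279`, occupations) could be built the same way, making the two graphs comparable.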
The validator could check inconsistencies like `QID 1` pointing to `MusicBrainz artist X` while `MusicBrainz artist X` points back to `QID 2`.
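The reciprocity check above reduces to comparing the two link mappings. A minimal sketch (function and argument names are illustrative, not the validator's actual API):

```python
def find_dangling_links(wikidata_to_mb, mb_to_wikidata):
    """Report QIDs whose MusicBrainz artist points back to a different QID.

    Returns (qid, musicbrainz_id, conflicting_qid) tuples for each mismatch.
    """
    inconsistent = []
    for qid, artist in wikidata_to_mb.items():
        back = mb_to_wikidata.get(artist)
        if back is not None and back != qid:
            inconsistent.append((qid, artist, back))
    return inconsistent
```

Missing back-links are deliberately skipped here; only contradictory ones are flagged.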