Marco Fossati
> It expects a claim GUID, though. We can only retrieve it from our bot's debug logs when a new claim is created (this doesn't apply to references). We...
Waiting for https://phabricator.wikimedia.org/T289710
I introduced a shell script in 23afc3ef5399eae5a4c5f25081424731d04e931c that also runs `mypy`. We can tell Travis to execute it and fail the build as appropriate. Note that `mypy` only checks functions...
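Wiring the script into the build could look like the following `.travis.yml` fragment. This is a sketch only: the script path and Python version are hypothetical, not taken from the repository.

```yaml
# Hypothetical .travis.yml fragment: run the type-check script as part of
# the build; a non-zero exit status fails the job.
language: python
python:
  - "3.7"          # assumed version, adjust to the project's
install:
  - pip install mypy
script:
  - ./scripts/run_mypy.sh   # hypothetical path to the shell script
```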
Workflow example: MusicBrainz musicians.

INPUT: NT (triples) dump, i.e., `wikidatawiki/entities/latest-truthy.nt.bz2`

- [ ] get all sub-classes of _musician_ `Q639669`;
- [ ] for each `sub-class` + _musician_, filter all subjects...
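The first step above could be sketched as a streaming pass over the truthy NT dump. This is a minimal illustration, not the bot's actual code; it assumes one triple per line and looks for direct `P279` (sub-class of) links to `Q639669`:

```python
import bz2

# Wikidata RDF URIs as they appear in the truthy NT dump
SUBCLASS_OF = "<http://www.wikidata.org/prop/direct/P279>"
MUSICIAN = "<http://www.wikidata.org/entity/Q639669>"


def subclasses_of_musician(nt_lines):
    """Collect subjects that are direct sub-classes of musician (Q639669)."""
    subclasses = set()
    for line in nt_lines:
        parts = line.split(" ", 2)
        if len(parts) < 3:
            continue
        subject, predicate, obj = parts[0], parts[1], parts[2].rstrip(" .\n")
        if predicate == SUBCLASS_OF and obj == MUSICIAN:
            subclasses.add(subject)
    return subclasses


# Stream the bz2-compressed dump without decompressing it to disk:
# with bz2.open("latest-truthy.nt.bz2", "rt") as dump:
#     print(subclasses_of_musician(dump))
```

A single pass like this only finds *direct* sub-classes; the full hierarchy needs recursion or the SPARQL property path discussed below.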
> We are waiting for direct access to the Wikidata dumps in the VPS machine:
> https://phabricator.wikimedia.org/T209818

Task resolved: `ls /public/dumps/public/wikidatawiki/entities`
Alternative SPARQL method discussed during WikiCite 2018: **unwind `subclass of` recursion**. See https://etherpad.wikimedia.org/p/WikiCite18Day3sparql
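In SPARQL, the `subclass of` recursion unwinds into a property path (`wdt:P279*`), so the Wikidata Query Service walks the whole tree server-side. A minimal sketch (the query builder is illustrative, not the bot's API):

```python
def build_subclass_query(root_qid):
    """SPARQL property path: wdt:P279* walks the entire sub-class tree."""
    return (
        "SELECT DISTINCT ?class WHERE { "
        f"?class wdt:P279* wd:{root_qid} . "
        "}"
    )


# To run it against the live Wikidata Query Service (needs `requests`):
# import requests
# response = requests.get(
#     "https://query.wikidata.org/sparql",
#     params={"query": build_subclass_query("Q639669"), "format": "json"},
#     headers={"User-Agent": "soweego-sketch/0.1"},  # hypothetical UA
# )
# bindings = response.json()["results"]["bindings"]
```

Note that unbounded `P279*` paths on large hierarchies can hit the endpoint's 60-second timeout, which is what motivates paging.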
One-shot BASH version: done.
We finally opted for paged SPARQL, leaving this open as an extra feature.
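Paged SPARQL boils down to issuing the same query with a sliding `LIMIT`/`OFFSET` window. A sketch of the idea (names are illustrative; `OFFSET` paging needs a stable `ORDER BY` to be deterministic):

```python
def paged_queries(base_query, page_size=1000):
    """Yield successive pages of a SPARQL query via LIMIT/OFFSET.

    The caller stops iterating once a page returns no results.
    OFFSET paging is only deterministic with a stable ORDER BY clause.
    """
    offset = 0
    while True:
        yield f"{base_query} LIMIT {page_size} OFFSET {offset}"
        offset += page_size
```

Each yielded string would be sent to the endpoint in turn, accumulating results until an empty page comes back.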
On the target catalog side, we could leverage simple relations between entities to build comparable graph embeddings, e.g., the `knownForTitles` and `primaryProfession` in IMDb.
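One way to feed those IMDb relations into an embedding pipeline is to flatten them into an edge list first. A minimal sketch with mock records (field names follow IMDb's `name.basics` dataset; the sample IDs are made up):

```python
def imdb_edges(people):
    """Build (person, neighbor) edges from IMDb-style records.

    `people` maps a name ID to its `knownForTitles` and `primaryProfession`
    values; the resulting edge list could feed a graph-embedding library
    such as node2vec.
    """
    edges = []
    for nconst, record in people.items():
        for title in record.get("knownForTitles", []):
            edges.append((nconst, title))
        for profession in record.get("primaryProfession", []):
            edges.append((nconst, profession))
    return edges
```

Wikidata-side edges (e.g., `P279`, occupations) could be built the same way, making the two graphs comparable.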
The validator could check inconsistencies like `QID 1` pointing to `MusicBrainz artist X` while `MusicBrainz artist X` points back to `QID 2`.
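The reciprocity check above reduces to comparing the two link mappings. A minimal sketch (function and argument names are illustrative, not the validator's actual API):

```python
def find_dangling_links(wikidata_to_mb, mb_to_wikidata):
    """Report QIDs whose MusicBrainz artist points back to a different QID.

    Returns (qid, musicbrainz_id, conflicting_qid) tuples for each mismatch.
    """
    inconsistent = []
    for qid, artist in wikidata_to_mb.items():
        back = mb_to_wikidata.get(artist)
        if back is not None and back != qid:
            inconsistent.append((qid, artist, back))
    return inconsistent
```

Missing back-links are deliberately skipped here; only contradictory ones are flagged.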