Add consistency check and inference with skosify
I am about to refactor Skosify to support use as a module. We could integrate some of its functionality into mc2skos, for instance to make sure that links have counterparts (related in both directions, broader/narrower, ...).
Cool, I'm already using it as a package in one of my scripts: https://github.com/scriptotek/data_ub_tasks/blob/master/data_ub_tasks/data_ub_tasks.py#L211-L215
Of course one could just pipe data from mc2skos to skosify, but it takes quite a bit of time to serialize and deserialize large RDF files, so I'm open to adding e.g. a skosify consistency check within mc2skos.
Skosify has many options that aren't needed here, so I would not support all of them. The following seem most useful in my opinion (see also https://seco.cs.aalto.fi/publications/2014/suominen-mader-skosquality.pdf):
- expand topConceptOf <-> hasTopConcept
- expand narrower <-> broader
- expand related <-> related
- check that related and narrower/broader exclude each other (also transitively: an ancestor should not be related as well!)
- check that the same label is not used as both prefLabel and altLabel for the same concept in the same language
- check that every concept has exactly one prefLabel per language of the concept scheme
- detect cycles
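To make the list above concrete, here is a minimal sketch of the expansion step and two of the checks. It is not how Skosify or mc2skos implement them: plain `(subject, predicate, object)` tuples stand in for an RDF graph, and the function names (`expand`, `ancestors`, `find_cycles`, `related_hierarchy_clashes`) are hypothetical. The checks only follow `skos:broader`, so they assume `expand` has been run first.

```python
from collections import defaultdict

# Plain (subject, predicate, object) tuples stand in for an RDF graph.
# All names below are illustrative, not Skosify or mc2skos API.
INVERSES = {
    "skos:broader": "skos:narrower",
    "skos:narrower": "skos:broader",
    "skos:related": "skos:related",
    "skos:topConceptOf": "skos:hasTopConcept",
    "skos:hasTopConcept": "skos:topConceptOf",
}


def expand(triples):
    """Return the triples plus the missing counterpart of every link."""
    expanded = set(triples)
    for s, p, o in triples:
        if p in INVERSES:
            expanded.add((o, INVERSES[p], s))
    return expanded


def ancestors(triples):
    """Map each concept to all concepts reachable via skos:broader."""
    parents = defaultdict(set)
    for s, p, o in triples:
        if p == "skos:broader":
            parents[s].add(o)
    result = {}
    for start in list(parents):
        seen, stack = set(), list(parents[start])
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(parents.get(node, ()))
        result[start] = seen
    return result


def find_cycles(triples):
    """Concepts that are their own ancestor take part in a broader cycle."""
    return {c for c, anc in ancestors(triples).items() if c in anc}


def related_hierarchy_clashes(triples):
    """Pairs linked by skos:related although one is an ancestor of the other."""
    anc = ancestors(triples)
    return {
        (s, o)
        for s, p, o in triples
        if p == "skos:related" and (o in anc.get(s, ()) or s in anc.get(o, ()))
    }
```

For example, with `a broader b`, `b broader c` and `a related c`, `expand` adds the three counterparts and `related_hierarchy_clashes` reports the pair `a`/`c` in both directions, since `c` is a transitive ancestor of `a`.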
I'd like to enable these most common checks with one or two options (e.g. --expand and --quality) instead of having to create a config file, so the choice must be opinionated. An additional option (--skosify configfile) could give access to all of Skosify's features.
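A rough sketch of how the three options could be declared with `argparse`. The option names come from this proposal; the help texts and the `build_parser` helper are assumptions for illustration, not the actual mc2skos argument parser.

```python
import argparse


def build_parser():
    """Hypothetical parser showing only the three options discussed here."""
    parser = argparse.ArgumentParser(prog="mc2skos")
    parser.add_argument(
        "--expand",
        action="store_true",
        help="add missing counterparts of broader/narrower, related "
             "and topConceptOf/hasTopConcept links",
    )
    parser.add_argument(
        "--quality",
        action="store_true",
        help="run an opinionated set of consistency checks",
    )
    parser.add_argument(
        "--skosify",
        metavar="CONFIGFILE",
        help="pass the result through Skosify using the given config file",
    )
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

Keeping `--expand` and `--quality` as plain flags means the common case needs no config file, while `--skosify CONFIGFILE` stays available for everything else.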
👍 for the two options!
When it comes to supporting a config file, it would be good if the same format could also be used by skosify directly.
First part implemented in #45. Packing Python and dependencies drives me nuts but I managed to do it.
P.S: Also added option --skosify.
I'm not going to implement the --quality option soon because it requires https://github.com/NatLibFi/Skosify/issues/52 and can also be done to some degree with the --skosify option. You can close this issue after the merge.