dsp-api
Consistency checking inside 'upgrade' tool
Does the upgrade tool check the data for consistency before running the upgrade?
With the change to a Scala-based tool, it is now possible to provide the upgrade tool with a dump of the triplestore. This dump could be inconsistent if some manual changes were performed prior to the upgrade.
See the answer here:
https://discuss.dasch.swiss/t/knora-releases-and-system-upgrades/87/11?u=benjamingeer
One of the reasons for having declarative integrity constraints in triplestores/databases is so that you don't have to write a separate consistency checker.
... But there are things that Knora checks (e.g. the contents of the objects of datatype properties) that can't be checked by the consistency checks in the triplestore. It would be possible to write a separate tool to check these things, but I think it would be a big task.
I was thinking more along the lines of using an embedded GraphDB with the existing KnoraRules loaded. Then it would be a matter of running:
PREFIX sys: <http://www.ontotext.com/owlim/system#>
INSERT DATA {
    _:b sys:consistencyCheckAgainstRuleset "KnoraRules"
}
But I guess this won't be very fast and would make the upgrade slow again.
On the other hand, if the upgrade tool did the consistency checking itself, it could deactivate KnoraRules before uploading to GraphDB and then do a very fast upload.
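As a rough illustration of how a tool could issue that consistency check over HTTP, here is a minimal Python sketch. It assumes GraphDB's RDF4J-style endpoint (`/repositories/<id>/statements`) accepts a SPARQL update sent as `application/sparql-update`. The base URL, the repository name `knora-test` and the function names are made up for the example and are not taken from the upgrade tool.

```python
import urllib.request

# Namespace GraphDB uses for its system predicates.
SYS_PREFIX = "http://www.ontotext.com/owlim/system#"


def consistency_check_update(ruleset: str) -> str:
    """Build the SPARQL update that asks GraphDB to check the data against a ruleset."""
    if '"' in ruleset or "\\" in ruleset:
        raise ValueError("ruleset name must not contain quotes or backslashes")
    return (
        f"PREFIX sys: <{SYS_PREFIX}>\n"
        "INSERT DATA {\n"
        f'    _:b sys:consistencyCheckAgainstRuleset "{ruleset}"\n'
        "}\n"
    )


def consistency_check_request(
    base_url: str, repository: str, ruleset: str
) -> urllib.request.Request:
    """Prepare (but do not send) the HTTP request that carries the update."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/repositories/{repository}/statements",
        data=consistency_check_update(ruleset).encode("utf-8"),
        headers={"Content-Type": "application/sparql-update"},
        method="POST",
    )


# Hypothetical URL and repository name, for illustration only.
request = consistency_check_request("http://localhost:7200", "knora-test", "KnoraRules")
```

Passing `request` to `urllib.request.urlopen` would actually send it. I would expect a failed check to come back as an HTTP error, but how long the check takes on a full dump is exactly the open question.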
Now, I'm running into the problem that the upgrade tool downloaded the data, ran the upgrade, deleted all data from the repository and then failed to upload because of an inconsistency error.
I don't think that a failed upgrade should leave an empty repository. Since the data loaded fine into GraphDB before the upgrade (with KnoraRules turned on) and the inconsistent property was not there before either, I guess that the inconsistency was introduced by the upgrade script.
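To make the "don't leave an empty repository" point concrete, here is a minimal Python sketch of the flow I have in mind: keep the downloaded data around, and put it back if the upload of the upgraded data is rejected. The `Repository` interface, `UpgradeFailed` and `upgrade` are hypothetical names for illustration. This is not how the current upgrade tool is structured.

```python
from typing import Callable, Protocol


class Repository(Protocol):
    """Hypothetical minimal interface to a triplestore repository."""

    def download(self) -> str: ...
    def clear(self) -> None: ...
    def upload(self, data: str) -> None: ...


class UpgradeFailed(Exception):
    """Raised when the upgraded data was rejected and the original was restored."""


def upgrade(repo: Repository, transform: Callable[[str], str]) -> None:
    original = repo.download()      # keep the pre-upgrade dump
    upgraded = transform(original)  # run the upgrade on the dump
    repo.clear()
    try:
        repo.upload(upgraded)
    except Exception as err:
        # The upload was rejected (e.g. by a consistency check):
        # remove any partial upload and put the original data back.
        repo.clear()
        repo.upload(original)
        raise UpgradeFailed("upload rejected; original data restored") from err
```

Since the original data loaded fine with KnoraRules turned on, re-uploading it should pass the same check, so the failure is still reported but the repository ends up in its pre-upgrade state.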
I've created a separate issue for this bug: https://github.com/dasch-swiss/knora-api/issues/1512
In that case, there's a bug in the upgrade script. But there could always be bugs in the upgrade script, even if the upgrade script checks the consistency of the data. As I said here, that's why we need the declarative integrity constraints in KnoraRules: to protect us from bugs in our own software.
Basically, there is a contradiction between these two statements:
- I don't trust the upgrade tool, because it clearly has a bug that's corrupting the data.
- I trust the upgrade tool to check the consistency of the data, so I'm going to turn off the database's consistency checks.