Add XML schema validation for metadata, rulesets and mapping files
This pull request adds schema-based validation for XML content as described in #6367.
Kitodo.Production makes heavy use of XML files in all parts of the application and depends on those files conforming to specific patterns in order for the program to run correctly. The aim of this pull request is therefore to harden the application against errors caused by schema-invalid XML files. It should also help users and developers find and fix errors related to XML content.
The validation is called whenever a metadata, ruleset or mapping file is loaded:
- (internal) metadata files are validated against the schemata mets.xsd and kitodo.xsd
- ruleset files are validated against the ruleset schema ruleset.xsd
- mapping files are validated against the XSLT schema definition file xslt20.xsd
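As an illustration of the general technique, the following is a minimal sketch of schema validation with the standard javax.xml.validation API that collects all validation errors instead of stopping at the first one. It is not the code added by this pull request: the class name SchemaValidationSketch, the helper names and the tiny inline schema (a stand-in for a real file such as ruleset.xsd) are assumptions made for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

final class SchemaValidationSketch {

    // Hypothetical miniature schema standing in for a real file such as ruleset.xsd.
    static final String DEMO_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='ruleset'><xs:complexType><xs:sequence>"
        + "<xs:element name='key' type='xs:string' maxOccurs='unbounded'/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    /** Returns one message per problem found; an empty list means the XML is valid. */
    static List<String> validate(String xml, String xsd) {
        Schema schema;
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            // A broken schema is a setup problem, not a problem of the validated document.
            throw new IllegalArgumentException("Schema could not be compiled", e);
        }
        List<String> errors = new ArrayList<>();
        Validator validator = schema.newValidator();
        validator.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { /* ignored in this sketch */ }
            // Recoverable schema violations are recorded and validation continues.
            @Override public void error(SAXParseException e) { errors.add(describe(e)); }
            // Fatal errors (for example XML that is not well-formed) are recorded and abort validation.
            @Override public void fatalError(SAXParseException e) throws SAXException {
                errors.add(describe(e));
                throw e;
            }
        });
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
        } catch (SAXException e) {
            if (errors.isEmpty()) {
                errors.add(String.valueOf(e.getMessage()));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return errors;
    }

    private static String describe(SAXParseException e) {
        return "line " + e.getLineNumber() + ", column " + e.getColumnNumber() + ": " + e.getMessage();
    }

    public static void main(String[] args) {
        System.out.println(validate("<ruleset><key>a</key></ruleset>", DEMO_XSD));
        System.out.println(validate("<ruleset><unknown/></ruleset>", DEMO_XSD));
    }
}
```

Collecting the messages in a list rather than throwing on the first error is what makes it possible to show the user every validation problem at once.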
If a metadata file is not valid against the corresponding schema definition, loading the file in the metadata editor fails and a popup dialog is displayed to inform the user about the specific validation errors:
Additionally, saving a "Mapping file" or "Ruleset" with a schema invalid XML file will also be prevented:
The user - or rather the admin - is then responsible for fixing the XML structure of the corresponding file before processing can continue. The specific validation errors listed in the popup dialog can be used to fix the invalid files.
In the case of metadata files, Kitodo.Production used to save internal meta.xml files with a slightly invalid METS structure. For details see #6714. That issue was resolved by pull request #6715, which fixed the internal logic so that all future metadata files will have a valid METS structure.
However, since manually fixing the METS structure of potentially millions of existing processes is not feasible, this pull request also adds a new Kitodo-Script called resaveMetadataFile, which can be applied to multiple selected processes in the process list. It resaves the meta.xml file using the fixed program logic mentioned above, thus fixing the METS structure of existing legacy processes and allowing them to be opened in the metadata editor again.
Validation is also performed on XML data imported via search interfaces from external sources. Here, the schema definition used depends on the metadata format configured in the import configuration for the external source:
The following external metadata formats are validated:
- MODS data is validated against the current MODS schema definition mods-3-8.xsd
- MARC data is validated against MARC21slim.xsd
- PICA data is validated against pica-xml-v1-0.xsd
- EAD data is validated against ead.xsd
These schema definition files have been retrieved from the website of the Library of Congress and added to the Kitodo.Production repository.
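The relation between the configured metadata format and the schema file can be pictured as a simple lookup. The sketch below is purely illustrative: the enum MetadataFormat, the class ImportSchemaLookupSketch and the method schemaFileFor are hypothetical names, and only the schema file names are taken from the list above.

```java
import java.util.EnumMap;
import java.util.Map;

final class ImportSchemaLookupSketch {

    // Hypothetical enum; the real import configuration may model formats differently.
    enum MetadataFormat { MODS, MARC, PICA, EAD }

    private static final Map<MetadataFormat, String> SCHEMA_FILES = new EnumMap<>(MetadataFormat.class);

    static {
        // File names as listed in this pull request description.
        SCHEMA_FILES.put(MetadataFormat.MODS, "mods-3-8.xsd");
        SCHEMA_FILES.put(MetadataFormat.MARC, "MARC21slim.xsd");
        SCHEMA_FILES.put(MetadataFormat.PICA, "pica-xml-v1-0.xsd");
        SCHEMA_FILES.put(MetadataFormat.EAD, "ead.xsd");
    }

    /** Returns the schema file name to validate records of the given format against. */
    static String schemaFileFor(MetadataFormat format) {
        return SCHEMA_FILES.get(format);
    }

    public static void main(String[] args) {
        for (MetadataFormat format : MetadataFormat.values()) {
            System.out.println(format + " is validated against " + schemaFileFor(format));
        }
    }
}
```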
In contrast to the validation of internal files, errors during the validation of external XML content do not prevent the user from working with it, since normally a user does not have any way to fix the source of those XML documents retrieved from external interfaces.
In these cases, the popup dialog listing the validation errors offers an additional "Continue without validation" button:
XML validation of data records retrieved from external sources can now also be deactivated completely in the corresponding import configuration.
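The different handling of internal and external content described above can be summarized as a small decision rule. The following sketch only restates that rule; the type names Origin and Decision and the method decide are hypothetical and do not claim to match the actual implementation.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class ValidationPolicySketch {

    enum Origin { INTERNAL, EXTERNAL }

    enum Decision { PROCEED, BLOCK, ASK_USER }

    /**
     * Internal files with errors are blocked. External records with errors may be used
     * after the user confirms ("Continue without validation"). If validation is
     * deactivated for an external source, the record is used without any check.
     */
    static Decision decide(Origin origin, boolean validationEnabled, List<String> errors) {
        if (origin == Origin.EXTERNAL && !validationEnabled) {
            return Decision.PROCEED;
        }
        if (errors.isEmpty()) {
            return Decision.PROCEED;
        }
        return origin == Origin.INTERNAL ? Decision.BLOCK : Decision.ASK_USER;
    }

    public static void main(String[] args) {
        List<String> errors = Arrays.asList("example validation error");
        System.out.println(decide(Origin.INTERNAL, true, errors));
        System.out.println(decide(Origin.EXTERNAL, true, errors));
        System.out.println(decide(Origin.EXTERNAL, false, errors));
        System.out.println(decide(Origin.INTERNAL, true, Collections.<String>emptyList()));
    }
}
```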
When a data record is retrieved via certain kinds of search interfaces (specifically SRU and OAI), the descriptive metadata (for example MODS) is embedded in a container XML format, resulting in a combined XML structure that incorporates elements from multiple schema definitions. Since this has to be taken into account when validating those combined records, schema definitions for SRU (srw-types.xsd) and OAI (OAI-PMH.xsd) have been added as well (see #5877 for details).
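To illustrate how a record wrapped in a container format can be validated against several schema definitions at once, here is a minimal sketch that compiles two schemas into one javax.xml.validation.Schema via SchemaFactory.newSchema(Source[]). The two inline schemas and their namespaces (urn:example:envelope, urn:example:record) are invented stand-ins for a container schema such as srw-types.xsd and a payload schema such as mods-3-8.xsd; the class and method names are assumptions as well.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

final class CombinedSchemaSketch {

    // Hypothetical container schema: a response element wrapping one element of another namespace.
    static final String ENVELOPE_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
        + " targetNamespace='urn:example:envelope' elementFormDefault='qualified'>"
        + "<xs:element name='response'><xs:complexType><xs:sequence>"
        + "<xs:any namespace='##other' processContents='strict'/>"
        + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    // Hypothetical payload schema: a record that must contain a title.
    static final String RECORD_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
        + " targetNamespace='urn:example:record' elementFormDefault='qualified'>"
        + "<xs:element name='record'><xs:complexType><xs:sequence>"
        + "<xs:element name='title' type='xs:string'/>"
        + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    /** Validates the XML against the union of all given schemas and returns the error messages. */
    static List<String> validate(String xml, String... xsds) {
        Source[] sources = new Source[xsds.length];
        for (int i = 0; i < xsds.length; i++) {
            sources[i] = new StreamSource(new StringReader(xsds[i]));
        }
        Schema schema;
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            schema = factory.newSchema(sources); // one Schema covering all namespaces
        } catch (SAXException e) {
            throw new IllegalArgumentException("Schemas could not be compiled", e);
        }
        List<String> errors = new ArrayList<>();
        Validator validator = schema.newValidator();
        validator.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { /* ignored in this sketch */ }
            @Override public void error(SAXParseException e) { errors.add(e.getMessage()); }
            @Override public void fatalError(SAXParseException e) throws SAXException {
                errors.add(e.getMessage());
                throw e;
            }
        });
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
        } catch (SAXException e) {
            if (errors.isEmpty()) {
                errors.add(String.valueOf(e.getMessage()));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return errors;
    }

    public static void main(String[] args) {
        String complete = "<e:response xmlns:e='urn:example:envelope'>"
            + "<r:record xmlns:r='urn:example:record'><r:title>T</r:title></r:record></e:response>";
        String incomplete = "<e:response xmlns:e='urn:example:envelope'>"
            + "<r:record xmlns:r='urn:example:record'/></e:response>";
        System.out.println(validate(complete, ENVELOPE_XSD, RECORD_XSD));
        System.out.println(validate(incomplete, ENVELOPE_XSD, RECORD_XSD));
    }
}
```

Because both schemas are compiled together, an error inside the embedded record is reported even though the outer element belongs to the container namespace.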
In addition to the metadata standards listed above, the XML structure of internal documents is also validated against the Kitodo schema definition kitodo.xsd (see the section above about loading internal metadata files). Because an issue in the current Kitodo schema definition prevents the original kitodo.xsd file from being used for schema validation (see discussion #6682 for details), a minimally adjusted copy of this schema definition has been added to the repository for this purpose.
Note: Please make sure that your import XSLT mapping files create only "Kitodo" elements and no METS structures, which may be invalid or incomplete. METS structures are normally created by the application, unless you are using "Prestructured import", in which case the import mapping files are also responsible for creating the METS container of imported records. This matters because the validation of imported data will detect and flag incomplete or spurious METS elements that existing import mapping files may create around the actual "kitodo:kitodo" metadata elements.
The functionality for the validation has been implemented in the existing module Kitodo-Validation. This module already contained an empty class FileStructureValidation.java which now contains the basic methods for the XML validation.
Fixes #6367 Fixes #5877