music-encoding

ODD validation

Open kepper opened this issue 4 years ago • 14 comments

work in progress: this still needs to consider integration into GH Actions

fixes #789 (as side effect) fixes #790

kepper avatar May 28 '21 10:05 kepper

If it's helpful for reference, we did have schematron validation available previously. There were a number of XSLTs that would extract the rules from the compiled RNG and then run the validation. It was really clunky, but it seemed to work.

Here's a link to the shell script setup:

https://github.com/music-encoding/music-encoding/blob/bdfb695be193ca8619949817db18b6a0a2c0cf3b/build.sh#L37-L97

The XSLTs were copied from teh intarnet:

https://github.com/music-encoding/music-encoding/tree/bdfb695be193ca8619949817db18b6a0a2c0cf3b/utils/schematron

ahankinson avatar May 28 '21 11:05 ahankinson

It would be easy to run the shell script from GH Actions. The question is whether it needs to be updated for any reason (e.g. to include the changes made by @kepper).

musicEnfanthen avatar May 28 '21 12:05 musicEnfanthen

At ODD Friday, we agreed that we will pull in the schema right now (after the meeting) and try to set up a GH Action independently of that. The GH Action could fail when the ODD is not valid.

kepper avatar May 28 '21 12:05 kepper

Ok, I will set up a test repo to see if we can get the build.sh script by @ahankinson running there from an action.

@ahankinson Could you please indicate which input files (rngfile, meifile) would be best for the test?

musicEnfanthen avatar May 31 '21 13:05 musicEnfanthen

Maybe I misunderstood the question: The MEI schematron rules are extracted from the compiled RNG schema. So the process is:

  • Take the ODD, make an RNG
  • Take the RNG, extract the schematron
  • Do a bunch of hocus pocusy XML shenanigans to run the schematron rules through a bunch of stylesheets and satellites and underwater cables and who knows what else to eventually arrive at an XSLT that can generate 👍 or 👎 based on the rules
  • Apply that XSLT to an input MEI file, which will validate the MEI as following the schematron rules.
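
The second step above (pulling the Schematron out of the RNG) can be illustrated with a minimal Python sketch. It is not the XSLT-based extraction that build.sh used; it only shows the idea of collecting the embedded `sch:pattern` elements into a standalone Schematron schema. The sample RNG, the `note` element, and the `check-dur` id are made up for illustration and are not taken from MEI.

```python
import xml.etree.ElementTree as ET

SCH_NS = "http://purl.oclc.org/dsdl/schematron"
RNG_NS = "http://relaxng.org/ns/structure/1.0"

# Tiny hypothetical RNG with one embedded Schematron pattern (not real MEI).
SAMPLE_RNG = f"""<grammar xmlns="{RNG_NS}" xmlns:sch="{SCH_NS}">
  <define name="note">
    <element name="note">
      <sch:pattern id="check-dur">
        <sch:rule context="note">
          <sch:assert test="@dur">A note needs a duration.</sch:assert>
        </sch:rule>
      </sch:pattern>
      <empty/>
    </element>
  </define>
</grammar>"""


def extract_schematron(rng_text: str) -> ET.Element:
    """Collect every embedded sch:pattern into a standalone sch:schema element."""
    rng_root = ET.fromstring(rng_text)
    schema = ET.Element(f"{{{SCH_NS}}}schema")
    for pattern in rng_root.iter(f"{{{SCH_NS}}}pattern"):
        schema.append(pattern)
    return schema


schema = extract_schematron(SAMPLE_RNG)
pattern_ids = [pattern.get("id") for pattern in schema]
print(pattern_ids)
```

The resulting schema would still have to be compiled to XSLT and applied to an MEI file, which is the part the stylesheets in utils/schematron took care of.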

But maybe you're doing something different here? Are you doing the same thing, but for the TEI rules for validating the schematron rules in the MEI ODD?

ahankinson avatar May 31 '21 13:05 ahankinson

Hi @ahankinson, if I got you right, the job here is actually slightly different. This is not about extracting the Schematron rules that are part of MEI, but about checking that the ODD used to write MEI itself conforms to some expectations. This is not a Schematron that a user of MEI would ever see or have to use. So the idea would be to have a flag that says whether a change to the sources conforms to our expectations or introduces problems. For instance, this checks whether the guidelines link to a non-existent element (because it's misspelled or some such). Does this help?
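
To make the kind of check concrete, here is a rough Python analogue. The PR itself expresses its rules in Schematron, so this is only an illustration of the idea, not the actual implementation. It assumes that elements are declared with `elementSpec/@ident` and mentioned in prose with `gi`; the sample document and the misspelled `rset` are invented.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

# Hypothetical miniature ODD: two declared elements, one misspelled mention.
SAMPLE_ODD = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <elementSpec ident="note"/>
  <elementSpec ident="rest"/>
  <p>Use <gi>note</gi> for pitched events and <gi>rset</gi> for silence.</p>
</TEI>"""


def find_dangling_references(odd_text: str) -> list[str]:
    """Return element names mentioned via <gi> that no <elementSpec> declares."""
    root = ET.fromstring(odd_text)
    defined = {spec.get("ident") for spec in root.iter(f"{TEI}elementSpec")}
    mentioned = {gi.text for gi in root.iter(f"{TEI}gi")}
    return sorted(name for name in mentioned if name not in defined)


problems = find_dangling_references(SAMPLE_ODD)
print(problems)
```

An empty list would mean that every mentioned element is declared; anything else is the sort of consistency problem the validation is meant to flag.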

kepper avatar May 31 '21 13:05 kepper

Yeah, that's what I thought. I expect, though, that you would need to run through the same process with the Schematron rules you're writing: run the Schematron rules through an XSLT to generate an XSLT that is then used to verify the ODD that contains the Schematron rules that you want to verify by extracting them into an XSLT to generate an XSLT that is then used to verify the MEI XML that someone would write.

Confusing? :-D

ahankinson avatar May 31 '21 14:05 ahankinson

What about using the TEI schemata for a first implementation, e.g.:

  • https://tei-c.org/Vault/P5/4.2.2/xml/tei/custom/schema/relaxng/tei_odds.rng
  • https://tei-c.org/Vault/P5/4.2.2/xml/tei/custom/schema/relaxng/tei_customization.rng

bwbohl avatar May 31 '21 14:05 bwbohl

Wouldn't that be used to validate the schematron rules in a TEI file? Not validate the rules in the ODD file? I'm seeing rules like this in the tei_odds.rng:

   <pattern xmlns="http://purl.oclc.org/dsdl/schematron"
            id="tei_odds-att.datable.w3c-att-datable-w3c-from-constraint-rule-2">
      <sch:rule xmlns:sch="http://purl.oclc.org/dsdl/schematron"
                xmlns:rng="http://relaxng.org/ns/structure/1.0"
                xmlns:xi="http://www.w3.org/2001/XInclude"
                xmlns="http://www.tei-c.org/ns/1.0"
                context="tei:*[@from]">
        <sch:report test="@notBefore" role="nonfatal">The @from and @notBefore attributes cannot be used together.</sch:report>
      </sch:rule>
   </pattern>

That looks like it's used to validate a TEI file.

ahankinson avatar Jun 01 '21 08:06 ahankinson

2021-06-24 ODD Thursday:

The PR and its discussion are confusing to some extent, especially as there is no summary description of the changes. If we understand correctly, the general idea is to have validation and tests run on schemata changed by PRs. This is generally a very good idea, as it avoids simple errors like the one resulting in #789! The roadmap decided in #790 is:

  1. introduce Schematron-based validation for MEI source files (#790)
  2. introduce validation of the ODD files against the TEI schemata. The question arising here is which version of TEI are we talking about?
  3. re-enable running schema-tests against sample-files
  4. implement all of this in a GitHub Action that builds, validates, and tests. This requires building every pull request with the build part of the deploy.yml action; the build artefacts can be attached to the action run, and the action should post validation results to the PR as a comment. All this should probably be designed as separate jobs, although the build is a prerequisite for all of them.

N.B. As there is a general requirement to be able to run all this locally as well, we should probably move to a Docker-based build scenario.

bwbohl avatar Jun 24 '21 12:06 bwbohl

Concerning item 2 of my post above:

  2. introduce validation of the ODD files against the TEI schemata. The question arising here is which version of TEI are we talking about?

The music-encoding repo holds its own version of the TEI schemata, namely source/validation/mei_odds.rng, which dates back to TEI v3.2.0 from 2017. While we're at it, shouldn't we upgrade the MEI source files to the latest version of TEI, v4.2.2?

bwbohl avatar Jun 28 '21 09:06 bwbohl

Yes, at some point, we should. That seems to be a rather significant change, though, that requires proper testing. So I'm inclined to schedule that for a later, dedicated meeting, like a dev workshop ;-)

kepper avatar Jun 28 '21 10:06 kepper

ok, should try pure ODD at the same time then.

bwbohl avatar Jun 29 '21 07:06 bwbohl

@kepper What is the status of this?

rettinghaus avatar Jan 30 '22 16:01 rettinghaus

Running this locally results in 57 consistency problems in the specs and guidelines. I would like to have this merged before starting to fix those errors.

kepper avatar Oct 02 '22 22:10 kepper

Thanks for this. Now a question to our experts @bwbohl, @musicEnfanthen, and maybe @riedde: What would be necessary to have this as an automatic check when committing, and then automatically adding a flag to the repo?

kepper avatar Oct 03 '22 12:10 kepper

The validation of the ODD against an RNG schema can be done, e.g., by a Jing task. This task could also be integrated into the Ant build file, if needed. Concerning flags: I'm not familiar with this topic.
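
Outside of Ant, the same validation could be driven from a small script. The sketch below assumes that Java and a local jing.jar are available and uses Jing's plain command-line form (schema first, then the file to validate). The jar location and the file name mei-source.xml are placeholders; only source/validation/mei_odds.rng is mentioned earlier in this thread.

```python
import subprocess


def jing_command(jing_jar: str, rng_schema: str, odd_file: str) -> list[str]:
    """Build the Jing call: java -jar jing.jar <schema> <xmlfile>."""
    return ["java", "-jar", jing_jar, rng_schema, odd_file]


def validate(jing_jar: str, rng_schema: str, odd_file: str) -> bool:
    """Run Jing and report whether the ODD file is valid against the schema."""
    result = subprocess.run(
        jing_command(jing_jar, rng_schema, odd_file),
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        # Show whatever Jing reported, regardless of the stream it used.
        print(result.stdout + result.stderr)
    return result.returncode == 0


# Placeholder paths; adjust to the local checkout before calling validate().
cmd = jing_command("jing.jar", "source/validation/mei_odds.rng", "mei-source.xml")
print(" ".join(cmd))
```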

riedde avatar Oct 05 '22 10:10 riedde

Ideally, the task would check on any incoming PRs, and then flag the PR if it fails.
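
One simple way to get that behaviour is to let the validation script exit with a non-zero status, since a failing step in a GitHub Actions job marks the run as failed and that shows up as a failed check on the PR. A minimal sketch, with made-up problem messages and a hypothetical helper name:

```python
import sys


def exit_code_for(problems: list[str]) -> int:
    """Report each problem on stderr and return 1 if there are any, else 0."""
    for problem in problems:
        print(f"consistency problem: {problem}", file=sys.stderr)
    return 1 if problems else 0


clean = exit_code_for([])
failing = exit_code_for(["guidelines link to unknown element 'rset'"])
print(clean, failing)

# A real entry point would end with: sys.exit(exit_code_for(problems))
```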

ahankinson avatar Oct 05 '22 10:10 ahankinson