xsd-compare icon indicating copy to clipboard operation
xsd-compare copied to clipboard

Enable recursive XSD adding text output compare (as JavaFX is not working)

Open svanteschubert opened this issue 2 years ago • 15 comments

Dear Yoep,

Thank you for your project it gave me a very good kick start on the Xerces Java API! (BTW I was able to build and debug thourgh Xerces in conjucntion with xsd-compare in IntelliJ using JDK 8 after adding some HTML interface).

The features of JavaFX, spreadsheet and native packaging have not been working for me from the start on my Ubuntu 22.04 LTS (Jammy Jellyfish) with OpenJDK 11 (nor on Windows 10). I tried to update to all latest dependency versions, but as JavaFX is released in parallel with JDK, perhaps an update of the project to the JDK 17 LTS - as it is a stand-alone comparison tool this should be unproblematic. But as I am fine with text output for the comparison I don't follow this any longer. Still this update caused some changes in the spreadsheet and JavaFX classes. Also I did some renaming, mainly according to XSD spec and Xerces, sometimes due to refactoring, e.g. Renamed Modifications.java to Change.java as it now represents either an Add, Modfication or Removal change. A modification change might stilll consist of the change of several XSD facets but it is seen as a single change of either an element or attribute. BTW it helped me to read the abstract XSD data model chapter in the W3C XSD spec (and jump back there from time to time)

With this PR I have enabled the feature of recursive XSDs like the on from UN/CEFACT XML:

  1. https://unece.org/trade/uncefact/mainstandards
  2. https://unece.org/trade/uncefact/xml-schemas

My test case was the compare of the cross industry invoice (CII) XML of the second release (B) of 2016 -> D16B (used by the EU norm for e-invoices EN16931 - where I am a CEN co-editor) with the second release (B) of 2022 -> D22B currently a Beta

I am returning the comparison as a list of changes, which an be serialized to text by using a different TextReporter.

I am using two different text outputs (both being serialized to txt files by JUnit tests to ./target)

  • The human easier readable multi-line represenation of a change by a multi-line reporter. see https://raw.githubusercontent.com/svanteschubert/xsd-compare/develop/src/test/resources/references/comparison-CII_D16B-with-D22B.txt
  • The single line represenation for easier comparison with other XSD tooling (a friend works on another) and we want to compare the output by a single-line reporter. see https://raw.githubusercontent.com/svanteschubert/xsd-compare/develop/src/test/resources/references/comparison-CII_D16B-with-D22B_singleLined.txt

You may find for both new added automated regression tests.

Note in the multi-line output, that the type change in the XSD is the first line (for modifications also the second indented lines), which I call Header Lines, as they exist once for multilple XPath. A single type change might effect many different named nodes accross valid documents of the XSD grammar. Therefore may one header line have many XPath (a HashSet of XPath as body lines). In the end I have the following statisik:

**** STATISTIC ****

** ELEMENTS

  • Added elements in XSD: 98

  • Added elements in XML: 1828

  • Modified elements in XSD: 18

  • Modified elements in XML: 175

  • Removed elements from XSD: 6

  • Removed elements from XML: 45

** ATTRIBUTES

  • Added attributes in XSD: 5

  • Added attributes in XML: 10

  • Modified attributes in XSD: 52

  • Modified attributes in XML: 2352

  • Removed attributes from XSD: 141

  • Removed attributes from XML: 8703

I might continue after eastern to check completness and might add tests for cases where xs:choice, xs:sequence, xs:all is being changed. But perhaps you are curious why the JavaFX is not starting any longer and might give it a quick check. Perhaps as I started to do updates switch to JDK 17. :-) BTW the maven-shade-plugin did no longer work (for me) so I switched to maven-assembly-plugin creating a jar-with-depencies. Still I got troubles moving version, buildtime into the manifest as I am doing at other projects: https://github.com/tdf/odftoolkit/blob/master/odfdom/pom.xml#L136 allowing that a "java -jar XY.jar" returning version information, I even use usually the last commit hash. ;-)

Anyway, this is therefore a draft PR as I am uncertain if I broke existing features (as they never worked for me from the beginning), nevertheless the recursive grammar and the refactoring are worth a review IMHO. If you are like to share a remote tea, I would happily explain in more detail what I intended and where I would like to have some feedback from you :-)

All the best and Happy Eastern holidays (even if we might have different religions there should be always a reason to celebrate) :-) Svante

svanteschubert avatar Apr 06 '23 11:04 svanteschubert

Thanks already for the draft, seems very promising. I'll see when I can find the time to test it myself and go again through the full code as it has been some time when I worked on this project.

yoep avatar Apr 07 '23 07:04 yoep

Added some missing facets to comparison including test documents to compare changes (1/2)

  • xs:maxInclusive
  • xs:maxExcluisve
  • xs:minInclusive
  • xs:minExclusive
  • xs:totalDigits
  • xs:fractionDigits

and improved the reporting and added real-world test documents - the factur-x subset XSDs. Test documents compared in one test with all reports.

My original use case was to check if the UN/CEFACT XSD D22B has added new constraints like:

  1. Removed elements/attributes that were originally used by EN16931 (esp. our Factur-X subset so called Core-Invoice-Usage-Specification (CIUS)).
  2. Added higher constraints: added mandatory node beyond existing node
  3. Limited value set, e.g. raising minimum value or lowering a maximum value or removing enumeration values All the above would break backwards compatibility. For this reason, I have added two explicit reports for these use cases (e.g. OnlyNewRestrictionReport and OnlyNewExtensionReport).
  • The OnlyNewRestrictionReport would show restricitions and new constraints, but also incompatilite changes (e.g. fixed default value changes).
  • The OnlyNewExtensionReport would show extensions. Any subset of the EU norm (EN16931) that is taking national law into account may narrow an optional to mandatory (which is useful for national VAT laws) but is not allow to extend EN16931 (unless it is explicitly an extension and not a CIUS).

What might be missing:

  1. Two facets xs:assert and xs:explicitTimezone might be added
  2. I have changed in my facet test document xs:choice to xs:sequence and this was not yet flagged

svanteschubert avatar Apr 16 '23 07:04 svanteschubert

Finally, I have done some further programming of improvements:

  1. added the detection of changes of the compsitor (xs:sequence, xs:choice, xs:all)
  2. improved the output of comparisons: Stating no semantic change if a new single enumeration is equal to the previous fixed value or a removed single enumeration is still provided as @fixed attribute. Finally, moving new enumerations from extensions to restrictions reports, e.g. as a given domain (e.g. Strings) is being restricted to the subset of the enumeration of values

I could not find xs:assert nor xs:explicitTimezone in my XSDs (UN/CEFACT XML and UBL) neither could I find "explicitTimezone" with a grep within Xerces sources, therefore I am fine not embracing them now.

Guess, I am ready now :-)

svanteschubert avatar Apr 17 '23 16:04 svanteschubert

I have examples for the missing features in a comment of a test file.

    <xs:complexType name="assertType">
        <xs:attribute name="min" type="xs:int"/>
        <xs:attribute name="max" type="xs:int"/>
        <xs:assert test="@min le @max"/>
    </xs:complexType>

    <xs:simpleType name="SpecificTimeType">
        <xs:restriction base="xs:time">
          <xs:explicitTimezone value="required" />
        </xs:restriction>
      </xs:simpleType>

svanteschubert avatar Apr 17 '23 16:04 svanteschubert

After being finished with comparing the UN/CEFACT XML, I added a test for comparing UBL 2.1 with UBL 2.3.

Unfortunately, this test fails due to acquisition of too much Heap space (tried increasing Heap in IntelliJ without success). It seems that for UBL the extra DOM-like element model that xsd-compare is creating is too much. I loved this abstraction for easier view on the model and was (a bit I am still) tempted to test the same Xerces-based comparison with the MultiSchemaValidator (MSV) under the hood, were I did recently some release, to check if the model is understood correctly. (MSV is also loading DTD and RelaxNG in an internal model that abstracts from XSD, DTD and RNG).

Nevertheless, the comparision for UN/CEFACT XML worked (and this was my main goal) and I have learned a lot of the underlying Xerces API and the problems that occur when comparing XSD grammars.

Thanks again for your nice project, Yoep!

Best regards, Svante

svanteschubert avatar Apr 18 '23 08:04 svanteschubert

PS: Realized that the change of element content was not compared (added it)

svanteschubert avatar Apr 19 '23 13:04 svanteschubert

Something that I think might be better is to move the CLI functionality to a CommandLineRunner. This allows us to switch between the GUI and CLI without having to do additional work such as setting up the logger and manually parsing the argument options given to the program. This way, we can guarantee the same functionality between the GUI and CLI as the same backend will be available.

If you want, I can do this after the PR merge and I'll foresee some new Github actions integrations to generate installers for all Operating Systems.

yoep avatar Apr 20 '23 08:04 yoep

Yes, please do so! I agree with you, currently I have extended XsdCompareStarter to offer command-line options. But you are right, currently the GUI is not the default, but of course this should be switched when GUI is running! https://github.com/svanteschubert/xsd-compare/blob/develop/src/main/java/com/compare/xsd/XsdCompareStarter.java#L19 Please refactor as you wish, I am not emotional on this - neither on naming ;-)

If I will find further problems in XSD compare in the past weeks (e.g. by the work on Factur-X e-invoice standard), I will continue to iterate/enhance the comparison!

Last but not least, I am usually using the Apache 2 License (as Xerces) as it allows to copy source code functionaltity in other Apache sources of "mine", e.g. https://github.com/tdf/odftoolkit/. It gives me the freedom also to take parts in commercial software, but these questions are sometimes seen as religious. Just wanted to state that I am offering the source code as well as Apache 2 to you if you like to switch the license, but no pressure!

svanteschubert avatar Apr 21 '23 07:04 svanteschubert

Thanks about the remark of the license (forgot about it), I've updated it to Apache 2.0.

I'll first make sure the packaging is working for all OS systems before I do some work on the refactoring. For me it's fine to merge the existing PR as-is, I'll put in some effort to get it back up-and-running again in JavaFX.

yoep avatar Apr 21 '23 15:04 yoep

I tried variations on log and packaging, but logging never worked for me in a file) and packaging updates can be ignored as well as I never was able to get rid of exceptions. Therefore you might ignore or questions such changes. Sorry for any confusion and thanks for the license update 🤗

svanteschubert avatar Apr 21 '23 15:04 svanteschubert

I have updated an updated of the UN/CEFACT XSDs and relalized a mistake I made with the report of new restrictions and extensions of the XSD grammar in regard of enumerations. While if you have an enumeration and add new values it is an extension of the grammar, it is in the very beginning (where you have earlier only a type like xs:string) and add a new enumeration not an extension but a restriction of all possible values (here strings of xs:string) to the new given set of the enumeration! I likely will fix this - as you already approved the PR may I still add this fix on top of this PR?

svanteschubert avatar Apr 27 '23 08:04 svanteschubert

BTW many thanks for the license change. With the commit the master branch has now appeared in public on GitHub and I might try to rebase this PR develop branch with te master branch (adding my mentioned reporter updates for enumeration fixes on top). But all commits would have new hashes and it would be a force push, making any work of yours on the PR "difficult" so I will hold my feet still until you give a green flag! :-)

svanteschubert avatar Apr 27 '23 08:04 svanteschubert

Hi yoep, I rebased my changes with your Apache licence change, which happened after my fork, so it is now shown on top of my changes. In addition, I have pushed the enumeration report update (taking concern of the edge cases of adding/removal of an extension. Where "adding of an enumeration"=="grammar restriction" and "removal of an enumeration"=="grammar extension". Aside of these edge cases a larger enumeration means always an extension and a smallar a grammar restriction. I think I am ready now. :-) I will look through your comments and a nasty cold the past week therefore the delay.

svanteschubert avatar May 05 '23 16:05 svanteschubert

Dear Yoep,

I have rebased this pull request as you updated the sources last week! :-) With this rebased version the sources compile under Linux! 👍

Could you merge them? Or is there something that still could be done?`

From my perspective: The command line worked well when comparing the quite big UN/CEFACT XML grammar version D16B with D22B! Would you like to test it with the GUI? Might be already quite big! However, I was not able to load the UBL 2.1 XML XSD to compare with newer versions of UBL. Receiving an out-of-memory exception. A profiler might give clues, but I am lacking here the time atm...

Have a nice holiday season, Yoep!

Greetings from Germany, Svante

svanteschubert avatar Dec 27 '23 16:12 svanteschubert

Hi Svante,

Thanks for updating the feature branch. Last time I've tried your feature branch, the JavaFX interface didn't want to load, so I'll give it another try this time.

If it works, I'll merge it in. Otherwise, I will also need to find some time to troubleshoot the issue.

Happy holidays for you too :)

Kind regards, yoep

yoep avatar Dec 27 '23 17:12 yoep