Cannot generate Java POJOs with newest versions
It's been a long time since the previous issue about JAXB, but we are having problems binding the XML XSDs to Java POJOs. We followed the procedure described at https://grobid.readthedocs.io/en/latest/TEI-encoding-of-results/#binding-with-jaxb-20, but none of the versions after 0.6.0 work properly: parsing errors appear every time we compile the schemas.
The error output is as follows:
Current and more recent versions:
ossancm@ossancm:~/Downloads$ git clone https://github.com/kermitt2/grobid.git
Cloning into 'grobid'...
remote: Enumerating objects: 62490, done.
remote: Counting objects: 100% (5240/5240), done.
remote: Compressing objects: 100% (2301/2301), done.
remote: Total 62490 (delta 2534), reused 4805 (delta 2284), pack-reused 57250
Receiving objects: 100% (62490/62490), 1.17 GiB | 10.92 MiB/s, done.
Resolving deltas: 100% (34439/34439), done.
Updating files: 100% (8020/8020), done.
ossancm@ossancm:~/Downloads/grobid$ xjc -d generated -extension grobid-home/schemas/xsd/*.xsd
parsing a schema...
[ERROR] cos-nonambig: "http://www.tei-c.org/ns/1.0":formula and "http://www.tei-c.org/ns/1.0":formula (or elements from their substitution group) violate "Unique Particle Attribution". During validation against this schema, ambiguity would be created for those two particles.
line 385 of file:/home/ossancm/Downloads/grobid/grobid-home/schemas/xsd/tei.xsd
[ERROR] cos-nonambig: "http://www.tei-c.org/ns/1.0":formula and "http://www.tei-c.org/ns/1.0":formula (or elements from their substitution group) violate "Unique Particle Attribution". During validation against this schema, ambiguity would be created for those two particles.
line 429 of file:/home/ossancm/Downloads/grobid/grobid-home/schemas/xsd/tei.xsd
[ERROR] cos-nonambig: "http://www.tei-c.org/ns/1.0":formula and "http://www.tei-c.org/ns/1.0":formula (or elements from their substitution group) violate "Unique Particle Attribution". During validation against this schema, ambiguity would be created for those two particles.
line 955 of file:/home/ossancm/Downloads/grobid/grobid-home/schemas/xsd/tei.xsd
[ERROR] cos-nonambig: "http://www.tei-c.org/ns/1.0":formula and "http://www.tei-c.org/ns/1.0":formula (or elements from their substitution group) violate "Unique Particle Attribution". During validation against this schema, ambiguity would be created for those two particles.
line 2199 of file:/home/ossancm/Downloads/grobid/grobid-home/schemas/xsd/tei.xsd
[ERROR] cos-nonambig: "http://www.tei-c.org/ns/1.0":formula and "http://www.tei-c.org/ns/1.0":formula (or elements from their substitution group) violate "Unique Particle Attribution". During validation against this schema, ambiguity would be created for those two particles.
line 2228 of file:/home/ossancm/Downloads/grobid/grobid-home/schemas/xsd/tei.xsd
[ERROR] cos-nonambig: "http://www.tei-c.org/ns/1.0":formula and "http://www.tei-c.org/ns/1.0":formula (or elements from their substitution group) violate "Unique Particle Attribution". During validation against this schema, ambiguity would be created for those two particles.
line 3155 of file:/home/ossancm/Downloads/grobid/grobid-home/schemas/xsd/Grobid.xsd
Failed to parse a schema.
ossancm@ossancm:~/Downloads/grobid$ git checkout 0.7.0
HEAD is now at 605c6462 Merge pull request #797 from kermitt2/finalize-release-0.7.0
ossancm@ossancm:~/Downloads/grobid$ xjc -d generated -extension grobid-home/schemas/xsd/*.xsd
parsing a schema...
[WARNING] schema_reference.4: Failed to read schema document 'dcr.xsd', because 1) could not find the document; 2) the document could not be read; 3) the root element of the document is not <xsd:schema>.
line 3 of file:/home/ossancm/Downloads/grobid/grobid-home/schemas/xsd/Grobid.xsd
[WARNING] schema_reference.4: Failed to read schema document 'dcr.xsd', because 1) could not find the document; 2) the document could not be read; 3) the root element of the document is not <xsd:schema>.
line 3 of file:/home/ossancm/Downloads/grobid/grobid-home/schemas/xsd/xml.xsd
[ERROR] src-resolve: Cannot resolve the name 'dcr:datcat' to a(n) 'attribute declaration' component.
line 384 of file:/home/ossancm/Downloads/grobid/grobid-home/schemas/xsd/Grobid.xsd
Failed to parse a schema.
ossancm@ossancm:~/Downloads/grobid$ git checkout 0.6.2
Updating files: 100% (1644/1644), done.
Previous HEAD position was 605c6462 Merge pull request #797 from kermitt2/finalize-release-0.7.0
HEAD is now at 17f2d46c [Gradle Release Plugin] - pre tag commit: '0.6.2'.
ossancm@ossancm:~/Downloads/grobid$ xjc -d generated -extension grobid-home/schemas/xsd/*.xsd
parsing a schema...
[WARNING] schema_reference.4: Failed to read schema document 'dcr.xsd', because 1) could not find the document; 2) the document could not be read; 3) the root element of the document is not <xsd:schema>.
line 3 of file:/home/ossancm/Downloads/grobid/grobid-home/schemas/xsd/Grobid.xsd
[WARNING] schema_reference.4: Failed to read schema document 'dcr.xsd', because 1) could not find the document; 2) the document could not be read; 3) the root element of the document is not <xsd:schema>.
line 3 of file:/home/ossancm/Downloads/grobid/grobid-home/schemas/xsd/xml.xsd
[ERROR] src-resolve: Cannot resolve the name 'dcr:datcat' to a(n) 'attribute declaration' component.
line 384 of file:/home/ossancm/Downloads/grobid/grobid-home/schemas/xsd/Grobid.xsd
Failed to parse a schema.
Previous version, which works:
ossancm@ossancm:~/Downloads/grobid$ git checkout 0.6.0
Updating files: 100% (7510/7510), done.
Previous HEAD position was 17f2d46c [Gradle Release Plugin] - pre tag commit: '0.6.2'.
HEAD is now at 52479cd7 [Gradle Release Plugin] - pre tag commit: '0.6.0'.
ossancm@ossancm:~/Downloads/grobid$ xjc -d generated -extension grobid-home/schemas/xsd/*.xsd
parsing a schema...
compiling a schema...
org/tei_c/ns/_1/Abstract.java
org/tei_c/ns/_1/AddrLine.java
org/tei_c/ns/_1/Address.java
org/tei_c/ns/_1/Affiliation.java
org/tei_c/ns/_1/Analytic.java
...
The documentation at https://grobid.readthedocs.io/en/latest/TEI-encoding-of-results/#binding-with-jaxb-20 is also out of date, because the XSD files it mentions don't even exist.
Thanks in advance.
Hi @JOscarJ !
Thanks for the issue.
I confess that I never test the JAXB binding after updating the XML schemas, because I don't use it (and I don't like Java XML bindings, but that's not a good reason :D ).
As for the error, I think it is an import issue with the location of dcr.xsd, so you need to run the command from the directory containing the schemas:
cd grobid-home/schemas/xsd
xjc -d generated -extension *.xsd
This solves the error you reported, but then there is another error related to the TEI schema:
lopez@work:~/grobid/grobid-home/schemas/xsd$ xjc -d generated -extension *.xsd
parsing a schema...
[ERROR] cos-nonambig: "http://www.tei-c.org/ns/1.0":formula and "http://www.tei-c.org/ns/1.0":formula (or elements from their substitution group) violate "Unique Particle Attribution". During validation against this schema, ambiguity would be created for those two particles.
line 385 of file:/home/lopez/grobid/grobid-home/schemas/xsd/tei.xsd
[ERROR] cos-nonambig: "http://www.tei-c.org/ns/1.0":formula and "http://www.tei-c.org/ns/1.0":formula (or elements from their substitution group) violate "Unique Particle Attribution". During validation against this schema, ambiguity would be created for those two particles.
line 429 of file:/home/lopez/grobid/grobid-home/schemas/xsd/tei.xsd
[ERROR] cos-nonambig: "http://www.tei-c.org/ns/1.0":formula and "http://www.tei-c.org/ns/1.0":formula (or elements from their substitution group) violate "Unique Particle Attribution". During validation against this schema, ambiguity would be created for those two particles.
line 955 of file:/home/lopez/grobid/grobid-home/schemas/xsd/tei.xsd
[ERROR] cos-nonambig: "http://www.tei-c.org/ns/1.0":formula and "http://www.tei-c.org/ns/1.0":formula (or elements from their substitution group) violate "Unique Particle Attribution". During validation against this schema, ambiguity would be created for those two particles.
line 2199 of file:/home/lopez/grobid/grobid-home/schemas/xsd/tei.xsd
[ERROR] cos-nonambig: "http://www.tei-c.org/ns/1.0":formula and "http://www.tei-c.org/ns/1.0":formula (or elements from their substitution group) violate "Unique Particle Attribution". During validation against this schema, ambiguity would be created for those two particles.
line 2228 of file:/home/lopez/grobid/grobid-home/schemas/xsd/tei.xsd
[ERROR] cos-nonambig: "http://www.tei-c.org/ns/1.0":formula and "http://www.tei-c.org/ns/1.0":formula (or elements from their substitution group) violate "Unique Particle Attribution". During validation against this schema, ambiguity would be created for those two particles.
line 3155 of file:/home/lopez/grobid/grobid-home/schemas/xsd/Grobid.xsd
Failed to parse a schema.
Unfortunately, I cannot change the TEI schema to fix these ambiguities, because it is a third-party schema. This is the joy of XML.
If XML Schema is not a sustainable output format, why not choose a friendlier format like JSON?
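For readers unfamiliar with the "Unique Particle Attribution" rule behind cos-nonambig, here is a minimal sketch of what such an ambiguity looks like. The two toy schemas are hypothetical and are not taken from tei.xsd; the class and method names (UpaSketch, schemaWith, compileError) are made up for illustration. The sketch compiles each schema with the JDK's standard javax.xml.validation API, which on the bundled schema processor is expected to reject the first one with a cos-nonambig error similar to the ones reported by xjc.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

final class UpaSketch {

    // Wraps the given particles in a toy <p> element declaration (hypothetical schema).
    static String schemaWith(String particles) {
        return "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
             + "<xs:element name=\"p\"><xs:complexType><xs:sequence>"
             + particles
             + "</xs:sequence></xs:complexType></xs:element></xs:schema>";
    }

    // An optional <formula> followed by a required <formula>: when a validator
    // sees a single <formula>, it cannot tell which particle it matches
    // without looking ahead, which is what the rule forbids.
    static final String AMBIGUOUS_XSD = schemaWith(
          "<xs:element name=\"formula\" type=\"xs:string\" minOccurs=\"0\"/>"
        + "<xs:element name=\"formula\" type=\"xs:string\"/>");

    // Same accepted content (one or two <formula>), expressed with one particle.
    static final String UNAMBIGUOUS_XSD = schemaWith(
          "<xs:element name=\"formula\" type=\"xs:string\" minOccurs=\"1\" maxOccurs=\"2\"/>");

    // Returns the schema compiler's error message, or null if the schema compiled.
    static String compileError(String xsd) {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        try {
            factory.newSchema(new StreamSource(new StringReader(xsd)));
            return null;
        } catch (SAXException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println("ambiguous:   " + compileError(AMBIGUOUS_XSD));
        System.out.println("unambiguous: " + compileError(UNAMBIGUOUS_XSD));
    }
}
```

In the toy case the fix is a one-line rewrite of the content model; in a large third-party schema like TEI the equivalent change would have to be made upstream, which is why it cannot simply be patched here.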
Hi @elonzh !
I think XML is the "right" output/exchange format for structured full-text documents (I would say everybody in publishing uses it, and all scientific full texts are distributed in XML formats such as JATS). XML is probably only really good for this... I don't see any practical problems of representation or sustainability. Although XML is more complicated than JSON, JSON on the other hand raises tons of issues, like representing mixed content (inline markup), schema validation, readability, ...
The issue here is the JAXB binding used to create Java POJOs, which is something I personally never liked because it is too fragile with complex XML data. I also think it's a bad idea to create POJOs to represent a document like a scientific article; we can directly use the XML document model, which is already specifically designed to represent/model documents.
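As a rough illustration of working on the XML document model directly, here is a minimal sketch that reads a value out of a TEI document with the JDK's built-in DOM and XPath APIs, with no generated classes involved. The class name TeiDomSketch, the "tei" prefix, and the tiny hand-written TEI fragment are assumptions made for the example; only the namespace URI comes from the error messages above.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class TeiDomSketch {

    static final String TEI_NS = "http://www.tei-c.org/ns/1.0";

    // Binds the (arbitrarily chosen) prefix "tei" to the TEI namespace for XPath.
    private static final NamespaceContext TEI_CONTEXT = new NamespaceContext() {
        @Override public String getNamespaceURI(String prefix) {
            return "tei".equals(prefix) ? TEI_NS : XMLConstants.NULL_NS_URI;
        }
        @Override public String getPrefix(String uri) {
            return TEI_NS.equals(uri) ? "tei" : null;
        }
        @Override public Iterator<String> getPrefixes(String uri) {
            return TEI_NS.equals(uri)
                ? Collections.singletonList("tei").iterator()
                : Collections.<String>emptyIterator();
        }
    };

    // Parses the TEI string into a DOM tree and returns the string value
    // of the first node matched by the XPath expression ("" if none).
    static String evaluate(String teiXml, String expression) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // required, TEI elements are namespaced
            dbf.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            Document doc = dbf.newDocumentBuilder()
                              .parse(new InputSource(new StringReader(teiXml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(TEI_CONTEXT);
            return xpath.evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Hand-written fragment for illustration, not actual GROBID output.
        String tei = "<TEI xmlns=\"" + TEI_NS + "\"><teiHeader><fileDesc><titleStmt>"
                   + "<title>A Sample Title</title>"
                   + "</titleStmt></fileDesc></teiHeader></TEI>";
        System.out.println(evaluate(tei,
            "/tei:TEI/tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:title"));
    }
}
```

Because nothing is generated from the XSDs, this approach does not depend on the schemas compiling under xjc; the trade-off is that element paths are plain strings rather than typed accessors.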