woodstox

Notations declared in external DTD subsets are reported as undefined

Open nektarios-kitsios opened this issue 7 months ago • 6 comments

Validation fails with the error below when a NOTATION that is declared in an external DTD subset is referenced in an internal DTD subset:

com.ctc.wstx.exc.WstxValidationException:
1 referenced notation undefined: first one 'IMAGE'
 at [row,col {unknown-source}]: [1,109]
        at com.ctc.wstx.exc.WstxValidationException.create(WstxValidationException.java:50)
        at com.ctc.wstx.sr.StreamScanner.reportValidationProblem(StreamScanner.java:593)
        at com.ctc.wstx.sr.StreamScanner.reportValidationProblem(StreamScanner.java:601)
        at com.ctc.wstx.dtd.FullDTDReader._reportVCViolation(FullDTDReader.java:1989)
        at com.ctc.wstx.dtd.FullDTDReader._reportUndefinedNotationRefs(FullDTDReader.java:1967)
        at com.ctc.wstx.dtd.FullDTDReader.parseDTD(FullDTDReader.java:639)
        at com.ctc.wstx.dtd.FullDTDReader.readInternalSubset(FullDTDReader.java:427)
        at com.ctc.wstx.sr.ValidatingStreamReader.finishDTD(ValidatingStreamReader.java:277)
        at com.ctc.wstx.sr.BasicStreamReader.skipToken(BasicStreamReader.java:3466)
        at com.ctc.wstx.sr.BasicStreamReader.nextFromProlog(BasicStreamReader.java:2089)
        at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1180)
        at org.codehaus.stax.test.vstream.TestExternalSubset.testNotationReferenceInInternalSubset(TestExternalSubset.java:77)

The following example demonstrates this problem:

XML:

<!DOCTYPE root SYSTEM 'test.dtd' [ <!ENTITY gr2 SYSTEM "gr2" NDATA IMAGE> ]>
<root>&extEnt;</root> 

DTD:

<!ELEMENT root (#PCDATA)>
<!ENTITY extEnt 'just testing'>
<!NOTATION IMAGE       PUBLIC "-//AS//NOTATION image format//EN"
                     "http://www.test.com/xml/common/dtd/notation/image">
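To run the same document through a Xerces-based validator without extra dependencies, you can use the Xerces-derived DOM parser bundled with the JDK with DTD validation turned on. This is a minimal sketch, assuming `test.dtd` is served from memory through an `EntityResolver`. The class and method names (`XercesNotationCheck`, `parseValidating`) are hypothetical. Validity errors are rethrown, so a normal return means the parser accepted the forward reference to `IMAGE`.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

public class XercesNotationCheck {
    // Document from this report: the internal subset references notation IMAGE,
    // which is declared only in the external subset (test.dtd).
    static final String XML =
        "<!DOCTYPE root SYSTEM 'test.dtd' [ <!ENTITY gr2 SYSTEM \"gr2\" NDATA IMAGE> ]>\n"
        + "<root>&extEnt;</root>";

    // Contents of test.dtd from this report.
    static final String DTD =
        "<!ELEMENT root (#PCDATA)>\n"
        + "<!ENTITY extEnt 'just testing'>\n"
        + "<!NOTATION IMAGE PUBLIC \"-//AS//NOTATION image format//EN\"\n"
        + "    \"http://www.test.com/xml/common/dtd/notation/image\">\n";

    /** Parses XML with DTD validation; any validity error is rethrown. Returns the root's text. */
    public static String parseValidating() throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true); // enable DTD validation
        DocumentBuilder builder = factory.newDocumentBuilder();
        // Serve test.dtd from memory; the parser may pass an absolute URI ending in "test.dtd".
        builder.setEntityResolver((publicId, systemId) ->
            systemId != null && systemId.endsWith("test.dtd")
                ? new InputSource(new StringReader(DTD))
                : null);
        // Validity errors are only reported by default; rethrow them so they fail the parse.
        builder.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { }
            @Override public void error(SAXParseException e) throws SAXParseException { throw e; }
            @Override public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
        });
        Document doc = builder.parse(new InputSource(new StringReader(XML)));
        // Entity references are expanded by default, so &extEnt; becomes its replacement text.
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("valid, root text = " + parseValidating());
    }
}
```

If the bundled parser behaves like the Xerces result described below, `main` prints the expanded text of `&extEnt;` and raises no validity error.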

Note that Xerces and xmllint both validate the above XML successfully. This is consistent with the XML specification, which does not impose any particular order on the declarations. The only requirement is the following:

If both the external and internal subsets are used, the internal subset must be considered to occur before the external subset. This has the effect that entity and attribute-list declarations in the internal subset take precedence over those in the external subset.
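The precedence rule quoted above can be sketched as a merge that processes the internal subset first. When an entity name repeats, the first declaration encountered wins, because the XML specification makes the first entity declaration binding. This is a hypothetical illustration (`SubsetMerge`, `merge`), not how Woodstox or Xerces actually store declarations. The rule only decides which of two competing declarations takes effect. It does not require a name to be declared before it is referenced.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SubsetMerge {
    /**
     * Hypothetical merge of general entity declarations (name -> replacement text).
     * The internal subset is processed first. putIfAbsent keeps the first declaration
     * seen for each name, so internal declarations take precedence over external ones.
     */
    public static Map<String, String> merge(List<Map.Entry<String, String>> internal,
                                            List<Map.Entry<String, String>> external) {
        Map<String, String> entities = new LinkedHashMap<>();
        for (Map.Entry<String, String> d : internal) entities.putIfAbsent(d.getKey(), d.getValue());
        for (Map.Entry<String, String> d : external) entities.putIfAbsent(d.getKey(), d.getValue());
        return entities;
    }

    public static void main(String[] args) {
        // Hypothetical redeclaration of extEnt in the internal subset.
        List<Map.Entry<String, String>> internal = List.of(Map.entry("extEnt", "overridden"));
        List<Map.Entry<String, String>> external = List.of(Map.entry("extEnt", "just testing"));
        System.out.println(merge(internal, external).get("extEnt")); // overridden
    }
}
```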

The problem seems to be that woodstox reads the internal subset before the external one and checks notation references as soon as it finishes the internal subset. At that point, the notation declaration in the external subset has not been read yet, so the reference is reported as undefined.
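One possible fix is to defer the check. The sketch below is not Woodstox code, and `DeferredNotationCheck` is a hypothetical class. It records notation references from either subset and computes the undefined names only after the external subset has also been read. Running the check right after the internal subset would flag `IMAGE`, and the stack trace shows `_reportUndefinedNotationRefs` being called from there. Running it at the end of the complete DTD would not.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical tracker: collect notation declarations and references, check once at the end. */
public class DeferredNotationCheck {
    private final Set<String> declared = new LinkedHashSet<>();
    private final Set<String> referenced = new LinkedHashSet<>();

    // Called for <!NOTATION name ...> in either subset.
    public void declareNotation(String name) { declared.add(name); }

    // Called for "NDATA name" (or a NOTATION-typed attribute value) in either subset.
    public void referenceNotation(String name) { referenced.add(name); }

    /** Intended to run once, after both the internal and the external subset are read. */
    public Set<String> undefinedNotations() {
        Set<String> missing = new LinkedHashSet<>(referenced);
        missing.removeAll(declared);
        return missing;
    }

    public static void main(String[] args) {
        DeferredNotationCheck check = new DeferredNotationCheck();
        // Internal subset: <!ENTITY gr2 SYSTEM "gr2" NDATA IMAGE>
        check.referenceNotation("IMAGE");
        System.out.println("after internal subset: " + check.undefinedNotations()); // [IMAGE]
        // External subset: <!NOTATION IMAGE PUBLIC ...>
        check.declareNotation("IMAGE");
        System.out.println("after external subset: " + check.undefinedNotations()); // []
    }
}
```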

For the record, this problem was already reported in an old issue: https://web.archive.org/web/20150507153747/http://jira.codehaus.org/browse/WSTX-264

nektarios-kitsios, Nov 24 '23 18:11