woodstox
woodstox copied to clipboard
Wrong line for XML event location in elements following DTD
Hello,
While updating our static code analyzer, we realized that in some scenario, the locations corresponding to events are wrong. It seems to be a bug from the woodstox engine.
In order to reproduce the issue, use the following XML file:
<?xml version=\"1.0\"?>
<!DOCTYPE menu [
<!--
Some comment
-->
<!ELEMENT menu (modulo)* >
]>
<menu value=\"foo\"></menu>
Important note: On line 3, there is NO spaces between the comment start and the new line (<!--\n
).
If you add a space between the last -
and the new line (<!-- \n
), locations are then correct.
Code I used to reproduce the issue:
package org.foo;
import java.io.Reader;
import java.io.StringReader;
import javax.xml.stream.Location;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
public class A {
public static void main(String[] args) throws Exception {
String input =
"<?xml version=\"1.0\"?>\n" +
"<!DOCTYPE menu [\n" +
"<!--\n" +
"Some comment\n" +
"-->\n" +
"<!ELEMENT menu (modulo)* >\n" +
"]>\n" +
"<menu value=\"foo\"></menu>";
XMLStreamReader xmlReader = getXmlStreamReader(input);
printLocation(xmlReader);
while (xmlReader.hasNext()) {
xmlReader.next();
printLocation(xmlReader);
}
}
private static XMLStreamReader getXmlStreamReader(String content) throws XMLStreamException {
Reader reader = new StringReader(content);
// Force woodstox factory instance
XMLInputFactory factory = new com.ctc.wstx.stax.WstxInputFactory();
factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
factory.setProperty(XMLInputFactory.IS_REPLACING_ENTITY_REFERENCES, false);
factory.setProperty(XMLInputFactory.IS_VALIDATING, false);
return factory.createXMLStreamReader(reader);
}
private static void printLocation(XMLStreamReader xmlReader) {
System.out.println(toString(xmlReader.getEventType(), xmlReader.getLocation()));
}
private static String toString(int eventType, Location location) {
return "[" + eventType + " - " + eventType(eventType) + "] "
+ "{"
+ "L=" + location.getLineNumber() + "; "
+ "C=" + location.getColumnNumber() + "; "
+ "O=" + location.getCharacterOffset()
+ "}";
}
private static String eventType(int eventType) {
switch (eventType) {
case XMLStreamReader.ENTITY_REFERENCE:
return "ENTITY_REFERENCE";
case XMLStreamReader.COMMENT:
return "COMMENT";
case XMLStreamReader.PROCESSING_INSTRUCTION:
return "PROCESSING_INSTRUCTION";
case XMLStreamReader.CHARACTERS:
return "CHARACTERS";
case XMLStreamReader.START_ELEMENT:
return "START_ELEMENT";
case XMLStreamConstants.END_ELEMENT:
return "END_ELEMENT";
case XMLStreamConstants.CDATA:
return "CDATA";
case XMLStreamConstants.DTD:
return "DTD";
case XMLStreamReader.END_DOCUMENT:
return "END_DOCUMENT";
case XMLStreamReader.START_DOCUMENT:
return "START_DOCUMENT";
default:
return "unknown (" + eventType + ")";
}
}
}
When executing the main
method, the following lines are printed:
[7 - START_DOCUMENT] {L=1; C=1; O=0}
[11 - DTD] {L=2; C=1; O=22}
[1 - START_ELEMENT] {L=7; C=1; O=91}
[2 - END_ELEMENT] {L=7; C=19; O=109}
[8 - END_DOCUMENT] {L=7; C=26; O=116}
At this point, the 3 last event are wrong, while offset is correct. Line should be 8. I would have expected:
[7 - START_DOCUMENT] {L=1; C=1; O=0}
[11 - DTD] {L=2; C=1; O=22}
[1 - START_ELEMENT] {L=8; C=1; O=91}
[2 - END_ELEMENT] {L=8; C=19; O=109}
[8 - END_DOCUMENT] {L=8; C=26; O=116}
Related issue: SONARXML-73
I did not manage to identify the issue in the code, being not that much familiar with the XML API. However, if you give me a hint, I may give a try to fix it.
Edit: If I set property XMLInputFactory.SUPPORT_DTD
to true
instead of false
, then locations are correct. Now, I'm not sure if what it means. Does support of DTD means that they are validated?
Cheers, Michael
Thank you for the report! I suspect there may be an issue with linefeed handling within comments inside DTD; location counters need explicit handling (due to multiple LFs XML allows), and it could be possible there is an update missing. Tracking specific location down may be trickier, but it would be somewhere where comments are skipped (within internal DTD subset, neither comments nor PIs can be reported).
(added "test-needed" label just because automated test would be great -- I realize instructions for writing test are included above)