woodstox icon indicating copy to clipboard operation
woodstox copied to clipboard

Wrong line for XML event location in elements following DTD

Open m-g-sonar opened this issue 5 years ago • 2 comments

Hello,

While updating our static code analyzer, we realized that in some scenario, the locations corresponding to events are wrong. It seems to be a bug from the woodstox engine.

In order to reproduce the issue, use the following XML file:

<?xml version=\"1.0\"?>
<!DOCTYPE menu [
<!--
Some comment
-->
<!ELEMENT menu (modulo)* >
]>
<menu value=\"foo\"></menu>

Important note: On line 3, there is NO spaces between the comment start and the new line (<!--\n). If you add a space between the last - and the new line (<!-- \n), locations are then correct.

Code I used to reproduce the issue:

package org.foo;

import java.io.Reader;
import java.io.StringReader;

import javax.xml.stream.Location;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class A {

  public static void main(String[] args) throws Exception {
    String input =
        "<?xml version=\"1.0\"?>\n" +
        "<!DOCTYPE menu [\n" +
        "<!--\n" +
        "Some comment\n" +
        "-->\n" +
        "<!ELEMENT menu (modulo)* >\n" +
        "]>\n" +
        "<menu value=\"foo\"></menu>";

    XMLStreamReader xmlReader = getXmlStreamReader(input);

    printLocation(xmlReader);
    while (xmlReader.hasNext()) {
      xmlReader.next();
      printLocation(xmlReader);
    }
  }

  private static XMLStreamReader getXmlStreamReader(String content) throws XMLStreamException {
    Reader reader = new StringReader(content);

    // Force woodstox factory instance
    XMLInputFactory factory = new com.ctc.wstx.stax.WstxInputFactory();

    factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
    factory.setProperty(XMLInputFactory.IS_REPLACING_ENTITY_REFERENCES, false);
    factory.setProperty(XMLInputFactory.IS_VALIDATING, false);
    return factory.createXMLStreamReader(reader);
  }

  private static void printLocation(XMLStreamReader xmlReader) {
    System.out.println(toString(xmlReader.getEventType(), xmlReader.getLocation()));
  }

  private static String toString(int eventType, Location location) {
    return "[" + eventType + " - " + eventType(eventType) + "] "
        + "{"
        + "L=" + location.getLineNumber() + "; "
        + "C=" + location.getColumnNumber() + "; "
        + "O=" + location.getCharacterOffset()
        + "}";
  }

  private static String eventType(int eventType) {
    switch (eventType) {
    case XMLStreamReader.ENTITY_REFERENCE:
      return "ENTITY_REFERENCE";
    case XMLStreamReader.COMMENT:
      return "COMMENT";
    case XMLStreamReader.PROCESSING_INSTRUCTION:
      return "PROCESSING_INSTRUCTION";
    case XMLStreamReader.CHARACTERS:
      return "CHARACTERS";
    case XMLStreamReader.START_ELEMENT:
      return "START_ELEMENT";
    case XMLStreamConstants.END_ELEMENT:
      return "END_ELEMENT";
    case XMLStreamConstants.CDATA:
      return "CDATA";
    case XMLStreamConstants.DTD:
      return "DTD";
    case XMLStreamReader.END_DOCUMENT:
      return "END_DOCUMENT";
    case XMLStreamReader.START_DOCUMENT:
      return "START_DOCUMENT";
    default:
      return "unknown (" + eventType + ")";
    }
  }
}

When executing the main method, the following lines are printed:

[7 - START_DOCUMENT] {L=1; C=1; O=0}
[11 - DTD] {L=2; C=1; O=22}
[1 - START_ELEMENT] {L=7; C=1; O=91}
[2 - END_ELEMENT] {L=7; C=19; O=109}
[8 - END_DOCUMENT] {L=7; C=26; O=116}

At this point, the 3 last event are wrong, while offset is correct. Line should be 8. I would have expected:

[7 - START_DOCUMENT] {L=1; C=1; O=0}
[11 - DTD] {L=2; C=1; O=22}
[1 - START_ELEMENT] {L=8; C=1; O=91}
[2 - END_ELEMENT] {L=8; C=19; O=109}
[8 - END_DOCUMENT] {L=8; C=26; O=116}

Related issue: SONARXML-73

I did not manage to identify the issue in the code, being not that much familiar with the XML API. However, if you give me a hint, I may give a try to fix it.

Edit: If I set property XMLInputFactory.SUPPORT_DTD to true instead of false, then locations are correct. Now, I'm not sure if what it means. Does support of DTD means that they are validated?

Cheers, Michael

m-g-sonar avatar Dec 17 '18 17:12 m-g-sonar

Thank you for the report! I suspect there may be an issue with linefeed handling within comments inside DTD; location counters need explicit handling (due to multiple LFs XML allows), and it could be possible there is an update missing. Tracking specific location down may be trickier, but it would be somewhere where comments are skipped (within internal DTD subset, neither comments nor PIs can be reported).

cowtowncoder avatar Dec 17 '18 20:12 cowtowncoder

(added "test-needed" label just because automated test would be great -- I realize instructions for writing test are included above)

cowtowncoder avatar Apr 23 '20 00:04 cowtowncoder